
In Situ eDiscovery Tools

This is the second part of a two-part series on eDiscovery. The first article reviews the new In Situ Reference Model, and this one discusses different eDiscovery tools for in-house legal teams.

Decades ago, when eDiscovery was getting started, many vendors developed specialized tools to help manage what was then an avalanche of data. Many of those tools required that data be handed off to different servers, often operated by different vendors. The handoffs were expensive and involved security and data integrity risks, but the gains in efficiency justified both the risks and the costs.

Fast forward to today, and nearly all tools for selecting responsive or relevant information have been built into the content management and productivity systems used by most corporations. Below are the eDiscovery tools that can be applied iteratively and interactively to documents while they remain in place within the corporation or its cloud infrastructure, without additional handoffs or additional costs.

 

Deduping

Deduping identifies files that are bit-for-bit identical. Only one copy has to be processed or reviewed, while a record is kept of where every instance of the document was found.
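
As a rough illustration of the idea, the sketch below groups files by a SHA-256 hash of their bytes, so identical content maps to one entry that remembers every location. The function names and the in-memory `files` mapping are assumptions made for this example, not part of any particular eDiscovery product.

```python
import hashlib
from collections import defaultdict


def group_exact_duplicates(files):
    """Map each SHA-256 digest to every location holding that exact content.

    `files` is a hypothetical mapping of location -> file bytes.
    """
    groups = defaultdict(list)
    for location, content in files.items():
        digest = hashlib.sha256(content).hexdigest()
        groups[digest].append(location)
    return dict(groups)


def review_set(files):
    """Return one representative location per unique file (the first seen)."""
    return [locations[0] for locations in group_exact_duplicates(files).values()]
```

Only the locations returned by `review_set` would need review; the full groups preserve where each copy was found.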

 

Near duping

Near duping identifies documents whose content is almost identical even though the files themselves differ, such as a PDF version of a Word document.
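
One common way to approximate this, shown here only as a sketch, is to compare the extracted text of two documents using overlapping word sequences (shingles) and Jaccard similarity. The shingle size and the 0.8 threshold are arbitrary assumptions for illustration; commercial tools may use different techniques.

```python
import re


def shingles(text, k=3):
    """Return the set of k-word sequences in the text, ignoring case and punctuation."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Share of shingles the two texts have in common (1.0 means identical sets)."""
    sa, sb = shingles(a), shingles(b)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def is_near_duplicate(a, b, threshold=0.8):
    """Treat two texts as near duplicates when similarity meets the assumed threshold."""
    return jaccard(a, b) >= threshold
```

Because the comparison runs on extracted text, a Word file and its PDF rendering can score as near duplicates even though their bytes differ.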

 

Text search

Text search can be used to include or exclude documents based on the words that appear on their face, or in the metadata associated with them.
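
A minimal sketch of this kind of include and exclude filtering follows. The document layout (`id`, `text`, `metadata`) is a hypothetical structure chosen for the example, and plain substring matching stands in for the richer query syntax that real search tools provide.

```python
def searchable_text(doc):
    """Join the body text and all metadata values into one lowercase string."""
    parts = [doc.get("text", "")]
    parts.extend(str(value) for value in doc.get("metadata", {}).values())
    return " ".join(parts).lower()


def select(documents, include, exclude=()):
    """Return ids of documents containing any include term and no exclude term."""
    selected = []
    for doc in documents:
        haystack = searchable_text(doc)
        hit = any(term.lower() in haystack for term in include)
        blocked = any(term.lower() in haystack for term in exclude)
        if hit and not blocked:
            selected.append(doc["id"])
    return selected
```

Note that the second kind of hit, a term found only in metadata such as a subject line, is treated the same as a hit in the body text.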

 

Thematic clustering

This technology identifies documents that discuss the same themes, even if they are not duplicates or near duplicates. Thematic clustering can be an effective way to include or exclude whole clusters from further consideration. The clusters can also help identify keywords.

 

Optical character recognition (OCR)

OCR converts document images to searchable text, which is needed not only for text search but also for thematic clustering and predictive coding. Normal OCR processes only create text for documents that are entirely image based, such as scanned or faxed documents. Deep OCR also creates text for images embedded in documents that are otherwise text.

 

Email threading

Emails are best understood in the context of the conversation or thread within which they occurred. Email threading technology presents email conversations in a way that minimizes the number of times the emails from early in the conversation have to be reviewed. In the simplest scenario, with no added or dropped parties, reviewing only the last email in a thread is enough to read every email in the thread.
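
The sketch below shows one simplified way threads can be rebuilt from reply links, in the spirit of the Message-ID and In-Reply-To header fields that email messages carry. The dictionary layout is an assumption for this example, and the code does not check whether the last message actually quotes the earlier ones or whether parties were added or dropped.

```python
from collections import defaultdict


def thread_root(message_id, parent_of):
    """Follow reply links upward until a message with no known parent is reached."""
    seen = set()
    while parent_of.get(message_id) is not None and message_id not in seen:
        seen.add(message_id)  # guard against malformed, circular reply chains
        message_id = parent_of[message_id]
    return message_id


def build_threads(messages):
    """Group message ids by the root of the conversation they belong to."""
    parent_of = {m["id"]: m.get("in_reply_to") for m in messages}
    threads = defaultdict(list)
    for m in messages:
        threads[thread_root(m["id"], parent_of)].append(m["id"])
    return dict(threads)


def final_messages(messages):
    """Return ids of messages that nobody in the collection replied to."""
    replied_to = {m.get("in_reply_to") for m in messages}
    return [m["id"] for m in messages if m["id"] not in replied_to]
```

In a simple linear thread, `final_messages` returns the single last email, which is the one a reviewer would open first.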

 

Domain analysis

Domain analysis permits lawyers to include or exclude emails from further consideration based on the domain names associated with the sender or recipients (e.g., e-newsletters from people at cnn.com are most likely to be irrelevant and nonresponsive).
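
As a small sketch of the idea, the code below tallies sender domains and drops messages from domains a lawyer has chosen to exclude. It uses `parseaddr` from Python's standard `email.utils` module; the message layout and function names are assumptions for illustration, and only the sender is examined here.

```python
from collections import Counter
from email.utils import parseaddr


def domain_of(address_header):
    """Extract the lowercase domain from a header value such as 'Name <user@host>'."""
    _name, address = parseaddr(address_header)
    return address.rpartition("@")[2].lower() if "@" in address else ""


def sender_domain_counts(messages):
    """Count how many messages came from each sender domain."""
    return Counter(domain_of(m["from"]) for m in messages)


def drop_senders(messages, excluded_domains):
    """Keep only messages whose sender domain is not in the excluded set."""
    return [m for m in messages if domain_of(m["from"]) not in excluded_domains]
```

The counts give a quick overview of which domains dominate a collection, which helps decide what belongs on the excluded list.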

 

Social network analysis

Because of the internet protocols governing email, messages carry a rich set of data about who sent emails to whom, and when. This makes it possible to get an overview of whom key players corresponded with over time. The overview can be as high-level as domain names or as granular as individual email accounts.
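
To make the idea concrete, the sketch below counts messages per sender, recipient, and month, either at the level of individual accounts or rolled up to domains. The message layout (bare addresses and ISO-style `YYYY-MM-DD` date strings) is an assumption made for this example.

```python
from collections import Counter


def correspondence_counts(messages, by_domain=False):
    """Count messages per (sender, recipient, month) triple.

    With by_domain=True, addresses are rolled up to their domain names.
    """
    counts = Counter()
    for m in messages:
        month = m["date"][:7]  # 'YYYY-MM' prefix of an ISO-style date string
        for recipient in m["to"]:
            sender, rcpt = m["from"].lower(), recipient.lower()
            if by_domain:
                sender = sender.rpartition("@")[2]
                rcpt = rcpt.rpartition("@")[2]
            counts[(sender, rcpt, month)] += 1
    return counts
```

Sorting the result by count, for example with `counts.most_common(10)`, surfaces the most active correspondents in each period.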

 

Predictive coding

This technology uses advanced text pattern algorithms to extend review decisions made on a subset of documents to the whole collection, and is intended to minimize the amount of time it takes to review and make decisions about the responsiveness or relevance of all the documents. 
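
The general pattern, learning from a reviewed subset and applying the result to unreviewed documents, can be sketched with a very small Naive Bayes text classifier. This is a deliberately simple stand-in chosen for illustration; it is not a description of the algorithms any particular predictive coding product uses, and the labels and function names are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Split text into lowercase word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def train(reviewed):
    """Build word statistics from (text, label) pairs coded by a human reviewer."""
    word_counts = {}        # label -> Counter of word frequencies
    doc_counts = Counter()  # label -> number of reviewed documents
    for text, label in reviewed:
        doc_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(tokenize(text))
    vocabulary = {w for counts in word_counts.values() for w in counts}
    return word_counts, doc_counts, vocabulary


def predict(model, text):
    """Return the label with the highest Naive Bayes score for an unreviewed text."""
    word_counts, doc_counts, vocabulary = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label in doc_counts:
        score = math.log(doc_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in tokenize(text):
            if word in vocabulary:  # words never seen in training are skipped
                # Add-one smoothing avoids a zero probability for unseen pairs.
                score += math.log(
                    (word_counts[label][word] + 1) / (total_words + len(vocabulary))
                )
        if score > best_score:
            best_label, best_score = label, score
    return best_label
```

In practice the reviewed subset would be far larger than a handful of documents, and predictions would be sampled and checked by reviewers before being relied on.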

 

Export

Once potentially responsive documents have been identified and properly winnowed, they need to be produced or, in some cases involving large collections, sent to a third-party review platform for final review. The export functionality can provide a variety of options, such as native files, image-only PDFs, or text and metadata load files that can be loaded directly into a database system.
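
As a rough sketch of the metadata load file option, the code below writes selected fields to comma-separated text using Python's standard `csv` module. The field names are hypothetical, and actual review platforms often expect their own delimiters and column layouts, so this should be read as an illustration of the concept only.

```python
import csv
import io


def write_load_file(documents, fields):
    """Render the chosen metadata fields as delimited text, one row per document."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=fields,
        extrasaction="ignore",  # fields not listed are left out of the export
        lineterminator="\n",
    )
    writer.writeheader()
    for doc in documents:
        writer.writerow(doc)
    return buffer.getvalue()
```

Values that contain the delimiter, such as a custodian name written as "Smith, J.", are quoted automatically so the receiving database can parse each row correctly.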


To learn more about the In Situ eDiscovery Reference Model, read the first part of this series, which delves into the day-to-day operational and management problems that it can resolve. For more resources about eDiscovery and cloud software, visit ACC’s Technology, Privacy, and eCommerce Interest Area.

About the Authors

Richard Stevens is executive vice president, chief legal officer, and corporate secretary at Old World Industries, LLC. He has also served on Old World’s board of directors. Prior to Old World, he worked at BP and Amoco Corporation and as a commercial litigator in private practice in Chicago.

Anne Kershaw is a lawyer and consultant who has been immersed in eDiscovery and information governance for many years. She co-authored the Judges’ Guide to Cost-Effective E-Discovery, teaches at Columbia University, and has written earlier articles for ACC Docket on discovery topics.


Robert Rogoff is vice president of information services at Old World Industries. His responsibilities include cybersecurity, eDiscovery, and IT infrastructure.



The information in any resource collected in this virtual library should not be construed as legal advice or legal opinion on specific facts and should not be considered representative of the views of its authors, its sponsors, and/or ACC. These resources are not intended as a definitive statement on the subject addressed. Rather, they are intended to serve as a tool providing practical advice and references for the busy in-house practitioner and other readers.