Slashing Discovery Budgets with Analytics

iControl ESI News & Events

By Jeff Johnson, CTO of iControl ESI

It is no surprise that the single largest cost component of electronic discovery is the expense associated with attorney document review. For years, it has been the preeminent topic of discussion during the planning and execution of any large-scale review. It is time to stop talking about the burgeoning cost of document review and take action to minimize it.

Fortunately, we have the technology to tame this budget killer. With the proper use of analytics, we can keep review costs lower and timelines shorter. Key analytics tools include Predictive Coding, Conceptual Clustering, and Near Duplicate Detection. Implementing these tools has been shown to reduce billable review hours by more than 40%.

This raises the question: why aren’t these technologies more broadly adopted? First, many have not yet had the opportunity to see their value and benefits firsthand. Second, persistent questions about the defensibility of these technologies have kept some from reaping the rewards. Finally, the perceived cost of using these tools has been a deterrent.

Analytics 101

The most broadly used analytics technologies are based on Latent Semantic Indexing (LSI). LSI applies a standard linear algebra technique, singular value decomposition (SVD) of a matrix of term counts per document, to learn the conceptual correlations between words in a collection of text. Applying that learning automatically to a specific population of documents allows us to build powerful solutions. Some of these solutions are:

Similar and/or Near-Duplicate Detection – Given the text of a specific document, an analytics-enabled review platform can readily identify similar and/or near-duplicate documents within the population and present them for review alongside the subject document. Similar documents are documents whose conceptual similarity to the subject document exceeds a predetermined threshold (often 90% or 95%). Near-duplicate documents cover the same concepts using nearly the same words in nearly the same order. All of this is in addition to the de-duplication process, which removes exact duplicate files and email from the review population.
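As a rough illustration of the ideas above, the sketch below builds a tiny term-document matrix, reduces it with a truncated SVD (the core step of LSI), and flags documents whose concept-space cosine similarity meets a 0.9 threshold, echoing the 90% figure mentioned above. For near-duplicates it swaps in word shingles compared by Jaccard overlap, a common textual technique that is sensitive to word order; it is not necessarily what any particular review platform uses. The function names, example documents, and parameter values (k=2, three-word shingles) are hypothetical, and the sketch assumes NumPy is installed.

```python
# Hypothetical sketch: toy LSI similarity plus a shingle-based near-duplicate check.
# Not the implementation used by any particular review platform. Requires NumPy.
import numpy as np


def lsi_vectors(docs, k=2):
    """Project documents into a k-dimensional LSI concept space via truncated SVD."""
    vocab = sorted({w for d in docs for w in d.lower().split()})
    row = {w: i for i, w in enumerate(vocab)}
    # Term-document count matrix: one row per term, one column per document.
    a = np.zeros((len(vocab), len(docs)))
    for j, d in enumerate(docs):
        for w in d.lower().split():
            a[row[w], j] += 1
    # a = U S V^T; keeping only the k largest singular values yields the concept space.
    _, s, vt = np.linalg.svd(a, full_matrices=False)
    k = min(k, len(s))
    return vt[:k].T * s[:k]  # row j = concept coordinates of document j


def cosine(u, v):
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return float(u @ v / denom) if denom else 0.0


def similar_docs(subject, vecs, threshold=0.9):
    """Indices of documents whose concept similarity to `subject` meets the threshold."""
    return [j for j in range(len(vecs))
            if j != subject and cosine(vecs[subject], vecs[j]) >= threshold]


def shingles(text, size=3):
    """Overlapping word n-grams; these reflect word order as well as word choice."""
    words = text.lower().split()
    return {tuple(words[i:i + size]) for i in range(max(1, len(words) - size + 1))}


def jaccard(x, y):
    union = x | y
    return len(x & y) / len(union) if union else 1.0


docs = [
    "contract breach damages payment",
    "payment damages contract dispute",
    "holiday party schedule office",
    "office party holiday invitation",
]
vecs = lsi_vectors(docs, k=2)
print(similar_docs(0, vecs))  # [1]

original = "please review the attached draft of the supply agreement before friday"
revised = original + " afternoon"
print(jaccard(shingles(original), shingles(revised)))  # 0.9
```

Appending one word changes only one of ten three-word shingles, so the pair scores 0.9, while a reordered sentence using the same words would score much lower.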

Conceptual Clusters – These are groupings of documents that the automated analysis determines are conceptually related to one another. The simplest application of analytics in document review is to make review assignments based on these groupings. A thousand documents about similar concepts will be reviewed much faster and more consistently if they are reviewed by the same reviewer on the same day.
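The assignment strategy described above can be sketched in a few lines. This is a minimal sketch under stated assumptions: it uses a simple greedy "leader" clustering on raw word counts for brevity, whereas an analytics platform would typically compare documents in an LSI concept space. The threshold of 0.5, the function names, and the reviewer names are all hypothetical.

```python
# Hypothetical sketch: greedy "leader" clustering on raw word counts, then
# whole-cluster review assignments. A production tool would typically compare
# documents in an LSI-style concept space; all names here are illustrative.
import math
from collections import Counter


def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def cluster(docs, threshold=0.5):
    """Each document joins the first cluster whose leader is similar enough;
    otherwise it becomes the leader of a new cluster."""
    clusters = []  # list of (leader_vector, [doc_ids])
    for doc_id, text in docs.items():
        vec = Counter(text.lower().split())
        for leader, members in clusters:
            if cosine(vec, leader) >= threshold:
                members.append(doc_id)
                break
        else:
            clusters.append((vec, [doc_id]))
    return [members for _, members in clusters]


def assign(clusters, reviewers):
    """Hand each whole cluster to one reviewer (round-robin) so related documents stay together."""
    batches = {r: [] for r in reviewers}
    for i, members in enumerate(clusters):
        batches[reviewers[i % len(reviewers)]].extend(members)
    return batches


docs = {
    "D1": "pricing terms for the widget supply contract",
    "D2": "revised pricing terms for widget supply",
    "D3": "team offsite agenda and travel",
    "D4": "travel plans for the team offsite",
}
groups = cluster(docs)
print(groups)  # [['D1', 'D2'], ['D3', 'D4']]
print(assign(groups, ["reviewer_a", "reviewer_b"]))
# {'reviewer_a': ['D1', 'D2'], 'reviewer_b': ['D3', 'D4']}
```

Assigning whole clusters, rather than splitting them across reviewers, is what keeps conceptually related documents in front of the same person.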

Predictive Coding – This is the newest and potentially the most powerful application of analytics in the discovery review process. It combines the judgment of attorney reviewers with the power of technology to deliver a faster, more effective review of large document populations. Essentially, it involves:

  • The selection of a subset from the review population,
  • Attorney review of the documents in the subset, and
  • Application of analytics technology to extend the attorney designations to similar documents throughout the population.
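The three steps above can be sketched as a nearest-neighbour "label propagation": attorneys code a seed set, each unreviewed document inherits the designation of its most similar seed document when the similarity clears a threshold, and everything else is routed back to attorneys. This is a deliberately simplified stand-in using raw word counts; production predictive-coding tools generally train statistical classifiers and validate the results by sampling. The function names, designations, documents, and the 0.6 threshold are hypothetical.

```python
# Hypothetical sketch of seed-set label propagation (nearest neighbour on raw
# word counts). Not a production predictive-coding algorithm; names,
# designations and the 0.6 threshold are illustrative assumptions.
import math
from collections import Counter


def vectorize(text):
    return Counter(text.lower().split())


def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def propagate(seed, population, threshold=0.6):
    """seed: {doc_id: (text, designation)} coded by attorneys.
    population: {doc_id: text} not yet reviewed.
    Returns (auto_coded, needs_review)."""
    seed_vecs = [(vectorize(text), label) for text, label in seed.values()]
    auto_coded, needs_review = {}, []
    for doc_id, text in population.items():
        vec = vectorize(text)
        # Find the most similar attorney-coded seed document.
        best_score, best_label = 0.0, None
        for seed_vec, label in seed_vecs:
            score = cosine(vec, seed_vec)
            if score > best_score:
                best_score, best_label = score, label
        if best_score >= threshold:
            auto_coded[doc_id] = best_label  # inherit the attorney designation
        else:
            needs_review.append(doc_id)  # too dissimilar: route to attorney review
    return auto_coded, needs_review


seed = {
    "S1": ("quarterly pricing agreement with distributor", "Responsive"),
    "S2": ("fantasy football league standings", "Not Responsive"),
}
population = {
    "P1": "revised pricing agreement with distributor",
    "P2": "fantasy football league draft",
    "P3": "parking garage closed monday",
}
auto_coded, needs_review = propagate(seed, population)
print(auto_coded)    # {'P1': 'Responsive', 'P2': 'Not Responsive'}
print(needs_review)  # ['P3']
```

Routing low-similarity documents back to attorneys, rather than forcing a designation, keeps human judgment in the loop for material the seed set does not represent.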

Addressing the Defensibility Issue

In our experience, the primary concern about using these technologies has been acceptance by the courts. A general lack of understanding has led to a lack of confidence. That lack of confidence, combined with the ever-present risk of discovery-related sanctions, makes practitioners reluctant to rely on these technologies. Though taken a little out of context, the following statement fits the use of these technologies well:

A little ignorance can go a long way. But what you don’t know will always hurt you. (Times, 23 Nov. 2001, p. 20)

Fortunately, the tide is turning. Early adopters have seen the potential become reality, shaving 40-70% off the cost of attorney review while achieving more consistent results. These results have led many who have seen the realities of large document reviews in complex litigation to conclude that, in such cases, analytics-based technology is the only realistic way for the producing party to fulfill its obligations under the Federal Rules of Civil Procedure.

Dismissing the Cost Concern

Finally, the cost of applying these technologies to a review has kept many from adopting them. While there are certainly cases, typically smaller ones, where analytics may not be helpful, for the vast majority of large document reviews, opting out of analytics on cost grounds is the equivalent of taking a knife to a gun fight. The knife certainly costs less, but does the likely result justify the decision?

Additionally, the cost of adding analytics to your review continues to decline. Some review platforms now include analytics components at little to no additional charge.

Reaping the Rewards

For those willing to adopt these new technologies as part of their review strategy, the benefits are substantial. They include:

  • 40-70% reduction in review cost
  • Better, more consistent review results
  • Transparent, replicable review results
  • Reduced access to confidential documents


The time for change is now. Analytics-aided review of large document populations is quickly becoming the new standard for document review and production in large, complex litigation. Early adopters will reap the rewards. Taking the time to learn about this emerging technology, and how it can save our clients time and money, is no longer optional. It is a necessity.