Validating Review with eDiscovery AI
Validation of a review process, whether conducted manually or with technology, is a valuable part of the eDiscovery process to ensure a complete and accurate review. There are multiple options for validating a review, and most rely on statistical sampling as the foundation of the process. Sampling allows users to generate reliable metrics on the performance of the review while managing the amount of manual effort required in the validation process.
1) Important Terminology
a) Random Sample – the use of a randomly selected subset of the population to represent the whole population. Based on the results of the subset, the result for the overall population can then be estimated. The size of the sample required to provide that estimate is largely based on three factors[1]: 1) the volume of documents in the overall population, 2) the margin of error, and 3) the confidence level (a sample-size calculation is sketched after these definitions).
b) Margin of error – a percentage measurement of how much the results from the sample may differ from the results across the entire data set. For example, assume a random sample was created with a 2% margin of error and review of that sample found 25% of the documents were relevant. The actual rate of relevance across the entire data set could be 23%-27%.
c) Confidence level – the probability that the sample results will fall within the specified margin of error. For purposes of eDiscovery, this is almost always 95%, meaning that 95% of the time the results of the sample will match the results across the entire population, within the margin of error.
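For illustration, the sketch below shows one common way those three factors combine into a required sample size: the standard proportion formula with a finite population correction. Relativity's sampling application performs this calculation for you; the function name, z-score table, and example figures here are assumptions made only to show the relationship.

```python
import math

# z-scores for common confidence levels
Z_SCORES = {0.90: 1.645, 0.95: 1.960, 0.99: 2.576}

def sample_size(population: int, margin_of_error: float, confidence: float = 0.95,
                expected_proportion: float = 0.5) -> int:
    """Estimate the random sample size needed for a given margin of error and
    confidence level, applying a finite population correction.
    expected_proportion=0.5 is the conservative (largest-sample) assumption."""
    z = Z_SCORES[confidence]
    # Sample size for an effectively infinite population
    n0 = (z ** 2) * expected_proportion * (1 - expected_proportion) / (margin_of_error ** 2)
    # Finite population correction shrinks the sample for smaller populations
    n = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n)

# Example: 100,000-document population, 95% confidence, 2% margin of error
print(sample_size(100_000, 0.02))   # roughly 2,345 documents
```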
2) Analyzing Samples and Statistics of Note
a) When assessing the results of a document review, regardless of whether it was conducted with search terms, TAR, or AI, there is an emphasis on a handful of statistics that help analyze the results. These are based on comparing the results of the review process to the coding of a subject matter expert across a document sample. The assumption in this process is that the subject matter expert's coding is always the correct coding.
b) These can be summarized as:
i) True Positives (TP) - eDiscovery AI accurately identified the document as relevant.
ii) True Negatives (TN) - eDiscovery AI accurately identified the document as not relevant.
iii) False Positives (FP) - eDiscovery AI inaccurately identified a not relevant document as relevant.
iv) False Negatives (FN) - eDiscovery AI inaccurately identified a relevant document as not relevant.
c) In analyzing the sample, the volume of documents in each of the above categories can be compared to the volume in the other categories to help ascertain how well the method captured what it was intended to capture. For this analysis, there are a few statistics that are generally used to assess quality.
d) Richness ((TP+FN)/(TP+TN+FP+FN)). The percentage of documents in the sample that are relevant. It can be used to estimate the overall percentage of relevant documents in the set from which the sample was drawn, plus or minus the margin of error of the sample.
e) Recall (TP/(TP+FN)). Recall is the percentage of relevant documents captured by the method you used and is often described as a measure of completeness. It is typically the most important statistic in ascertaining whether your review obligations have been met.
f) Precision (TP/(TP+FP)). Precision is the percentage of documents that the AI said were relevant that were truly relevant. It is a measure of the efficiency of the AI review.
g) F-Measure/F1 ((2*Recall*Precision)/(Recall+Precision)). F-Measure is the harmonic mean of recall and precision and provides insight into the overall effectiveness of the review. Recall and precision have an inverse relationship to one another: if you cast a wider net, your recall improves at the expense of capturing more false positives, reducing precision, and vice versa.
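As a minimal illustration, the sketch below turns the four confusion-matrix counts into the statistics defined above; the function name and the example counts are hypothetical.

```python
def review_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Compute the validation statistics described above from the four
    confusion-matrix counts observed in a coded sample."""
    total = tp + tn + fp + fn
    richness = (tp + fn) / total                         # share of truly relevant documents
    recall = tp / (tp + fn)                              # completeness: relevant documents captured
    precision = tp / (tp + fp)                           # efficiency: captured documents that are relevant
    f1 = 2 * recall * precision / (recall + precision)   # harmonic mean of recall and precision
    return {"richness": richness, "recall": recall, "precision": precision, "f1": f1}

# Example: a hypothetical 1,000-document sample
print(review_metrics(tp=230, tn=700, fp=20, fn=50))
# richness 0.28, recall ~0.82, precision 0.92, f1 ~0.87
```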
3) Validation Options
a) Control Set Validations using eDiscovery AI
i) In the context of validating an eDiscovery review, a control set is a random sample that is representative of the entire review population which is reviewed by a subject matter expert as well as eDiscovery AI so that the results can be compared.
ii) Creating a Control Set for an eDiscovery AI project is relatively straightforward.
(1) Create a search of the document universe you wish eDiscovery AI to run against. Once that is created, generate a sample of it using Relativity’s sampling application with the parameters you wish to use.
(2) Have a subject matter expert review the sample and code each document, since this manual coding is the benchmark against which the eDiscovery AI results will be compared.
(3) Once complete, you are ready to apply the eDiscovery AI coding to the sample set. Click on “Send to eDiscovery AI” in the Mass Action menu. Enter the issue prompts using the exact same language that was, or will be, used in the full eDiscovery AI review.
iii) Once everything is entered, select “OK” to submit to eDiscovery AI and have the AI application run against the set.
iv) For every document and every prompt, eDiscovery AI will provide 1) a reiteration of the prompt that was used, 2) a decision as to whether the document fits that prompt (relevant, not relevant, technical issue, or needs further review), and 3) an explanation as to why it categorized the document the way it did.
v) With both the manual coding and the eDiscovery AI categorizations completed, you can compare and analyze the results. Using the “Analyzing Samples and Statistics of Note” section above, separate the documents into True Positives, True Negatives, False Positives, and False Negatives and calculate recall, precision, etc. accordingly, as illustrated in the sketch below.
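If the manual coding and the eDiscovery AI decisions can be exported side by side (for example, to a CSV), the tallying can be scripted as in the sketch below. The file name and column names are hypothetical placeholders; adjust them to match your export, and feed the resulting counts into the metrics calculation sketched earlier.

```python
import csv
from collections import Counter

def confusion_counts(csv_path: str) -> Counter:
    """Tally TP/TN/FP/FN by comparing the subject matter expert coding
    (treated as the correct coding) against the eDiscovery AI decision.
    The column names below are hypothetical placeholders."""
    counts = Counter()
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            expert_relevant = row["SME_Designation"].strip().lower() == "relevant"
            ai_relevant = row["AI_Decision"].strip().lower() == "relevant"
            if expert_relevant and ai_relevant:
                counts["TP"] += 1
            elif not expert_relevant and not ai_relevant:
                counts["TN"] += 1
            elif ai_relevant:          # AI said relevant, expert said not relevant
                counts["FP"] += 1
            else:                      # expert said relevant, AI said not relevant
                counts["FN"] += 1
    return counts

counts = confusion_counts("control_set_export.csv")
print(counts)   # e.g. Counter({'TN': 700, 'TP': 230, 'FN': 50, 'FP': 20})
```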
vi) Control Set Sampling Best Practices
(1) Control sets are only representative of the set from which they were drawn. If the underlying set changes (documents are added or removed), then the control set is no longer valid and a new sample needs to be completed.
(2) Keep richness in mind when working with control sets. A sample created with a large margin of error that also has low richness will contain very few relevant documents with which to analyze and assess results. You may want to consider increasing the size of the sample to ensure more relevant material is considered, which will provide a more comprehensive and thorough analysis (see the sketch after this list).
(3) It is not uncommon for the manual coding to be incorrect and for the AI to have categorized the document correctly. For that reason, it is a good idea to revisit and QC manual coding decisions.
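As a rough illustration of the richness point above, the sketch below estimates how many relevant documents a sample is likely to contain and how large a sample would need to be to surface a target number of relevant documents. The function names and figures are hypothetical.

```python
import math

def expected_relevant(sample_size: int, estimated_richness: float) -> float:
    """Roughly how many relevant documents a random sample will contain."""
    return sample_size * estimated_richness

def sample_size_for_relevant(target_relevant: int, estimated_richness: float) -> int:
    """Sample size needed to expect at least target_relevant relevant documents."""
    return math.ceil(target_relevant / estimated_richness)

# A 385-document sample (95% confidence, 5% margin of error) at 2% estimated richness
print(expected_relevant(385, 0.02))         # ~8 relevant documents - likely too few to assess
print(sample_size_for_relevant(50, 0.02))   # 2,500 documents to expect ~50 relevant
```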
b) Elusion Sampling with eDiscovery AI
i) An “elusion sample” is a sample taken at the end of the project intended to provide final recall statistics and confirm that review obligations have been met. In the case of an elusion sample, you are not considering what has already been determined to be relevant or not relevant; instead, you are assessing what has not been reviewed and is assumed to be not relevant.
(1) In this scenario, every document in the sample is either a False Negative (if found to be relevant) or a True Negative (if found to be not relevant). Please consult the “Analyzing Samples and Statistics of Note” section if you have questions on these categorizations or the statistics derived from them.
ii) Create a search of all documents found by eDiscovery AI to be Not Relevant or that were not categorized or reviewed because they were presumed Not Relevant. Once that search is created, generate a random sample from that set using Relativity’s sampling tool with the parameters you wish to use.
iii) Manually review and code the documents within the sample for the same categories that eDiscovery AI reviewed. The main point to remember is that the coding must be accurate, consistent, and comprehensive. It is recommended that as few people as possible (preferably only one) code the documents to ensure consistency.
iv) Analyze the recall within an elusion set
(1) If a Control Set had previously been completed, recall can be determined by calculating the decrease in richness between the Control Set and the Elusion Sample. In that case, recall can be estimated as (1 - (elusion richness/control set richness)).
(2) Recall using only an Elusion Sample can be estimated by taking the richness of the Elusion Sample and multiplying it by the number of documents in the population from which it was drawn to produce an “Estimated Elusion Relevant” count. At that point, recall can be estimated as ((Total Relevant Found)/(Total Relevant Found + Estimated Elusion Relevant)). Both calculations are sketched after this list.
(3) In most situations, there will be close agreement between the recall estimated from the Control Set/Elusion Sample richness comparison and the recall estimated from the Elusion Sample alone.
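As a sketch, the two estimation approaches above can be expressed as the short functions below; the function names and example figures are hypothetical and are included only to make the arithmetic concrete.

```python
def recall_from_richness_drop(control_richness: float, elusion_richness: float) -> float:
    """Method (1): recall estimated from the drop in richness between the
    Control Set and the Elusion Sample."""
    return 1 - (elusion_richness / control_richness)

def recall_from_elusion_only(total_relevant_found: int,
                             elusion_richness: float,
                             elusion_population: int) -> float:
    """Method (2): recall estimated from the Elusion Sample alone."""
    estimated_elusion_relevant = elusion_richness * elusion_population
    return total_relevant_found / (total_relevant_found + estimated_elusion_relevant)

# Example: control set richness 20%, elusion sample richness 2%
print(recall_from_richness_drop(0.20, 0.02))            # 0.90
# Example: 18,000 relevant documents found; 50,000 documents in the presumed-not-relevant set
print(recall_from_elusion_only(18_000, 0.02, 50_000))   # ~0.95
```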