Drafting Effective PII Prompts

Writing Effective Prompts for PII Detect and Extract

Overview

This guide covers best practices for creating custom prompts to detect and extract personally identifiable information (PII) from documents using eDiscovery AI. Well-crafted prompts improve the accuracy of detection and extraction of sensitive information and reduce false positives.

Creating Custom PII Prompts

Naming

Keep prompt names short and simple. Each name becomes a field in Relativity, and long field names or names that include special characters, Boolean operators, or extra spaces can cause Relativity errors.
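The naming guidance above can be checked before a prompt is saved. The following is a minimal Python sketch, not part of eDiscovery AI or Relativity. The function name `field_name_problems`, the 50-character limit, and the operator list are assumptions chosen for illustration, so adjust them to your own workspace rules.

```python
import re

# Assumed limits for illustration only; these are not documented
# Relativity or eDiscovery AI rules.
MAX_NAME_LENGTH = 50
BOOLEAN_OPERATORS = {"AND", "OR", "NOT"}


def field_name_problems(name: str) -> list:
    """Return the reasons a prompt name may be risky as a field name."""
    problems = []
    if len(name) > MAX_NAME_LENGTH:
        problems.append("too long")
    # Leading, trailing, or doubled spaces count as extra spaces.
    if name != name.strip() or "  " in name:
        problems.append("extra spaces")
    # Anything other than letters, digits, and single spaces is flagged.
    if re.search(r"[^A-Za-z0-9 ]", name):
        problems.append("special characters")
    # Conservative check: flags the operator words in any letter case.
    if any(word.upper() in BOOLEAN_OPERATORS for word in name.split()):
        problems.append("Boolean operator")
    return problems


print(field_name_problems("MN Drivers License"))
print(field_name_problems("SSN AND  DOB!"))
```

An empty list means the name passed every check in this sketch; it does not guarantee that Relativity will accept the name.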

Basic Structure

Your prompt should include three key elements:

  1. A clear definition of the PII type. Use simple, plain language to describe the personal information; additional complexity can be added later if needed. For example, a driver’s license number prompt can be as simple as “A driver’s license number for any individual from any US state.”
  2. Common formats and variations, if not widely known. A Social Security number, for example, has a well-known format that doesn’t need to be specified. A company-specific employee ID number, on the other hand, likely isn’t well known, so adding formats and variations will help.
  3. Negative criteria or examples. If there are specific exclusions or items you don’t want included, spell those out. For example, “A driver’s license number for any individual from the state of Minnesota, do not include driver’s license numbers from any other state or country.”
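To keep prompts consistent across PII types, the three elements above can be assembled from separate pieces. This is a minimal Python sketch; `build_pii_prompt` and its parameters are hypothetical helpers for drafting text, not an eDiscovery AI API. The resulting string would still be pasted into the prompt field by hand.

```python
from typing import Sequence


def build_pii_prompt(definition: str,
                     formats: Sequence[str] = (),
                     exclusions: Sequence[str] = ()) -> str:
    """Join definition, formats, and exclusions into one plain-language prompt."""
    # Element 1: the definition, normalized to end with a single period.
    parts = [definition.strip().rstrip(".") + "."]
    # Element 2: formats and variations, only when they are not widely known.
    if formats:
        parts.append("Common formats include: " + "; ".join(formats) + ".")
    # Element 3: negative criteria, spelled out explicitly.
    if exclusions:
        parts.append("Do not include " + "; ".join(exclusions) + ".")
    return " ".join(parts)


prompt = build_pii_prompt(
    "A driver's license number for any individual from the state of Minnesota",
    exclusions=["driver's license numbers from any other state or country"],
)
print(prompt)
```

Leaving `formats` empty for well-known types, such as Social Security numbers, follows the guidance in item 2.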

Validation and Testing

Initial Testing Process

  1. Select a small sample set of documents (100-200) containing known PII examples
  2. Run PII Detect and/or Extract with your custom prompt
  3. Review results for:
    • Missed PII instances (false negatives)
    • Incorrect identifications (false positives)
    • Partial or incomplete matches
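The review in step 3 can be tallied with a short script once the known PII values for the sample set and the extracted values are both available as lists. This is a minimal Python sketch under that assumption. The function name `compare_results` and the substring rule for partial matches are illustrative choices, and how you export results from your workspace is outside its scope.

```python
def compare_results(expected: set, found: set) -> dict:
    """Sort extracted values into correct, missed, incorrect, and partial."""
    unmatched_expected = expected - found
    unmatched_found = found - expected
    # A partial match is an extracted value that contains, or is contained
    # in, a known value without being identical to it.
    partial = {
        f for f in unmatched_found
        if any(f in e or e in f for e in unmatched_expected)
    }
    partially_found = {
        e for e in unmatched_expected
        if any(f in e or e in f for f in partial)
    }
    return {
        "correct": expected & found,
        "false_negatives": unmatched_expected - partially_found,
        "false_positives": unmatched_found - partial,
        "partial_matches": partial,
    }


# Made-up sample values for illustration.
known = {"A1234567", "B7654321", "C1112223"}
extracted = {"A1234567", "B76543", "Z0000000"}
report = compare_results(known, extracted)
for category, values in report.items():
    print(category, sorted(values))
```

Keeping the values in each category, rather than only counts, supports the later step of documenting specific successes and failures.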

Refining Your Prompts

  • Start simple and adjust based on test results
  • Add missing variations discovered during testing
  • Include additional context if experiencing false positives
  • Document specific examples of successes and failures

Best Practices

Do:

  • Start with simple, broad definitions
  • Review initial results and make any necessary refinements
  • Test across different document types

Avoid:

  • Overly specific patterns that miss variations
  • Overly complex and detailed definitions or limitations
  • Ignoring regional or industry-specific formats

Examples

  • This is the driver’s license number for an individual. We are only interested if the driver’s license is from Minnesota or Wisconsin.

  • This is the ACME Inc. account number associated with an individual. This account number will always begin with one of the following letters: A, B, C, or D. It is followed by a sequence of 6 to 8 digits. Some of the digits may be masked or redacted with X's (e.g., A12XXX78) or asterisks (e.g., B****567). Examples of valid formats include A1234567, B1X34567, and C***4567. Ensure that all variations with the prefixes A, B, C, or D are included.

  • This is information directly associated with the employee’s performance, feedback, or other HR-related details.

If, after review, more detail is needed, add something like “Include any listed job titles. Do not include addresses or phone numbers” to the prompt.
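When a prompt describes a fixed format, as the ACME Inc. account number example does, a regular expression built from the same description can help spot-check extracted values during testing. This is a minimal Python sketch, and the pattern is one reading of that example: a prefix of A, B, C, or D followed by 6 to 8 characters, each a digit, an uppercase X, or an asterisk. It is a review aid only and is not how eDiscovery AI evaluates the prompt.

```python
import re

# Prefix A-D, then 6 to 8 characters that are digits or mask characters.
# This pattern is an assumption derived from the example prompt text.
ACME_ACCOUNT = re.compile(r"[ABCD][0-9X*]{6,8}")


def looks_like_acme_account(value: str) -> bool:
    """Return True if the whole value fits the described account format."""
    return ACME_ACCOUNT.fullmatch(value) is not None


# Sample values taken from the example prompt, plus two that should fail.
for sample in ["A1234567", "B1X34567", "C***4567", "E1234567", "A12345"]:
    print(sample, looks_like_acme_account(sample))
```

Extracted values that fail the check are candidates for false-positive review, and known account numbers that never appear in the output point to variations the prompt may need to mention.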