Batch Analysis (Document Triage)

Batch Analysis projects are useful when a user is not sure what types of documents they have and wants to identify and triage those documents before organizing them and sending them to a Contract Analysis project.


Clusters Tab

Clustering provides a visual aid for users who don’t yet know exactly what data they have in their documents, or even what kinds of documents they have. After uploading documents into a Batch Analysis, a user will be able to view the documents in that project arrayed as a series of clusters.

BatchCluster

Each document is represented by a dot in the cluster array. Dots share a color when their documents are significantly related, based upon the Data Entity chosen in the “Cluster by” drop-down. You can click any dot to open that document in the main viewing pane. The centroid cross within each cluster represents the mathematical mean of the similar documents that have been clustered together. The following Data Entities can be chosen to cluster by:

  • Text

  • Term: Default. Clusters based on standard terms, or any Custom Term Sets used for the project.

  • Currency name (Dollars, Euros, Yen, etc.)

  • Currency value

  • Date

  • Definition: Clusters based on the defined terms found in documents.

  • Duration

  • Party: Clusters based on all named parties/companies/legal entities in documents.

  • Geoentity

You can also choose multiple Data Entities to further cluster your documents. To do this, select additional Data Entities from the “Cluster by” drop-down. To remove a chosen Data Entity, click the small “x” button to its left.
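
To make the idea of a centroid concrete, the sketch below computes the coordinate-wise mean of a few document vectors. It is only an illustration: the two-dimensional vectors and the function name `centroid` are hypothetical, and it does not describe how ContraxSuite represents documents internally.

```python
from statistics import fmean


def centroid(points):
    """Return the coordinate-wise mean of equal-length numeric vectors."""
    if not points:
        raise ValueError("a cluster needs at least one document")
    # zip(*points) groups the first coordinates together, then the second, etc.
    return [fmean(coords) for coords in zip(*points)]


# Hypothetical 2-D feature vectors for three documents in one cluster.
cluster = [[1.0, 2.0], [3.0, 4.0], [5.0, 9.0]]
print(centroid(cluster))  # [3.0, 5.0]
```

The centroid need not coincide with any single document; it marks the middle of the group.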

Controls for clustering:

ClusterOptions

1. Select Clustering Method: There are three different mathematical methods users can implement for clustering. Each method provides a slightly different picture of the data. The outbound links listed here provide more information on these three clustering methods:

2. Cluster by: Choose which Data Entities to use for calculating clusters. You can select more than one Data Entity to produce more specific clusters. See above for the list of “Cluster by” options.

3. Clusters: Choose the number of clusters to be calculated. Each cluster will be assigned a different color. Changing this setting can be especially useful if you know, for example, that you have documents from four separate countries, or that you have three different types of documents, or that you have eight different clients. This is also important when choosing how and where to send clusters to Contract Analysis projects.

4. Cluster/Re-Cluster: Click this button to re-cluster documents after changing your settings. This may take a second, or close to a minute, depending on how many documents are in your Batch project.
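
The effect of the “Clusters” setting can be illustrated with a small k-means sketch. This is an assumption for illustration only: the page does not say which three methods are offered, and the function `kmeans`, its parameters, and the sample points below are hypothetical rather than ContraxSuite code.

```python
import math


def kmeans(points, k, iterations=20):
    """Lloyd's algorithm on a list of equal-length numeric vectors.

    Returns (labels, centroids). Initial centroids are the first k points,
    which keeps this sketch deterministic; real tools usually randomize.
    """
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centroids.append(
                    [sum(col) / len(members) for col in zip(*members)]
                )
            else:
                new_centroids.append(centroids[c])  # leave an empty cluster in place
        if new_centroids == centroids:
            break  # nothing moved, so the clustering has converged
        centroids = new_centroids
    return labels, centroids


# Hypothetical document vectors forming two obvious groups.
docs = [[0.0, 0.0], [10.0, 10.0], [0.0, 1.0],
        [10.0, 11.0], [1.0, 0.0], [11.0, 10.0]]
labels, centroids = kmeans(docs, k=2)
print(labels)  # [0, 1, 0, 1, 0, 1]
```

Asking for more clusters than the data naturally contains splits related documents apart, while asking for fewer merges unrelated ones, which is why knowing the expected number of groups in advance helps.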

You can open any document directly from the Clusters tab by clicking a dot.

BatchClusterHover

Editing Cluster Properties

Once your documents are clustered the way you want, you can rename clusters by clicking the cluster name, or the “edit” icon, in the right pane.

BatchClusterEdit

Enter a new name for the cluster and click the green “checkmark” (or click the red “x” to revert).

BatchClusterRename

Sending Clusters to Contract Analysis Projects

After clustering your documents, you can easily send an entire cluster to an already existing Contract Analysis project. Find the cluster you want to send, and click on it. A blue border will appear around that cluster’s information.

BatchClusterMove

Next, use the drop-down menu at the top right of the screen that says “Click to select” (top center in the image above). All Contract Analysis projects that you have access to will be listed in the drop-down. Select the project you wish to send the cluster to, and click “Send” to send those documents from the Batch Analysis project to the chosen Contract Analysis project for more in-depth review and training.

Once your Contract Analysis project has all the documents you want to send to it from the Batch Analysis, you can start reviewing those documents.


Documents Tab

Clicking the “Documents” tab in the top left of a Batch Analysis Project will take you to the Document Grid View.

BatchDocGrid

Documents are listed in the Grid with different columns representing Data Entities, including a column for Cluster ID with a matching colored circle. In the top right of the Documents Grid, there are several options for customizing the Grid.

BatchGridSettings

  • Filter: This option allows you to filter the Documents Grid using individual searches in any of the visible columns. For example, typing “< 5000000” in a Currency Field will show only those documents that have extracted currency values of under 5,000,000. Use in combination with Column Visibility to conduct targeted searches within an entire project.

  • Export: You can export a spreadsheet of all the data contained in the Documents Grid, based on which Filters and Columns you’ve chosen to display. There are three export options:

    • Microsoft Excel 2007-2013 XML (.xlsx)

    • Text CSV (.csv)

    • Export Selected Cluster(s) (.zip): Export selected documents into a .zip folder.

  • Column Visibility: Change which Fields are visible in the Documents Grid. Use in combination with Filters to conduct targeted searches within an entire project. The following columns are selectable for all Batch Analysis projects, in the “Default Fields” or “Additional Fields” section of the Column Visibility pop-up:

    • Name

    • Cluster ID

    • Title

    • Is Contract

    • Parties

    • Earliest Date

    • Latest Date

    • Largest Currency: Amount

    • Document ID

    • Definitions

    • Currency Types

    • Geographies

    • OCR Rating

    • Folder

    • Language

    • Original Doc Link

    • Contract Type

    • Contract Type Top 3

    • Load Date

    • Loaded By
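
The Filter example above (“< 5000000” in a currency column) can be pictured as a comparison applied to each row of the Grid. The sketch below is a rough model under stated assumptions: only the “<” operator appears in this guide, so the other operators, the function name `parse_filter`, the sample rows, and the choice to hide rows with no extracted value are all hypothetical.

```python
import operator
import re

# Only "<" is documented in this guide; the rest are assumed for illustration.
_OPS = {
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
    "=": operator.eq,
}


def parse_filter(expression):
    """Turn text such as '< 5000000' into a predicate over numbers."""
    match = re.fullmatch(r"\s*(<=|>=|<|>|=)\s*(-?\d+(?:\.\d+)?)\s*", expression)
    if match is None:
        raise ValueError(f"unsupported filter: {expression!r}")
    compare = _OPS[match.group(1)]
    threshold = float(match.group(2))
    # Assumption: rows with no extracted value never match a numeric filter.
    return lambda value: value is not None and compare(value, threshold)


# Hypothetical Grid rows keyed by column name.
rows = [
    {"Name": "lease.pdf", "Largest Currency: Amount": 1200000.0},
    {"Name": "merger.pdf", "Largest Currency: Amount": 75000000.0},
    {"Name": "memo.pdf", "Largest Currency: Amount": None},
]

under_5m = parse_filter("< 5000000")
matches = [r["Name"] for r in rows if under_5m(r["Largest Currency: Amount"])]
print(matches)  # ['lease.pdf']
```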


Annotator

Clicking on a document’s name in the Batch Documents Grid will open that document in the Annotator. The document’s text is displayed, with passages of text highlighted based on standard extracted Data Entities. Many of these Data Entities are also available as clustering options:

  • Citations (under development)

  • Currencies

  • Dates

  • Date durations

  • Definitions

  • Geographies

  • Parties

The Annotator displays the highlighted text from which ContraxSuite has automatically extracted these standard Data Entities. The blue highlights in the main pane show where the data was found in the document, while the index on the right pane shows which Fields extracted the data. The image below shows a document in the Batch Annotator.

BatchNoColors

By default, the right pane of the Batch Annotator screen shows the “Quick Data” tab, selected at the top right. The “Quick Data” pane contains an index of the Data Entities listed above. Clicking on a term with a blue underline will open a pop-up displaying the Definition of that term.

Viewing Tabs

There are four main viewing tabs in the right pane of the Batch Annotator screen:

BatchTabs

1. Quick Data: Default. This is an index of standard Data Entities in the document:

  • Citations (under development)

  • Currencies

  • Dates

  • Definitions

  • Durations

  • Geographies

  • Parties

2. Definitions: This tab displays a list of defined terms within a document, extracted from the structure of the document. This includes any internal lists of definitions contained within the document itself. These definitions can also be viewed within the Main Annotator Pane, by clicking the blue line underneath any instance of that defined term.

3. Section Navigation: This tab displays a document’s internal sections and sub-sections. Users can quickly navigate to different sections in the document by clicking on a section or sub-section in this list.

4. Search: This tab allows users to conduct a simple search within the document for specific words or phrases.

BatchSearchPane

Users can choose to search via regular text, or via regular expressions, by clicking the respective checkboxes (see image above).
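
The difference between the two search modes can be sketched with Python's standard `re` module. This is an illustration, not the Annotator's implementation: the function `find_matches`, its parameters, the case-insensitive default, and the sample sentence are assumptions.

```python
import re


def find_matches(text, query, use_regex=False, ignore_case=True):
    """Return (start, end) offsets of every match of query in text."""
    # In plain-text mode, escape the query so characters like "." are literal.
    pattern = query if use_regex else re.escape(query)
    flags = re.IGNORECASE if ignore_case else 0
    return [m.span() for m in re.finditer(pattern, text, flags)]


sample = "Term: 12 months. Renewal term: 6 months (approx.)."

# Plain text: the trailing "." must be a real period.
plain = find_matches(sample, "approx.")
print([sample[s:e] for s, e in plain])

# Regular expression: one or more digits followed by " months".
durations = find_matches(sample, r"\d+ months", use_regex=True)
print([sample[s:e] for s, e in durations])
```

Plain-text mode suits exact words or phrases, while regular expressions help when the wording varies but follows a pattern, such as any number followed by a unit.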

Display Options Panel

At the bottom right of the Annotator pane, there is a Display Options Panel:

BatchIconsPanel

1. Text Highlights On/Off: By default, users will see highlights in the Annotator for any text unit that has a value extracted from it. If you wish to hide these highlights for all of the text in the document, click this icon to toggle highlights on or off.

2. Color-Coded Entities On/Off: Enable or disable color-coding of Data Entities.

BatchColors

Each color is assigned by default, but you can change the color associated with a Data Entity by hovering over the color circle next to the Data Entity in the Quick Data tab of the right pane.

BatchColorPicker

3. Keyboard Shortcuts On/Off: Click this icon to show a list of keyboard shortcuts available in the Batch Annotator. To hide this panel, click the “X” in the upper right corner.

HotKeyShortcuts

4. Annotator Zoom: Click this icon to display a pop-up list of zoom options:

  • Automatic Zoom: Default.

  • Actual Size

  • Page Fit

  • Page Width

  • Page Height

  • 50%

  • 75%

  • 100%

  • 125%

  • 150%

  • 200%

