## Document Fields ContraxSuite Document Fields can be written and configured to find and extract many different kinds of data from sentences, paragraphs, and whole sections of documents. Through the creation of unique Document Fields, users can see the data in their documents that is most vital to them. This data gathering is what enables Admins, Power Users, Project Managers, and Reviewers to determine whether they have the data they need, and whether it's in the right place. A Document Field can be configured to find any kind of data, from simple calendar dates to complex clauses that require machine learning and model building. Each Document Field has a **Type**, which guides the system when searching for the right value. (Examples of Field Types include: Address, Choice, Company, Date, Duration, Percent, Geography, *etc*. See the "Type" section below for more.) After setting up a Document Field, an Admin or Power User can then write Document Field Detectors for each Field. Field Detectors direct the system toward the sentence, paragraph, or section of the document in which the value being sought is located. Document Field Detectors find the correct values for each Document Field via the following techniques: * Defined words, terms, and phrases that LexNLP - the legal-specific dictionary - can identify based on format and context. Examples of these terms or words include words in quotations, words in parentheticals, and/or words that are near grammatical markers such as "means". * Field Types such as percents, durations, currencies, and geographies all follow recognizable patterns. ContraxSuite uses [**regular expressions**](https://en.wikipedia.org/wiki/Regular_expression) to detect these sequences of symbols and characters. **Note: A Document Type cannot have more than 400 Document Fields** --- #### How to Create a Document Field There are two ways to create a new Document Field in ContraxSuite. 
The first method:

**1.** Go to **Management** in the main menu (left pane) and click on **Document Fields**.

![AddField](../../_static/img/guides/PowerUsers/AddField.png)

**2.** On the Document Fields Configuration grid, click on **Add Document Field** in the upper right corner.

![AddField2](../../_static/img/guides/PowerUsers/AddField2.png)

**3.** Use the guide in ["General"](./create_document_field.html#general) below to complete the new Field based on the specific Field Type that you would like to create.
The second method for creating a new Document Field:

**1.** Go to **Management** in the main menu (left pane) and click on **Document Types**.

![OpenDocTypeGrid](../../_static/img/guides/PowerUsers/AddDocType.png)

**2.** In the Document Types Configuration Grid, click the Document Type you'd like to add a Field to. The Document Type's edit page will open. Scroll to the bottom of this page, and click **Create New Document Field**.

![DocFieldFromDocType](../../_static/img/guides/PowerUsers/AddFieldFromDocTypePage.png)

**3.** A pop-up modal window will appear on your screen. Use the guide in the ["General"](./create_document_field.html#general) section below to complete the new Field based on the specific Field Type that you would like to create.

![AddFieldModal](../../_static/img/guides/PowerUsers/AddFieldModal.png)

---

#### General

The Document Field creation page/pop-up modal will display a number of forms to be filled out. The first four are general information, and are **required** for any new Document Field:

**Title**: This is the Document Field's display name. This Title is what users will see in the UI. For ideal visibility, the shorter the name, the better. Longer names will only be fully visible in grids via mouse-over.

**Code**: Enter a short reference Code for this Document Field. This Field Code will be utilized in the system backend, and in the system Admin Site. Unlike Field Titles, Field Codes must be *unique*. Having unique Field Codes allows Admins and Power Users to create formulas and calculated Fields referencing that specific Code. Field Codes are not only unique, but they also cannot be changed once a Field is saved. This allows users to change a Field's Title later without affecting any formulas that reference the Field's Code. A Document Field Code must meet the following criteria:

* No uppercase letters
* 50 characters or fewer
* Contains only Latin letters, digits, and underscores
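The three Field Code criteria above can be checked before a Code is typed into the form. The following is a minimal sketch, not ContraxSuite's own validation: the function name `is_valid_field_code` is hypothetical, and the sketch assumes that an empty Code is not allowed and that a Code may begin with any permitted character.

```python
import re

# Assumed reading of the criteria: lowercase Latin letters, digits, and
# underscores only, between 1 and 50 characters long.
FIELD_CODE_PATTERN = re.compile(r"[a-z0-9_]{1,50}")


def is_valid_field_code(code: str) -> bool:
    """Return True if `code` satisfies the three listed criteria (hypothetical helper)."""
    return FIELD_CODE_PATTERN.fullmatch(code) is not None


print(is_valid_field_code("document_field_code_one"))  # True
print(is_valid_field_code("Document Field"))           # False: uppercase letters and a space
```

Using `fullmatch` rather than `search` ensures that the whole Code, not just a part of it, meets the criteria.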
**You cannot have spaces in Codes.** Use an underscore "_" instead of spaces to separate words in a Code (*e.g.*, "document_field_code_one") Deleting a Field will delete that Field Code from the system and allow you to re-use it for a new Field. You can also [clone a Field](./create_document_field.html#cloning) in order to make a copy with a different Code. **Type**: Choose the new Field's Type from the drop-down menu. The Field Type specifies the type of data object the system will look to extract for the Field. Different Document Field Types require different settings to function. As with Field Code, Field Type cannot be edited once the Field has been saved. To change a Field Type after creating a Field, you will need to either [clone the Field](./create_document_field.html#cloning), or create a new Field. The following list describes each Field Type: * **Address:** Extracts a street address. * **Choice:** Extracts data and returns one of a set of pre-written value options (*e.g.*, a "color" Field might have the options *red*, *yellow*, *blue*, and *green*). [Field Detectors](./create_field_detectors) should be set up for Choice Fields in order to search for specific attributes of the data and then return the applicable value from the list of choice options (*e.g.*, a Field with a Field Detector that has regular expressions for detecting the words "red," or "yellow," or "blue," or "green"). * **Note:** Entries for Choice options cannot contain commas, or spaces that are not between words. * **Company:** Extracts a company name. Can also be used to extract the parties to an agreement (*e.g.*, a Company Field for landlords, a separate Company Field for tenants). * **Date: Non-Recurring Events:** Extracts a date. This Field Type is best for most date scenarios, such as start date, effective date, *etc*. * **Date: Recurring Events:** Extracts a recurring date. 
For example, the sentence "Rent is due on the fifteenth of every month" would be extracted for a recurring Field, and typed so that the value can be placed in a calendar. Useful for due dates, such as rent payments in leases.
* **Duration:** Extracts a duration, which will be converted and stored as a number of days, for easier data manipulation and proper numeric comparisons (*e.g.*, "3 months" is converted to "90"; "5 years" becomes "1825"). Useful for term lengths of contracts.
* **Floating Point Number:** Extracts any number (*e.g.*, "The loan will be made with an interest of 3.25%" will result in an extracted value of "3.25").
* **Geography:** Extracts a country, state, city, *etc*. Useful for finding, *e.g.*, "governing law" clauses.
* **Integer Number:** Extracts integers (*i.e.*, whole numbers, numbers without decimals). **Floating Point Number** will also extract integers, but it is recommended that users choose the narrowest Field Type possible. If you know the value will always be an integer (*e.g.*, number of children), then you should use Integer Number.
* **Money:** Extracts amounts of money in the following forms of currency: US Dollars (USD), Canadian Dollars (CAD), Australian Dollars (AUD), Euros (EUR), British Pounds (GBP), Japanese Yen (JPY), and Chinese Yuan/Renminbi (CNY).
* **Multi Choice:** Similar to the **Choice** Type, except Multi Choice Fields allow for multiple options to be selected. For example, a lease might have "retail," "farming," "storage," and "residential" as options, and a specific lease document specifies both farming- and storage-related attributes. Specific Field Detectors would then need to be set up to search for particular criteria for extracting data for both "farming" and "storage". *See [the Field Detectors page](./create_field_detectors) for more*.
    * **Note:** Entries for Multi Choice options cannot contain commas, or spaces that are not between words.
* **Percent:** Extracts percent values, including values expressed as basis points (bps). Values are stored up to two decimal places. * **Person:** Extracts a person's name. Often used to extract, *e.g.*, employee names from an employment agreement. * **Ratio:** Extracts ratios, and displays them as "amount 1" / "amount 2". * **Related Info:** This Field Type allows for unlimited selection of multiple clauses and provisions. Related Info Fields are often used in scenarios where users simply want to know whether data is or is not present. This Field Type is also employed for finding large sections of text, particularly multiple separate but related sections within long documents. An example of this would be a Field for "Intellectual Property Rights": IP rights may encompass multiple passages and clauses, from different sections of a document, none of which may be adjacent to one another. [Field Detectors](./create_field_detectors) for a Related Info Field will need to be written to specify the different variations of all passages of text that users want extracted and annotated. * **String (vectorizer uses words as tokens):** An example of when a String Field is useful might be for extracting a land PIN. The PIN will always occur after the exact phrase "HRM Land Registry Pin:", so to capture the PIN itself you would write regular expressions in the "Dependent Regexp" form for the Field based on the phrase "HRM Land Registry Pin:" and extracting the next X characters after that phrase without extracting the text "HRM Land Registry Pin:" itself. This Field Type can also be used when users don't necessarily want to extract any data, and just want to type in a text value manually after annotation. For more on this, see ["Field Detection Rules"](./create_document_field.html#field-detection-rules) below. 
* **Long Text:** This Field Type should only be used if **nothing** is being extracted, but users want to be able to type 3 to 5 lines of additional explanatory text on an annotation. This Field Type is often used for situations when users want to write brief summaries of contracts. After you have selected a **Type**, you will have to select a "Document Type" for this Field to be associated with. **Document Type**: The last form that requires a value before the user can save the Document Field is the Document Type. Select from the drop-down which [Document Type](./create_document_type) this Field will be included in. If you need a Document Field to be assigned to a different Document Type after it's already been created, you will need to create a new Document Field, or [clone an existing Field](./create_document_field.html#cloning). --- #### Additional Forms Descriptions of the remaining Document Field forms are below. Depending on what type of Field you are creating, you may not need to fill out every form described in this section and sub-sections. **Category**: A Field's Category allows a user to organize Fields into separate pages in the Field Values tab of the Annotator screen. Choose a category for a Field from the drop-down menu. This Field will then appear on that Category's page in the Field Values tab. **Note**: Categories exist at the Document Type level. To add Categories to a Document Type, [click here for instructions](./create_document_type.html#managing-document-types). ![DocFieldCategory](../../_static/img/guides/PowerUsers/DocFieldCategory.png) **Description**: Include a description of the nature of the Field, and how users should expect to interact with the Field. Mousing over the Field in the Field Values tab of the Annotator will display this description as a tooltip. 
**Requires Text Annotations**: Check this box if the Field should always have an annotation highlight for the text unit (sentence or paragraph) in which the Field's value was found. *Checking this box is recommended*.

**Display Yes/No**: *For Related Info Fields only*. If this box is checked, the word "No" will be displayed for the Field if the Field extracts nothing. If phrases are extracted by this Field, the word "Yes" will appear, along with a numbered indicator that can be used to navigate between all the extracted clauses.
###### "Choice" and "Multi-Choice" Fields Only

The following options will only appear for Choice and Multi Choice Field Types.

**Choices**: Type the value of each option you want available for this Choice/Multi Choice Field. Each option should be on a separate line inside the form. For example:

    Blue
    Red
    Green
    Yellow

These values will always appear in alphabetical order in the Field Values tab of the Annotator.

**Allow Values Not Specified in Choices**: Check this box if you would like users to be able to type in custom values for this Field. Taking the color example from above, checking this box would allow a reviewer who finds "purple" or "orange" in a document to manually type that value into the Choice/Multi Choice Field.
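The rules for Choice options described in this guide (one option per line, no commas, no spaces other than those between words, alphabetical display) can be pre-checked on a draft list before pasting it into the **Choices** form. This is only an illustrative sketch: `parse_choices` is a hypothetical helper, and skipping blank lines and sorting case-insensitively are assumptions, not documented ContraxSuite behavior.

```python
def parse_choices(raw: str) -> list:
    """Validate a block of Choice options and return them in alphabetical order."""
    choices = []
    for line in raw.splitlines():
        if not line:
            continue  # assumption: empty lines are simply ignored
        if "," in line:
            raise ValueError(f"Choice option contains a comma: {line!r}")
        if line != line.strip():
            # Leading or trailing whitespace is a space that is not between words.
            raise ValueError(f"Choice option has leading or trailing spaces: {line!r}")
        choices.append(line)
    # Assumption: alphabetical order is case-insensitive.
    return sorted(choices, key=str.casefold)


print(parse_choices("Blue\nRed\nGreen\nYellow"))  # ['Blue', 'Green', 'Red', 'Yellow']
```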
###### Field Detection Rules **Value Detection Strategy**: From the drop-down menu, choose a method for detecting values and extracting data objects. The following is a list of all available value detection strategies. The first three are the most commonly used: * **Field Detection Disabled:** This option disables all automated data extraction related to this Field. This is useful for testing, or for projects that are mainly centered around manual review. This strategy should not be used for Fields marked ["Read Only"](./create_document_field.html#advanced), as users will be unable to populate the Field with data. * **No ML. Use Regexp Field Detectors:** *Default selection.* When this choice is selected, you will need to write separate instructions called [Field Detectors](./create_field_detectors), using either definition words or regular expressions to identify and extract the desired data into this Field. If there is no regexp written for this Field, then the extraction function will find values for this Field based on the Field Type (*e.g.*, Company, Date, Person) using LexNLP extractors. *Note: Field Detectors can be used to specify multiple desired values for extraction. As the name implies, there is no machine learning for Fields that use this detection strategy.* * **No ML. Use Formula Only:** The system executes a user-generated formula written in `python` syntax, rather than using regexp, to calculate the value(s) for this Field. As the name implies, there is no machine learning for Fields that use this detection strategy. More information on calculated fields can be found below in the [Field Detection: Calculated Fields](./create_document_field.html#field-detection-calculated-fields) section, or on the [Writing Formulas](./writing_formulas.md) page.
* **Use Multi-Line Field Detectors:** *Formerly "Use regexp pattern: value collection"*. Choose this option if you are creating a Choice or Multi Choice Field with a large list of potential options that are stored in a separate file. See the relevant [section of the Field Detectors page](./create_field_detectors.html#field-detection-for-use-multi-line-field-detectors-strategy) for a detailed breakdown of how to set up this Value Detection Strategy.
* **Use pre-trained text-based ML only:** Detection works similarly to "No ML. Use Regexp Field Detectors", except that machine learning is used to find the text units related to the Field. The machine learning model needs to be pre-trained manually via [Admin Tasks](../doc_exp/admin). The model is trained on the Field values entered by users, and on the data detected by machine learning and then confirmed by users.
* **Use pre-trained MLflow model to find matching text units:** Currently, the only Field Type that supports MLflow integration is Related Info. ContraxSuite has the ability to use [MLflow models](https://mlflow.org/) in the field detection system. This allows users to develop and train machine learning models for field detection in a lightweight Conda development environment separate from their ContraxSuite instance, and then deploy those ML models into ContraxSuite when ready. MLflow models are developed in an IDE, in separate projects. Each training run is recorded in the local MLflow database or in a remote tracking server. MLflow models can have dependencies fully separate from ContraxSuite.
* ***Apply regexp field detectors to depends-on field values:*** *Works similarly to "No ML. Use Regexp Field Detectors", except that Field Detectors are applied to the values of the other "depends-on" Fields, converted into strings. Field Detectors are run on values of the specified "depends-on" Fields, rather than on a document's text units (sentences, paragraphs, and sections).
When a Field value matches the regexp, the extraction function is used on the matching Field value to find the value of the resulting Field.* **Note: This option is no longer available as of major release 2.0** **Text Unit Type**: Select the size of text unit the Field will extract and annotate when its Field Detector finds the target value in a document. Users can select "Sentence" or "Paragraph". Select the Text Unit Type that best fits the needs of this particular Field. The default Text Unit Type is "Sentence". **Depends On Fields**: *Only for formula-based Value Detection Strategies*. Use this two-column form to select Fields that this Field requires in order for formula calculations to work. The list of Available Fields will align with the chosen Document Type for this Field. Move Fields from the left side of the "Available" box into the "Chosen" box on the right side. **Dependent Regexp**: *Only appears for String or Long Text Fields*. The regular expression entered in this form will be run on the sentence found by the Field Detector(s), in order to extract a specific string value from the Text Unit. If the regular expression returns multiple matching strings, then the *first* matching string will populate the Field. 
The following example demonstrates this functionality:

* **1.** You know that all PINs in your contracts use the same format: 9 digits followed by 3 letters, with no spaces.
* **2.** You know that all PINs are preceded by the phrase "Location PIN:", such that they appear like the following: "Location PIN: 123456789PLX"
* **3.** Your Field Detector should have the following in its ["Include regexps" section](./create_field_detectors) (*Note: The code "\s{1,5}" accounts for variable spacing between words*):

      Location\s{1,5}PIN

* **4.** The Dependent Regexp you need in order to extract the 9-digit, 3-letter PIN while leaving out the phrase "Location PIN:" would be the following:

      (?:Location\s{1,5}PIN:)\s{1,5}(\d{9}\w{3})

For more on regular expressions, including how to write basic patterns, visit [RegExr](https://regexr.com/).
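The two regular expressions from the PIN example can be tried outside ContraxSuite with Python's standard `re` module. The sketch below only approximates the described flow (the Field Detector locates the sentence, then the Dependent Regexp pulls out the value). The function `extract_pin` is hypothetical, and it assumes that the value placed in the Field is the first capture group of the first match.

```python
import re

# Patterns copied from the example: the Field Detector's "Include regexps"
# entry and the Field's Dependent Regexp.
INCLUDE_REGEXP = re.compile(r"Location\s{1,5}PIN")
DEPENDENT_REGEXP = re.compile(r"(?:Location\s{1,5}PIN:)\s{1,5}(\d{9}\w{3})")


def extract_pin(sentence):
    """Return the PIN found in `sentence`, or None (hypothetical helper)."""
    if not INCLUDE_REGEXP.search(sentence):
        return None  # the Field Detector would not select this sentence
    match = DEPENDENT_REGEXP.search(sentence)  # search() returns the first match
    # Assumption: the capture group, not the whole match, becomes the value.
    return match.group(1) if match else None


print(extract_pin("The parcel is registered under Location PIN: 123456789PLX."))
# 123456789PLX
```

Note that `\w` also matches digits and underscores. If the last three characters must be letters, `[A-Za-z]{3}` is a stricter alternative.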
###### Field Detection: Calculated Fields

This section of configuration options will only appear if you have selected a formula-based Value Detection Strategy.

**Formula**: Enter the formula for this Field. All Fields used as part of a formula must be in the same Document Type as this Field. Here is an example scenario:

* You have a Duration Field for the length of a contract: `contract_length`
* You have another Duration Field for the length of a contract extension: `extension_length`
* You create a third Field, an Integer Field, to calculate the full length of the contract after extensions: `total_length`

You will want to write a formula like the following:

    total_length if total_length \
    else (contract_length + extension_length) if contract_length and extension_length \
    else None

The first line checks whether there is already a value in `total_length`. If so, the calculation resolves, and that value displays. The second line performs a calculation, adding the values of `contract_length` and `extension_length` if both of those Fields contain values. The third line resolves the formula and populates no value if `contract_length` or `extension_length` doesn't have a value (failure on Line 2), and `total_length` doesn't already have a value (failure on Line 1).

For more on writing formulas, go to the [Writing Formulas](./writing_formulas.md) page.

**Convert Decimal arguments to float in formulas**: Check this box to use extra decimal places for formula calculations, for more accurate results. Floating Point Field values are represented as Python `Decimal` objects to avoid rounding problems.
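To see how the three branches of the example formula resolve, the same conditional expression can be wrapped in an ordinary Python function and run with sample values. This is a sketch for experimentation only: the wrapper `total_length_formula` is hypothetical, and passing Field values as function arguments stands in for the way ContraxSuite makes Field Codes available to a formula.

```python
def total_length_formula(total_length, contract_length, extension_length):
    """Evaluate the example formula with Field values passed in as arguments."""
    return (
        total_length if total_length
        else (contract_length + extension_length) if contract_length and extension_length
        else None
    )


# Durations are stored as numbers of days, so the sample values are day counts.
print(total_length_formula(None, 365, 90))    # 455
print(total_length_formula(500, 365, 90))     # 500
print(total_length_formula(None, 365, None))  # None
```

Because the formula tests values for truthiness, a stored value of `0` is treated the same as a missing value. Compare against `None` explicitly (`is not None`) if zero is a meaningful value for the Field.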
###### Advanced The forms and options that appear in the collapsible **Advanced** panel on the Document Field page are optional, and can be used to additionally configure Fields in various ways. **Confidence**: Indicates the user's level of confidence in the success rate of the Field Detectors (or machine learning, if it is an ML-based Field) when they identify values for the Field. This attribute is somewhat subjective, but can be useful for when Admins and Power Users want to communicate a certain confidence level to reviewers. * **High**: Reviewers can be confident that the correct value will be identified, extracted, and populated into the Field. * **Medium**: Moderate confidence. The correct value is expected to be identified, extracted, and populated into the Field about half of the time. * **Low**: Minimal confidence. Reviewers should not rely on the system to identify, extract, and populate the correct data into this Field. **Read Only**: Check this box to make it so this Field's value cannot be modified in the Annotator. **Default Value**: If you would like a Choice, Multi-Choice, or String Field to be populated with a specific value by default, enter it here. This value will appear if the Field's Detectors extracted nothing from a document after it was uploaded and parsed. **Hidden Always**: Check this box if you do **not** want this Field to be displayed in the Annotator. Typically, Admins and Power Users will check this box when creating a Field for use in a formula. In such a case, the data in this specific Field does not necessarily need to be displayed, only included in a calculation and kept hidden until the formula from another Field is calculated and displayed. This is also useful when an admin or power user wishes to archive a Field that contains useful data for the project, but no longer needs to be displayed. 
**Hide Until Python**: Enter a `python`-based expression that indicates the conditions/dependencies under which this Field should be displayed. For example:

    field_a is not None and field_a == "5" and field_b is True

In this example, if the Field that has the code `field_a` contains a value of 5, *and* the Field with the code `field_b` is "True", then this Field and any values it contains are displayed. For more on writing Python code and creating Field formulas, go to the [Writing Formulas](./writing_formulas.md) page.

The way to write these "Hide Until Python" conditionals depends heavily on what type of Field is used. In the example above, `field_b` is a Related Info Field. For Related Info Fields, values should be expressed as "is True" or "is False". If `field_a` were a Choice or Multi-Choice Field with "Yes" or "No" values, the code would need to be written like this: `field_a is not None and field_a=='Yes'`

Writing **Hide Until Python** Formulas:

* All Field Codes used in the "Hide Until Python" form must be part of the same Document Type.
* The output of the "Hide Until Python" must be a True or False value.
* You must type the exact Document Field code.
* "Hide Until Python" can interfere with page loading if it does not account for null/None/empty values properly. You must always check that a value "is not None" before you can perform other operations on that Field. It is recommended you test the document page regularly while writing "Hide Until Python" code.
* Use `is` to check whether a Field has a value: `is None`, `is not None`, `is True`, `is False`
* Use `==` to check a Field for a specific value: *e.g.,* `duration_field == '45'`

---

#### Cloning

Cloning is useful for creating a Field that is identical or similar to a Field that already exists, or for including a Field from one Document Type in another Document Type.
To clone a Field, follow these steps:

**1.** Go to **Management** > **Document Fields** to open the Document Fields Configuration Grid.

![AddField](../../_static/img/guides/PowerUsers/AddField.png)

**2.** Find the Field you want to clone in the Grid, and click it to open its edit page.

![FieldGrid](../../_static/img/guides/PowerUsers/FieldGrid.png)

**3.** At the top of the Field's edit page, click the "Clone" button.

![CloneButton](../../_static/img/guides/PowerUsers/CloneButton.png)

**4.** A pop-up window will appear, with a drop-down menu for you to select which Document Type this Field clone will belong to, as well as a form to enter a unique Field Code for this new Field clone.

![CloneModal](../../_static/img/guides/PowerUsers/CloneModal.png)

**5.** Click "Save" once you've filled out these forms. You will be returned to the Document Fields Configuration Grid. From there, you can use the Grid's filters to find your new Field, and edit it as usual.

---

After creating Document Fields, visit the [Field Detectors](./create_field_detectors) page to learn about writing Field Detectors and utilizing regular expressions (regexp).