August 05, 2019

Intelligent Data Extraction Enables Agencies to Shift from Analog to Digital

Intelligent Data Extraction, or IDE, represents the next generation of optical character recognition (OCR), leveraging the power of the cloud and recent advances in artificial intelligence (AI) and machine learning (ML). Organizations not only have access to a wealth of data, they also create it on a daily basis. A widely cited estimate is that only 20% of the data organizations hold is in a structured format. For federal agencies, this means that up to 80% of their data is either analog or unstructured. Agencies cannot use software to analyze this data, or easily use it in any way to improve their operations or service delivery. OCR can take this unstructured data, whether handwritten or printed on paper, and convert it into useful information to improve services.

OCR on its own is a stepping stone, but not a perfect solution.

Federal agencies may not be able to completely do away with handwritten data. As much as digital is pushed and implemented, paper and handwritten documents still account for a share of all data. Rather than maintaining two separate incoming streams of digital and paper data, with the latter taking up space, incurring large costs, and collecting dust in a warehouse, agencies can use IDE to extract analog data and integrate it into structured repositories, all while keeping the original document images and archiving them in the cloud.

Intelligent Data Extraction goes deeper than OCR. It extracts data that is analog, cursive, poorly scanned, poorly handwritten, crossed out, or otherwise imperfect, and converts it into usable data, combining the best of both human and machine. Like many AI tools, IDE is an enabler of the human workforce; therefore, it is critical that employees carefully define and configure what data they want the system to provide. Then, with the power of the cloud, machine learning algorithms learn to work smarter and faster. The end result is less manual, repetitive, and low-value work for employees, which in turn fosters an environment focused on higher-value tasks and opens up opportunities for innovation.

Structured, semi-structured, and unstructured data

Broadly speaking, an agency may have documents that fit into three format categories: structured, semi-structured, and unstructured. Current cloud services can adequately handle structured and semi-structured documents only. A good example of a structured document is any document that follows a defined template, such as a form. Semi-structured documents, such as invoices, don’t follow a fixed pattern or template but contain common data elements. Though no current solution can accurately and fully handle unstructured documents, the ability to capture key information such as in-form fields supports the viability of IDE.
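To make the difference between the first two categories concrete, the following is a minimal sketch in Python, not a description of any particular IDE product. It assumes the text has already been recognized by OCR. The field names, the template layout, the label patterns, and both function names are hypothetical and chosen only for illustration.

```python
import re

# Hypothetical structured form: the template fixes which line holds each field.
FORM_TEMPLATE = {"name": 0, "claim_number": 1, "date": 2}


def extract_structured(lines, template=FORM_TEMPLATE):
    """Read each field from the line position the template assigns to it."""
    return {
        field: lines[position].split(":", 1)[1].strip()
        for field, position in template.items()
    }


# Hypothetical semi-structured invoice: no fixed layout, but common labels
# such as "Invoice #" or "Total" appear somewhere in the text.
INVOICE_PATTERNS = {
    "invoice_number": re.compile(
        r"invoice\s*(?:no\.?|number|#)\s*:?\s*(\S+)", re.IGNORECASE
    ),
    "total": re.compile(
        r"total\s*(?:due)?\s*:?\s*\$?([\d,]+\.\d{2})", re.IGNORECASE
    ),
}


def extract_semi_structured(text, patterns=INVOICE_PATTERNS):
    """Search the whole text for each labeled data element; None if absent."""
    found = {}
    for field, pattern in patterns.items():
        match = pattern.search(text)
        found[field] = match.group(1) if match else None
    return found
```

A structured form can be read by position because the template guarantees where each field sits, while a semi-structured invoice has to be searched for its common data elements wherever they appear. Unstructured documents offer neither a fixed position nor a reliable label, which is why they remain the hard case.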

Many agencies are considering RPA and AI initiatives; however, these tools require access to high-quality structured data. To use a football analogy, a data team without structured data is like a team taking the field with its star athlete performing at only 20% of their ability. Data extraction allows an organization to perform at its full potential, leading to data-driven decision making that benefits the customer. As agencies undergo digital transformation from analog to digital, IDE can serve as a key enabler and a bridge between the two worlds. The beauty of this approach is that it accepts the current realities of paper-based submissions and leverages the latest advances in artificial intelligence, cognitive sciences, and cloud computing to deliver operational savings and better decision making.

The need for agencies to leverage IDE is even more pressing because several key mandates are forthcoming. First, the National Archives and Records Administration’s (NARA) electronic records management mandate requires agencies to shift to digital-only records submissions by 2022. In addition, while the 21st Century Integrated Digital Experience Act requires digitization of non-digital and paper-based government services, it does not do away with paper forms and documents entirely. Agencies will have to adapt to operate effectively in a mixed analog-digital world, at least on the front end. The Cloud Adoption Center of Excellence (CoE) is assisting agencies in implementing IDE solutions to reduce manual workloads, digitize paper records, and enable automated processing of paper-based forms such as claims and applications.

To learn more about our Cloud Adoption offerings, contact us at connectcoe@gsa.gov.