Automatically extracting processable data from business documents, whether they have the structure of a form or not, is key to business efficiency. Costs associated with manually converting documents to data are significant, and human error rates guarantee a level of expensive rework. Industries have standards for electronically exchanging data, but the market, for reasons known and unknown, still relies on exchanging documents, forcing every business to bear the cost of data entry or implement document automation to control those costs. That’s where machine learning comes in.
Kofax Transformation is the market-leading document automation software solution, available by itself or as part of the Kofax Capture and Kofax TotalAgility™ platforms. It applies machine learning to build continuously improving extraction models that deliver genuine business cost savings.
Automated learning is a developing technology, so in all but a few cases, machine learning requires a human teacher. Teaching in Kofax Transformation works in two ways: administrators train the system initially, and often at regular intervals, using Transformation’s administrative toolkit, and end users extend that training through Transformation’s Online Learning facility.
The familiar 80-20 rule applies to every document automation application: roughly 80 percent of processed documents come from 20 percent of the sources. New applications can reach a high level of automation performance by building a training set of documents from the most active sources and administratively training the system with those examples.
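As a rough illustration of picking the most active sources for an initial training set, the sketch below counts documents per source and keeps the top sources until a target share of the volume is covered. It is plain Python, not part of Kofax Transformation; the function name `most_active_sources`, the `coverage` parameter, and the vendor names are all hypothetical.

```python
from collections import Counter

def most_active_sources(doc_sources, coverage=0.80):
    """Return the top sources that together cover `coverage` of all documents.

    `doc_sources` holds one source label per processed document.
    """
    counts = Counter(doc_sources)
    total = sum(counts.values())
    selected, covered = [], 0
    # Walk sources from most to least active until the target share is reached.
    for source, n in counts.most_common():
        if covered >= coverage * total:
            break
        selected.append(source)
        covered += n
    return selected

# Hypothetical volume: 100 documents from five sources.
sample = (["acme"] * 50 + ["globex"] * 35 + ["initech"] * 8
          + ["umbrella"] * 4 + ["hooli"] * 3)
print(most_active_sources(sample))  # ['acme', 'globex']
```

In this made-up sample, two of the five sources account for 85 percent of the volume, so training examples would be gathered from those two first.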
But what about the other 20 percent? Transformation detects unrecognized document formats and queues them for user review. Using Online Learning, users then select the document type, find the beginning and end of the document for separation, and mark where each data element should be extracted from. Once the unknown document type goes through the Online Learning process twice, Transformation begins recognizing it. Future documents of that type then process automatically and queue for review only when automation confidence falls below acceptable thresholds.
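The review-then-recognize loop described above can be sketched as a small routing model. This is a toy illustration under stated assumptions, not Transformation's actual logic or API: the class `OnlineLearner`, the `layout_id` key, and the 0.85 threshold are hypothetical, and only the "two reviewed examples" figure comes from the text.

```python
from dataclasses import dataclass, field

@dataclass
class OnlineLearner:
    """Toy model: route documents to automation or to user review."""
    confidence_threshold: float = 0.85  # assumed value, not a product default
    examples_needed: int = 2            # "twice", per the text above
    examples_seen: dict = field(default_factory=dict)

    def route(self, layout_id, confidence):
        """Return 'auto' or 'review' for one incoming document."""
        learned = self.examples_seen.get(layout_id, 0) >= self.examples_needed
        if learned and confidence >= self.confidence_threshold:
            return "auto"
        return "review"

    def record_review(self, layout_id):
        """A user classified, separated, and marked fields on one document."""
        self.examples_seen[layout_id] = self.examples_seen.get(layout_id, 0) + 1

learner = OnlineLearner()
print(learner.route("vendor-x-invoice", 0.95))  # review: layout not yet learned
learner.record_review("vendor-x-invoice")
learner.record_review("vendor-x-invoice")
print(learner.route("vendor-x-invoice", 0.95))  # auto: learned and confident
print(learner.route("vendor-x-invoice", 0.40))  # review: below the threshold
```

The point of the sketch is that two conditions gate automation: the layout has been taught often enough, and the confidence for the individual document clears the threshold.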
Transformation stores what it learns in two separate ways: Extraction Sets and Knowledge Bases. Administrative training updates an Extraction Set. Extraction Sets can be converted to Knowledge Bases, which can, in some instances, improve system performance. One Knowledge Base advantage is portability. Unlike Extraction Sets, Knowledge Bases can exist in multiple Transformation projects, a helpful characteristic when different business applications process similar documents.
Training that users supply through Online Learning updates a Dynamic Knowledge Base. A Dynamic Knowledge Base performs the same as any other Knowledge Base, although for performance reasons there is a limit on the number of training documents it can hold. System administrators therefore regularly incorporate what was learned online into the permanent Extraction Set and Knowledge Bases. Then a new round of Online Learning begins.
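The cycle of a size-limited Dynamic Knowledge Base that administrators periodically fold into permanent storage can be pictured with the following sketch. Everything here is an assumption made for illustration: the names `DynamicKnowledgeBase` and `consolidate`, the tiny limit of two documents, and in particular the choice to reject new samples once the limit is reached, since the text does not say how Transformation behaves at the limit.

```python
class DynamicKnowledgeBase:
    """Toy store for Online Learning samples with a fixed capacity."""

    def __init__(self, max_documents):
        self.max_documents = max_documents  # hypothetical limit
        self.documents = []

    def add(self, sample):
        """Store a sample; return False when the limit is reached (assumed behavior)."""
        if len(self.documents) >= self.max_documents:
            return False
        self.documents.append(sample)
        return True

def consolidate(dynamic_kb, permanent_store):
    """Copy online samples into the permanent store, then start a new round."""
    permanent_store.extend(dynamic_kb.documents)
    dynamic_kb.documents = []

permanent = ["admin-sample-1"]
dkb = DynamicKnowledgeBase(max_documents=2)
dkb.add("online-sample-a")
dkb.add("online-sample-b")
print(dkb.add("online-sample-c"))  # False: limit reached
consolidate(dkb, permanent)
print(permanent)                   # ['admin-sample-1', 'online-sample-a', 'online-sample-b']
print(dkb.add("online-sample-c"))  # True: a new round has begun
```

The sketch shows why regular consolidation matters: without it, the capped store stops absorbing new examples.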
Online Learning helps document automation applications improve over time. Transformation applications still require close monitoring of automation performance and regular administrative attention. With those few simple actions, document automation lowers business costs, shortens response times, and reduces error rates.