Healthy Classification and Extraction Models are Key to Performance in Tungsten Automation Transformation Projects

Written by Arsen Ashchyan | Sep 22, 2022

You finally completed the document automation project. The model is trained, the rules are in place, the validation routines are validating, and documents are running through the system. So, all is well, right?

Not for long. Maintaining your classification and extraction model is an often-overlooked task vital to ensuring you get the expected business benefits from your automation effort.

Keeping on top of model performance is particularly critical during the first few months of deployment. The model, as one would expect, is still fluid and not yet fully formed. Documents arrive that were not part of the training set, operators conduct training, and trained documents do not always perform as predicted.

First, measure automation system performance continuously and consistently. Tungsten Automation (formerly Kofax) analytics deliver a broad set of measurements, broken down by document type and by extracted property. Benchmarks captured during initial testing and production deployment define a baseline for determining whether performance is improving or degrading.
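As a rough illustration of comparing current measurements against a baseline, here is a minimal Python sketch. It does not use any Tungsten API. The `Metric` record, the `find_regressions` helper, the 2-point tolerance, and all the accuracy figures are hypothetical and stand in for whatever your analytics export actually provides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Metric:
    """One accuracy measurement (hypothetical shape, not a Tungsten export format)."""
    document_type: str
    field: str
    accuracy: float  # fraction of correctly extracted values, 0.0 to 1.0


def find_regressions(baseline, current, tolerance=0.02):
    """Return (document_type, field, baseline_acc, current_acc) for every
    field whose accuracy dropped by more than `tolerance` since the baseline."""
    baseline_by_key = {(m.document_type, m.field): m.accuracy for m in baseline}
    regressions = []
    for m in current:
        base = baseline_by_key.get((m.document_type, m.field))
        if base is None:
            continue  # no baseline for this field, so nothing to compare
        if base - m.accuracy > tolerance:
            regressions.append((m.document_type, m.field, base, m.accuracy))
    return regressions


# Made-up figures for illustration only.
baseline = [
    Metric("invoice", "total", 0.95),
    Metric("invoice", "vendor", 0.90),
    Metric("claim", "policy_number", 0.88),
]
current = [
    Metric("invoice", "total", 0.96),
    Metric("invoice", "vendor", 0.84),
    Metric("claim", "policy_number", 0.87),
]

regressions = find_regressions(baseline, current)
for doc_type, field, base, now in regressions:
    print(f"{doc_type}.{field}: {base:.0%} -> {now:.0%}")
```

With these sample numbers, only the invoice vendor field falls outside the tolerance. The right tolerance depends on document volume and business impact, so treat the default here as a placeholder.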

User-directed training, which Tungsten calls Online Learning, enables production staff to extend the model over time. However, only seasoned staff members who deeply understand the automation model design, the automated document types, and the automation requirements should have the authority to conduct that training. Incorrect data inadvertently introduced into the model can make earlier training ineffective.

Experience with the automated learning feature shows that it can dramatically increase model size, potentially reduce performance, and create situations that are difficult and laborious to correct. Limit use of the feature to situations where its behavior is predictable and desired.

Documents in any business application come from diverse sources and can differ greatly. When the business analysts and subject matter experts responsible for maintaining the automation understand the model design and how the various learning systems affect it, they can determine whether a specific document type is a good candidate for online learning or whether it needs a more tailored configuration to ensure proper classification and extraction performance.

Continuous performance measurement and model management must be part of every document automation effort. When you track system performance over time, you have deep visibility into how the model is performing and what can and should be done to improve it, and, heaven forbid, you can quickly discover when bad things happen.
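One simple way to catch "bad things" early is to watch for a sustained downward trend rather than a single bad week. The Python sketch below assumes you already collect one accuracy figure per period. The `sustained_decline` helper, the three-period window, and the sample values are illustrative assumptions, not part of any Tungsten tooling.

```python
def sustained_decline(series, periods=3):
    """Return True if each of the last `periods` measurements is lower
    than the measurement immediately before it."""
    if len(series) < periods + 1:
        return False  # not enough history to judge a trend
    recent = series[-(periods + 1):]
    return all(later < earlier for earlier, later in zip(recent, recent[1:]))


# Made-up weekly classification accuracy, oldest first.
weekly_accuracy = [0.93, 0.94, 0.92, 0.90, 0.87]

declining = sustained_decline(weekly_accuracy)
if declining:
    print("Accuracy has fallen for three consecutive weeks; review recent training.")
```

A check like this is deliberately crude. It ignores the size of each drop, so pairing it with a baseline comparison gives a fuller picture.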