What Is Indexing for Document Scanning & Digital Mailroom?

Thursday, August 24, 2017

Many people are familiar with document scanning, whether as part of a backfile scanning project or ongoing digital mailroom scanning. But how do you find these documents after they’re scanned?

The lesser-known practice of “document indexing” is the answer, but what is it, and why should you care?

To begin, there are two types of indexing: metadata and full-text.

Metadata

Metadata indexing attaches keywords (metadata) to each scanned document so it can be retrieved later. Typically, our document scanning clients provide a manifest of what needs to be scanned and what types of documents to expect. We then identify the right mix of metadata fields to serve as unique identifiers. Metadata indexing examples include:

  • Invoice, PO, waybill, and work order number
  • Employee name, employee number and social security number
  • Student name, ID, school, and social security number
  • Patient name, doctor and social security number
  • Date
  • Site ID
  • Any other unique identifier
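One way to test whether a candidate “mix” of fields really works as a unique identifier is to check it against the manifest: if two documents share the same values for those fields, the combination is not unique. A minimal sketch, assuming the manifest is a list of dictionaries; the `is_unique_key` helper, the field names, and the rows are hypothetical.

```python
from typing import Iterable


def is_unique_key(manifest: list[dict[str, str]], fields: Iterable[str]) -> bool:
    """Return True if no two manifest rows share the same values for `fields`."""
    fields = tuple(fields)
    seen: set[tuple[str, ...]] = set()
    for row in manifest:
        # A missing field is treated as an empty value.
        key = tuple(row.get(f, "") for f in fields)
        if key in seen:
            return False
        seen.add(key)
    return True


# Hypothetical manifest rows for an HR backfile project.
manifest = [
    {"employee_name": "Jane Doe", "employee_number": "1001", "date": "2017-01-05"},
    {"employee_name": "Jane Doe", "employee_number": "1001", "date": "2017-03-12"},
    {"employee_name": "John Roe", "employee_number": "1002", "date": "2017-01-05"},
]

print(is_unique_key(manifest, ["employee_name"]))            # False
print(is_unique_key(manifest, ["employee_number", "date"]))  # True
```

Name alone fails here because one employee has two documents; pairing the employee number with the date distinguishes every row.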

Barcodes can help to automate metadata indexing to eliminate manual data entry (and subsequent mistakes).
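With barcodes, the capture software reads the index values from a label or cover sheet instead of an operator typing them. A minimal sketch, assuming the barcode has already been decoded to a string and uses a hypothetical pipe-delimited layout `DOCTYPE|NUMBER|DATE`; real barcode layouts are defined per project.

```python
from datetime import date

# Hypothetical field layout for a pipe-delimited barcode payload.
BARCODE_FIELDS = ("doc_type", "doc_number", "date")


def index_from_barcode(payload: str) -> dict[str, str]:
    """Split a decoded barcode string into named index fields, with basic validation."""
    parts = payload.strip().split("|")
    if len(parts) != len(BARCODE_FIELDS):
        raise ValueError(f"expected {len(BARCODE_FIELDS)} fields, got {len(parts)}: {payload!r}")
    record = dict(zip(BARCODE_FIELDS, parts))
    # Raises ValueError on a malformed date instead of indexing it silently.
    date.fromisoformat(record["date"])
    return record


print(index_from_barcode("PO|48213|2017-08-24"))
# {'doc_type': 'PO', 'doc_number': '48213', 'date': '2017-08-24'}
```

Validating at read time is what removes the manual-entry mistakes: a bad label is rejected rather than filed under the wrong index values.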

Full-Text Indexing & The Role of OCR

Full-text indexing uses optical character recognition (OCR, for machine print) or intelligent character recognition (ICR, for handprint) to index either the entire scanned document or only specific regions of it (zonal OCR).
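Zonal OCR keeps only the text that falls inside a predefined region of the page, such as the box where an invoice number is printed. A minimal sketch, assuming the OCR engine returns each word with a bounding box in page pixel coordinates; the `Word` type, the sample page, and the zone coordinates are hypothetical and do not reflect any particular engine’s output format.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    x0: int  # left edge
    y0: int  # top edge
    x1: int  # right edge
    y1: int  # bottom edge


def text_in_zone(words: list[Word], zone: tuple[int, int, int, int]) -> str:
    """Join the words whose boxes lie entirely inside `zone` (x0, y0, x1, y1)."""
    zx0, zy0, zx1, zy1 = zone
    inside = [w for w in words if w.x0 >= zx0 and w.y0 >= zy0 and w.x1 <= zx1 and w.y1 <= zy1]
    # Simple reading order: top-to-bottom, then left-to-right.
    inside.sort(key=lambda w: (w.y0, w.x0))
    return " ".join(w.text for w in inside)


page = [
    Word("Invoice", 100, 50, 180, 70),
    Word("No.", 190, 50, 220, 70),
    Word("48213", 230, 50, 290, 70),
    Word("Total:", 100, 700, 160, 720),
    Word("$1,250.00", 170, 700, 260, 720),
]
INVOICE_NUMBER_ZONE = (220, 40, 400, 80)  # hypothetical region for this form layout
print(text_in_zone(page, INVOICE_NUMBER_ZONE))  # 48213
```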

OCR can happen at scan time or after scanning. Running it at scan time slows scanning down by 30%. Post-scan OCR can be performed by the document capture system (the software that drives the scanner) or by your content/document management system, and it runs faster on a searchable PDF.

Typically, we provide a searchable PDF image for every document scanned. Document management software such as ApplicationXtender (AX) includes a full-text OCR capability that populates a database with index metadata and adds a pointer to each document so it can be found later.
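The general idea of “a database of index values plus a pointer” can be sketched with SQLite: each row holds a document’s metadata, and the pointer is simply the path to its searchable PDF. This is an illustration only and says nothing about how ApplicationXtender structures its own database; the table layout, the `find_document` helper, and the file paths are hypothetical.

```python
import sqlite3
from typing import Optional

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE doc_index (
           doc_type   TEXT NOT NULL,
           doc_number TEXT NOT NULL,
           doc_date   TEXT NOT NULL,
           pdf_path   TEXT NOT NULL  -- pointer to the scanned, searchable PDF
       )"""
)
conn.executemany(
    "INSERT INTO doc_index VALUES (?, ?, ?, ?)",
    [
        ("PO", "48213", "2017-08-24", "/archive/2017/08/po_48213.pdf"),
        ("PO", "48214", "2017-08-24", "/archive/2017/08/po_48214.pdf"),
        ("WAYBILL", "W-7781", "2017-08-25", "/archive/2017/08/wb_7781.pdf"),
    ],
)
# A database index on the fields users search by keeps lookups fast as the archive grows.
conn.execute("CREATE INDEX idx_type_number ON doc_index (doc_type, doc_number)")


def find_document(doc_type: str, doc_number: str) -> Optional[str]:
    """Return the file pointer for a document, or None if it is not indexed."""
    row = conn.execute(
        "SELECT pdf_path FROM doc_index WHERE doc_type = ? AND doc_number = ?",
        (doc_type, doc_number),
    ).fetchone()
    return row[0] if row else None


print(find_document("PO", "48213"))  # /archive/2017/08/po_48213.pdf
```

Keeping the images on disk and only the index values in the database keeps the database small while every search still resolves to a specific file.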

Full-text OCR is needed only when searches must go beyond metadata, whether for other information such as “Seminole County bridge” or for data mining an archive. Otherwise, it adds cost without a clear benefit.
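A full-text search differs from a metadata lookup. Every word of the OCR text goes into an inverted index that maps each term to the documents containing it, so a query like “Seminole County bridge” can match documents whose metadata never mentions those words. A minimal sketch of an inverted index with AND semantics (every query term must appear), assuming made-up document IDs and OCR text; production systems also add stemming, ranking, and phrase matching.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase and split on anything that isn't a letter or digit."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Map each term to the set of document IDs whose OCR text contains it."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return IDs of documents containing every query term (AND semantics)."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


ocr_text = {
    "po_48213.pdf": "Work order: resurface Seminole County bridge deck, Route 46.",
    "po_48214.pdf": "Purchase order for office supplies, Orange County.",
    "wb_7781.pdf": "Waybill: steel beams delivered to bridge site, Seminole County.",
}
index = build_index(ocr_text)
print(sorted(search(index, "Seminole County bridge")))
# ['po_48213.pdf', 'wb_7781.pdf']
```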

Still Have Questions?

Give us a call at (800) 956-9000 to learn what you really need to instantly find your electronic documents after they’re scanned. We can also help you build a document manifest and think through the right metadata.

 
