Question 1

We would like to integrate OCR into our document processing. At which points is this suitable?

Accepted Answer

We recommend including OCR in your processes at the following steps:

During document release
During a file conversion
Before check-in to the DMS
During conversion of legacy data to PDF or PDF/A

Not every file has to pass through OCR at these points, however. The system can detect by itself whether OCR is useful for a given file, or the OCR process can be invoked specifically for raster files only.
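The routing decision described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the product's actual logic: the function name `needs_ocr`, the `has_text_layer` flag, and the list of raster extensions are all hypothetical.

```python
from pathlib import Path

# Assumption: these extensions are treated as pure raster formats.
RASTER_EXTENSIONS = {".tif", ".tiff", ".png", ".jpg", ".jpeg", ".bmp"}


def needs_ocr(path: str, has_text_layer: bool = False) -> bool:
    """Decide whether a file should be routed through OCR (hypothetical rule)."""
    suffix = Path(path).suffix.lower()
    if suffix in RASTER_EXTENSIONS:
        # Raster files contain pixels only, so OCR is always worthwhile.
        return True
    if suffix == ".pdf":
        # A PDF only needs OCR if it has no searchable text yet.
        return not has_text_layer
    # Other formats (for example office documents) already carry real text.
    return False
```

A workflow could call such a check at release, at conversion, or before check-in, and skip the OCR step whenever it returns `False`.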

Question 2

We have PDF files with visible text, but the text cannot be searched. What can be done?

Accepted Answer

There can be several reasons for this. PDF files created by scanning initially consist only of pixels: a person can read the text, but the computer cannot find it. Scanners often come with integrated OCR, but it may not perform well.

CAD systems often output on-screen text as line drawings only. This happens when the CAD system does not work with standard fonts, so the special fonts used for the screen display may not be available in the output.

Images embedded in a PDF can themselves contain text that you want to find and have recognized.
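One simple way to spot such files is to extract the text of each page and flag pages that yield almost nothing. The sketch below assumes the per-page text has already been extracted with a PDF library of your choice; the function name `pages_needing_ocr` and the threshold `min_chars` are hypothetical.

```python
def pages_needing_ocr(page_texts, min_chars=20):
    """Return 0-based indexes of pages with too little extracted text.

    page_texts: one string (or None) per page, as returned by whatever
    PDF text extraction is in use (an assumption of this sketch).
    min_chars: hypothetical threshold of non-whitespace characters below
    which a page is treated as having no usable text layer.
    """
    flagged = []
    for index, text in enumerate(page_texts):
        # Count only non-whitespace characters so blank lines do not count.
        visible = "".join((text or "").split())
        if len(visible) < min_chars:
            flagged.append(index)
    return flagged
```

Note that this heuristic only catches pages with little or no text. A page that mixes real text with images or line drawings containing further text would not be flagged.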

Question 3

What advantages do files with searchable text offer?

Accepted Answer

Information can be found more quickly if the search is not limited to the keywords assigned in the DMS but can also look directly inside the files for relevant terms. For this to work, the visible text must be searchable. Because data is exchanged along supply chains, documents cannot always be managed through a DMS alone. Files become significantly more usable when the keywords needed to classify them can be taken directly from the file.

Question 4

We would like to convert our legacy data from TIFF to PDF/A. Is this possible?

Accepted Answer

OCR makes sense here, too. PDF/A is increasingly replacing the raster format TIFF as the archive format. Existing TIFF files and scanned originals can be converted to PDF particularly easily. Without additional OCR processing, however, this conversion adds no value: the resulting PDF contains nothing useful beyond a raster image. Only enriching it with text elements provides an additional benefit.


