Optical Character Recognition (OCR)

Recognising texts in scanned documents

OCR – What is it?

Optical character recognition (OCR) makes it possible to recognise text in scanned documents. The OCR process from SEAL Systems works for raster and vector data and can be integrated into automated processes. OCR makes such text machine-readable and therefore automatically searchable. Search engines can index large volumes of files in advance, so that individual files can be found very quickly within the entire collection.

 

Typical Questions

We recommend integrating OCR at the following points in your processes:

 

  • When documents are approved
  • During a file conversion
  • Before check-in into the DMS
  • When converting legacy files into PDF/A

 

However, not every file is processed via OCR. The system itself can recognise whether OCR is appropriate, or the OCR process can be started specifically for raster files only.
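
The decision described above can be illustrated with a minimal Python sketch. It is not the SEAL Systems implementation: the function name `needs_ocr`, the list of raster extensions, and the `min_chars` threshold are all assumptions made for illustration, and the text layer of a PDF is assumed to have been extracted beforehand by some other tool.

```python
from pathlib import Path

# Hypothetical list of raster formats; the source does not say which
# formats the product treats as raster data.
RASTER_EXTENSIONS = {".tif", ".tiff", ".png", ".jpg", ".jpeg", ".bmp"}


def needs_ocr(filename: str, extracted_text: str = "", min_chars: int = 20) -> bool:
    """Return True if a file looks like a candidate for OCR.

    filename       -- name of the file to check (only the extension is used)
    extracted_text -- text layer already extracted from the file, if any
    min_chars      -- assumed threshold below which a PDF counts as "image only"
    """
    suffix = Path(filename).suffix.lower()

    # Raster images consist of pixels only, so they are always candidates.
    if suffix in RASTER_EXTENSIONS:
        return True

    # A PDF is a candidate if it has (almost) no extractable text.
    if suffix == ".pdf":
        visible = sum(1 for ch in extracted_text if not ch.isspace())
        return visible < min_chars

    # Everything else is skipped in this sketch.
    return False
```

For example, `needs_ocr("drawing.tif")` and `needs_ocr("scan.pdf", "")` both return `True`, while a PDF with a substantial text layer returns `False`. A real integration would likely also inspect the page content itself, since a PDF can mix searchable text with scanned pages.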

There can be several reasons why text in a file cannot be found. PDF files created by scanning initially consist only of image pixels: people can read the text, but the computer cannot find it. Scanners often have integrated OCR, but its quality can be poor. CAD systems often render on-screen text only as polylines. This occurs if the CAD system does not work with standard fonts; in that case, its special display fonts are not available in the output. Image components in a PDF can also contain text, which OCR makes findable as well.

Information can be found more quickly if the search does not rely solely on indexing in the DMS and the relevant terms can also be searched for directly in the files. For this, the visible text has to be searchable. In addition, data exchange along supply chains means that documents cannot always be managed via a single DMS. The usefulness of files increases considerably if relevant keywords can be captured directly in the file.
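
Searching directly in the files, independent of a DMS index, can be sketched as follows. This is only an illustration: the function `find_files_containing` and the sample file names are hypothetical, and the text of each file is assumed to be available already, for example from the text layer that OCR adds.

```python
def find_files_containing(term: str, texts: dict[str, str]) -> list[str]:
    """Return the names of all files whose text contains the search term.

    term  -- search term, matched case-insensitively
    texts -- mapping of file name to the text extracted from that file
    """
    needle = term.casefold()
    return sorted(name for name, text in texts.items() if needle in text.casefold())


# Hypothetical sample data: one file with a text layer, one scan without.
documents = {
    "pump_assembly.pdf": "Pump housing, material: stainless steel",
    "old_scan.pdf": "",  # raster image only, no searchable text
}

matches = find_files_containing("Stainless", documents)
```

Here `matches` contains only `pump_assembly.pdf`. The scanned file cannot be found until OCR has added a text layer, which is the gap described above.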

OCR can also help you here! PDF/A is increasingly replacing the raster format TIFF as an archive format. Existing TIFF files and scanned originals are especially easy to convert into the PDF format. Without additional OCR processing, however, this conversion does not yield any added value: the resulting PDF contains no useful data apart from a raster image. Only enrichment with text elements brings an additional benefit.

Do you have any questions? We will help you!

Our expert Debra Garls is happy to answer your questions about conversion and publishing solutions.

Debra Garls

Business Development Manager

+1 774-200-0933