Recognising texts in scanned documents
Optical character recognition (OCR) makes it possible to recognise text in scanned documents. The OCR process from SEAL Systems works for raster and vector data and can be integrated into automated processes. OCR makes such text machine-readable and therefore automatically searchable. Search engines index large volumes of files in advance, so that a search across the entire collection of files can be carried out very quickly.
We recommend integrating OCR at the following points in your processes:
Not every file has to be processed via OCR, however. The system itself can recognise whether OCR is appropriate. Alternatively, the OCR process can be started selectively for raster files only.
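The idea of letting the system decide whether OCR is appropriate can be illustrated with a minimal Python sketch. It is not the SEAL Systems implementation. The function name `needs_ocr`, the list of raster extensions, and the threshold `min_chars_per_page` are assumptions chosen for illustration, and the number of extractable characters is assumed to come from whatever PDF text extraction tool is already in use.

```python
from pathlib import Path

# Assumed set of raster formats; not a list taken from SEAL Systems.
RASTER_EXTENSIONS = {".tif", ".tiff", ".png", ".jpg", ".jpeg", ".bmp"}


def needs_ocr(filename: str, text_chars: int = 0, page_count: int = 1,
              min_chars_per_page: int = 20) -> bool:
    """Return True if a file looks as if it has no usable text layer.

    text_chars is the number of characters a text extraction step
    found in the file (hypothetical input, supplied by the caller).
    """
    suffix = Path(filename).suffix.lower()
    if suffix in RASTER_EXTENSIONS:
        # Raster files are treated as having no text layer at all.
        return True
    if suffix == ".pdf":
        # A scanned PDF typically yields little or no extractable text.
        return text_chars < min_chars_per_page * max(page_count, 1)
    # Other formats are passed through without OCR in this sketch.
    return False
```

With these assumptions, `needs_ocr("scan.tif")` and `needs_ocr("drawing.pdf", text_chars=0, page_count=3)` select the file for OCR, while a PDF with several thousand extractable characters is skipped.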
There can be several reasons why the text in a file cannot be searched. PDF files created by scanning initially consist only of image pixels. People can read the text, but the computer cannot find it. Scanners often have integrated OCR, but its quality can be poor. CAD systems often output text only in the form of polylines. This occurs if the CAD system does not work with standard fonts: its special display fonts are then not available in the output. Image components in a PDF can also contain text, which OCR makes findable as well.
Information can be found more quickly if the search does not rely solely on indexing in the DMS and the relevant terms can also be searched for directly in the files. For this, the visible text has to be searchable. In addition, data exchange in supply chains means that documents cannot always be managed via a single DMS. The usefulness of files increases markedly if relevant keywords for integrating the files can be captured directly in the file itself.
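Searching directly in the files, without a DMS index, can be sketched in a few lines of Python. The sketch assumes the text layer of each file has already been extracted into a dictionary. The function name `find_files` and the sample data are hypothetical.

```python
def find_files(texts: dict[str, str], terms: list[str]) -> list[str]:
    """Return the names of files whose text contains every search term.

    texts maps a file name to the text extracted from that file.
    Matching is case-insensitive and based on simple substrings.
    """
    wanted = [term.casefold() for term in terms]
    return sorted(
        name for name, text in texts.items()
        if all(term in text.casefold() for term in wanted)
    )


# Hypothetical sample: the last file is a scan without a text layer.
sample = {
    "pump.pdf": "Centrifugal pump, part no. 4711",
    "valve.pdf": "Ball valve DN50",
    "scan.pdf": "",
}
matches = find_files(sample, ["pump", "4711"])
```

A scanned file without a text layer, such as `scan.pdf` in the sample, can never match a term in this kind of search, which is the gap that OCR closes.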
OCR can also help you here! PDF/A is increasingly replacing the raster format TIFF as an archive format. Existing TIFF files and scanned originals are especially easy to convert into the PDF format. Without additional OCR processing, however, this conversion does not yield any added value: the resulting PDF contains no useful data apart from a raster image. Only the enrichment with text elements brings an additional benefit.
SEAL Systems has been our partner now for many years for our output requirements solutions. Right from the implementation of procedures for printing documents, we saw what potential could result from an output management solution, not just for printer management but also the processes. Based on this experience, we decided on system expansion and today, thanks to SEAL, we have a system by which all core print processes are controlled and optimised. This saves us enormous administrative expenses. We can only recommend SEAL Systems.
Our expert Debra Garls, Business Development Manager, is happy to answer your questions about conversion and publishing solutions.