OptimiDoc OCR
This section describes the options available in OptimiDoc OCR.
- Output format
  - Non-searchable PDF
  - Searchable PDF
  - Multipage TIFF
  - Plain text
  - Microsoft Word
- Remove blank pages
  - No - all pages are included in the document
  - Yes - the OCR engine detects blank pages and removes them
- Deskew - automatically straightens scanned pages that were fed or placed at an angle. Detection does not rely on leading-edge borders or lines.
- Despeckle - scans of poor-quality documents often contain noise in the form of small dots (speckles). Speckles close to letters or numbers can reduce OCR accuracy. This option removes them.
- Remove punch holes - the OCR engine detects the black dots that punch holes leave on the scanned image and removes them from the final document.
- Remove black borders - if yes, the OCR engine detects black borders around the page and removes them.
- Filename
- OCR language (list of supported languages) - for better results and faster recognition, select only the language of the scanned document.
- Separation
  - None
  - Barcode - the scan is split into multiple documents at each barcode. A page with a barcode is the first page of a new document.
    - Delete separator page - if yes, the page with the barcode is removed
  - Page count
    - Page count - the number of pages per output document, e.g. 2 means that a new document starts after every 2 pages (at pages 3, 5, 7 and so on), so each output file has 2 pages
  - Blank page - the scan is split into multiple documents at each blank page.
    - Delete separator page - if yes, the blank page is removed
- Extract barcodes
  - Barcode types (list of supported barcode types)
  - Barcode regex - a regular expression used to match only barcodes with a specific pattern
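The Page count and Barcode separation rules described above can be illustrated with a short Python sketch. It is not OptimiDoc code: the function names, the representation of pages as strings and the has_barcode flags are hypothetical. It also assumes that pages scanned before the first barcode form a document of their own, which the description above does not specify.

```python
from typing import List, Sequence


def split_by_page_count(pages: Sequence[str], page_count: int) -> List[List[str]]:
    """Start a new document after every `page_count` pages."""
    if page_count < 1:
        raise ValueError("page_count must be at least 1")
    return [list(pages[i:i + page_count]) for i in range(0, len(pages), page_count)]


def split_by_barcode(
    pages: Sequence[str],
    has_barcode: Sequence[bool],
    delete_separator_page: bool = False,
) -> List[List[str]]:
    """Start a new document at every page that carries a barcode."""
    documents: List[List[str]] = []
    current: List[str] = []
    for page, is_separator in zip(pages, has_barcode):
        if is_separator:
            # A barcode page closes the document collected so far.
            if current:
                documents.append(current)
            current = []
            if delete_separator_page:
                continue  # drop the barcode page itself
        current.append(page)
    if current:
        documents.append(current)
    return documents


pages = ["p1", "p2", "p3", "p4", "p5"]
by_count = split_by_page_count(pages, 2)
by_barcode = split_by_barcode(pages, [True, False, True, False, False])
by_barcode_deleted = split_by_barcode(
    pages, [True, False, True, False, False], delete_separator_page=True
)
```

With a page count of 2, the five pages above are grouped as p1-p2, p3-p4 and p5. The last document is shorter because the scan ran out of pages.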
System parameter [barcode]
Use [barcode] on its own, or the extended form [barcode:<selector>], where <selector> is one of: all, current, next, first, last, or a number (an index starting from 0).
For example, [barcode:0] is the first barcode found (the same as [barcode:first]), and [barcode:-1] is the last barcode (the same as [barcode:last]).
A separator character can be specified after the <selector>, for example [barcode:all:separator=-]. The default is a comma, which is equivalent to [barcode:all:separator=,].
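As an illustration only, the selector rules above can be sketched in Python. The functions parse_placeholder, resolve_barcode and expand are hypothetical and are not part of OptimiDoc. The sketch covers all, first, last and numeric indexes. It leaves out the bare [barcode] form and the current and next selectors, because their exact behavior is not described here.

```python
from typing import List, Optional, Tuple


def parse_placeholder(placeholder: str) -> Tuple[Optional[str], str]:
    """Split '[barcode:<selector>:separator=<char>]' into (selector, separator)."""
    if not (placeholder.startswith("[") and placeholder.endswith("]")):
        raise ValueError("placeholder must be enclosed in square brackets")
    parts = placeholder[1:-1].split(":", 2)
    if parts[0] != "barcode":
        raise ValueError("not a [barcode] placeholder")
    selector = parts[1] if len(parts) > 1 else None
    separator = ","  # documented default
    if len(parts) > 2:
        key, _, value = parts[2].partition("=")
        if key != "separator":
            raise ValueError("unknown option: " + key)
        separator = value
    return selector, separator


def resolve_barcode(selector: Optional[str], barcodes: List[str], separator: str) -> str:
    """Pick the barcode value(s) named by the selector."""
    if selector is None or selector in ("current", "next"):
        # Behavior not described in this section, so the sketch does not guess.
        raise NotImplementedError("selector not covered by this sketch")
    if selector == "all":
        return separator.join(barcodes)
    if selector == "first":
        return barcodes[0]
    if selector == "last":
        return barcodes[-1]
    # Numeric index starting from 0; negative values count from the end.
    return barcodes[int(selector)]


def expand(placeholder: str, barcodes: List[str]) -> str:
    selector, separator = parse_placeholder(placeholder)
    return resolve_barcode(selector, barcodes, separator)


codes = ["A1", "B2", "C3"]
first_by_index = expand("[barcode:0]", codes)
last_by_index = expand("[barcode:-1]", codes)
joined_default = expand("[barcode:all]", codes)
joined_dash = expand("[barcode:all:separator=-]", codes)
```

Here Python's negative list indexing happens to match the documented meaning of -1 as the last barcode. How OptimiDoc treats other negative or out-of-range indexes is not stated, so the sketch simply lets Python raise an error in those cases.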