For the first time in over 4 years, I had a customer who actually needed OCR: she wanted to convert some scanned images into text (to be inserted into a different document) without spending many hours re-typing them.
The first thing I looked for was a scanner.
I knew that if the customer had a scanner, the software bundled with it would almost invariably include an OCR package.
She had a Dell multi-function printer/scanner… good.
Since the scanned images were embedded in some PDF documents, I first had to copy/paste each image from the PDF into an image viewer (like IrfanView) so that I could save it as a JPEG file… I could then get the OCR software to “interpret” the image.
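For anyone who would rather script the copy/paste step, here is a minimal sketch in Python (not what I did on the day). It assumes the scans are stored inside the PDF as plain, unencrypted JPEG streams, which is common for scanned documents but by no means guaranteed. It simply looks for the JPEG start marker, then takes everything up to the last JPEG end marker before the PDF's next `endstream` keyword. The function names (`carve_jpegs`, `save_jpegs`) and the `scan_001.jpg` naming are my own inventions for illustration.

```python
from pathlib import Path

JPEG_START = b"\xff\xd8\xff"  # JPEG start-of-image marker plus first segment byte
JPEG_END = b"\xff\xd9"        # JPEG end-of-image marker
STREAM_END = b"endstream"     # keyword that closes a PDF stream object


def carve_jpegs(pdf_bytes: bytes) -> list[bytes]:
    """Naively pull raw JPEG payloads out of PDF bytes.

    Assumption: images are stored as unencrypted JPEG (DCTDecode) streams.
    Images using other encodings will simply not be found.
    """
    images = []
    pos = 0
    while True:
        start = pdf_bytes.find(JPEG_START, pos)
        if start == -1:
            break
        # Limit the search to the current PDF stream, if its end can be found.
        limit = pdf_bytes.find(STREAM_END, start)
        if limit == -1:
            limit = len(pdf_bytes)
        # Take the LAST end marker in the stream, so an embedded thumbnail
        # does not cut the image short.
        end = pdf_bytes.rfind(JPEG_END, start, limit)
        if end == -1:
            pos = start + len(JPEG_START)
            continue
        end += len(JPEG_END)
        images.append(pdf_bytes[start:end])
        pos = end
    return images


def save_jpegs(pdf_path: str, out_dir: str) -> list[Path]:
    """Write every carved JPEG to out_dir as scan_001.jpg, scan_002.jpg, ..."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    saved = []
    for i, img in enumerate(carve_jpegs(Path(pdf_path).read_bytes()), start=1):
        target = out / f"scan_{i:03d}.jpg"
        target.write_bytes(img)
        saved.append(target)
    return saved
```

Calling `save_jpegs("letters.pdf", "scans")` (both names are placeholders) should leave one JPEG per embedded scan in the `scans` folder, ready to be opened in whatever OCR software is at hand. If it finds nothing, the images are probably stored in another format and the manual route is still the fallback.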
It turned out the OCR software was ABBYY FineReader.
And it seemed to do a great job except for one vital flaw: many words would be skipped.
It didn’t take long to see what was happening: only basic words like “the” and “had” and “from” were being interpreted… while other, more complex words were being deliberately blanked out.
I just didn’t have time to look into Abbyy more closely, but I suspect that it was just a “demo” product, as the help had some mention of purchasing the professional version…
Since the customer didn’t want to learn how to save images from a PDF document, and then learn how to use Abbyy, she decided to find another workaround.
But I found 2 annoying aspects to all this:
- Abbyy (at least this bundled version) will not interpret “scanned images” embedded within PDF documents.
- Why bother including hobbled software (like Abbyy) with a scanner… it might result in an extra sale, but it’s twice as likely to result in an unhappy customer…