ATPM 3.10
October 1997

Review: OmniPage Pro 7

by A. T. Wong, atwong@xpressnet.com

Rating: Very Nice

Product Information
Published by: Caere Corp.
100 Cooper Court
Los Gatos, CA 95030 USA
Phone: (800) 535-7226
Fax: (408) 395-1994
E-Mail: <ocr_sales@caere.com>
Web: <http://www.caere.com>
Street Price: $499 US
Competitive Upgrade: $129 US

System Requirements
Macintosh with a 68020 or higher, or Power Macintosh
5MB free RAM (680x0 processor)
8MB free RAM (Power Macintosh)
10MB free hard disk space
MacOS Version 7.0 or later
Screen resolution 640 by 400 or better
Directly supports 13 brands of scanners consisting of 37 models.


What OCR Does
When a scanner captures an image of a text document, it records the page only as a bitmap image, a pattern of dots. Optical Character Recognition (OCR) takes that bitmap and converts the images of text into editable characters.


In a perfect world, each scan would result in a perfect image that could be OCRed with 100% accuracy. In the real world, the original is often faded, smudged, wrinkled, or otherwise difficult to read. Humans are the most efficient OCR devices, so if you have trouble reading the text, don't expect an OCR program to do any better.


So what does an OCR program give you? For a moderately small sum of money, you get a program that recognizes almost any printed textual material. With a bit of training, an OCR program can recognize rarely used characters, such as "œ" and "Ø". Law offices use OCR to convert hard copy documents to word processing documents. OCR can convert fax documents to text documents. I have even used OCR software to offer commercial scanning services.


OCR Accuracy
Accuracy refers to how well the OCR program recognizes text. A scanning accuracy of 99% means 99 out of 100 characters will be recognized correctly. Lest you think 99% accuracy is acceptable, consider the following example. Most text will have on average 5 characters per word and 11 words per line. A 99% OCR accuracy rate results in 1 bad character out of 100 characters, or 1 bad word every 20 words, or 1 bad word every other line. The accuracy drops rapidly when the scanned document is not an original. If you're a good typist, it may be faster to type in a page or two of text rather than OCR at 99% accuracy and manually correct the mistakes.
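
The arithmetic above can be checked with a short Python sketch. The figures of 5 characters per word and 11 words per line come from the text; the function name is made up for illustration, and the sketch assumes that errors are spread evenly and that each bad character lands in a different word.

```python
def error_spacing(accuracy, chars_per_word=5, words_per_line=11):
    """Return (words per bad word, lines per bad word) for a character accuracy.

    Simplifying assumption: errors are evenly spread, one per affected word.
    """
    chars_per_error = 1 / (1 - accuracy)          # e.g. 100 characters at 99%
    words_per_error = chars_per_error / chars_per_word
    lines_per_error = words_per_error / words_per_line
    return words_per_error, lines_per_error


words, lines = error_spacing(0.99)
print(f"one bad word every {words:.0f} words, or about every {lines:.1f} lines")
```

At 99% this works out to roughly one bad word every 20 words, a little under two lines apart, which is where "every other line" comes from. Passing a higher accuracy shows how quickly the correction burden falls.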


Caere OmniPage Pro 7.0
As a long-time user of OmniPage Pro 2.12, I felt it was time to upgrade to take full advantage of a PowerMac 8600. Installing OmniPage from CD was quick and simple. Besides creating an OmniPage folder, the installer places an entry in the Apple menu and a scanner driver in the Extensions folder.


OmniPage Pro is one of the earliest OCR programs offered, so Caere has had time to develop a well-refined user interface that compares favorably to several competitors. With OmniPage Pro's AutoOCR toolbar, OCR is as simple as clicking on a button. The Auto button launches a scan or loads an image, then OmniPage decides which elements are text and which are graphics, performs OCR, and saves the file in a variety of common word processing formats.


IMAGE imgs/page1406.gif

A "noise reduction" feature (actually a standard erase tool) allows you to erase unwanted artifacts before OCR. If the original contains handwritten notes, stray marks, or other types of noise, you'll appreciate this feature. "Smart Windows" will automatically resize the Image or Text windows and show or hide palettes, depending on which window is active. A large screen is required to see full-page Image and Text windows simultaneously, but for the average person with a 15" screen, the Smart Windows feature is a very nice compromise. If you scan multi-page documents, the "Thumbnail View" makes it easy to navigate through, reorder, or delete pages from the document.


Any OCR program is incomplete without a spell checker and some means of comparing the recognized text against the scanned image. OmniPage highlights suspect words in green and rejected words in red. "Option-double-clicking" a word in the text view will bring up a floating window showing a closeup image of the original word. In the screen shot below I option-double-clicked the word "INORES" to bring up a closeup of the original image. In this particular case it's easy to see why OmniPage flagged the word "INORES" as suspect.


omni

To better integrate the OCR function into your daily routine, OmniPage provides a Direct Input feature. Suppose you are working on a word processing document and you want to include some text from a magazine article. Activating Direct Input from the Apple menu will cause OmniPage to launch, scan, OCR, paste the text at the cursor location in your word processing document, and quit.


The Automatic Input/Output System and Apple Event support provide some automation capabilities. With the Automatic I/O System you specify an input folder that OmniPage checks every 30 seconds. When it detects a file, OmniPage will process it. Businesses that receive faxes on a regular basis may find it worthwhile to devote a low-cost Mac to automatically OCR incoming faxes. For the techie types, a small but useful set of Apple Events allows for automation of basic OmniPage functions with AppleScript.
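
The input-folder idea is easy to picture as a polling loop. The Python sketch below is only an illustration of that general pattern, not OmniPage's implementation or its Apple Event interface: the function names are invented, and `process` is a hypothetical stand-in for whatever OCR step would run on each new file. Only the 30-second interval comes from the description above.

```python
import os
import time


def poll_folder(folder, process, seen=None):
    """Run `process` on files in `folder` not seen before; return the seen set."""
    seen = set() if seen is None else seen
    for name in sorted(os.listdir(folder)):
        path = os.path.join(folder, name)
        if os.path.isfile(path) and name not in seen:
            process(path)  # hypothetical stand-in for the OCR step
            seen.add(name)
    return seen


def watch(folder, process, interval=30, rounds=None):
    """Check `folder` every `interval` seconds; rounds=None means run until stopped."""
    seen = set()
    count = 0
    while rounds is None or count < rounds:
        seen = poll_folder(folder, process, seen)
        count += 1
        if rounds is None or count < rounds:
            time.sleep(interval)
```

Calling `watch` with a folder path and a function such as `print` would list each incoming file once as it arrives. Tracking files by name is the simplest choice here; a real fax workflow would more likely move each file out of the folder after processing it.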


Configuring OmniPage
The OmniPage manual includes a chapter on configuring various settings as well as one about customizing more advanced features. For those new to OCR, there is a "Settings Guidelines" section which provides useful recommendations for configuring OmniPage to scan different types of documents. Those two chapters are worth reading because the default settings may not always provide acceptable results. Most settings are accessible from the toolbar or the Settings menu.


Real World Use
The value of a program is determined primarily by how often you use it. After several weeks of use, during which hundreds of pages passed through my scanner's sheet feeder, OmniPage has earned its place on my hard drive.


Lest you think that the AutoOCR button reduces OCR to a point and click operation, let me say that good results can be obtained only if you have a high-quality original. An original laser print with single-spaced 14 point text is a high-quality original that will result in 99.99% OCR accuracy—1 bad character every 4 to 7 pages.


When documents contain multiple columns, large headlines, and graphics, accuracy will quickly drop to a point where you wonder if it's all worth it. Certain background colors will scan as dark grey or black and completely obliterate all text. Be prepared to experiment with many settings to optimize scanning accuracy. In rare cases, I found that scanning a photocopy of an original improved the accuracy rate. The OmniPage manual provides useful hints to improve OCR accuracy in tough situations. However, nothing beats the experience gained from scanning several thousand pages from books and magazines.


After spending considerable time fine-tuning the settings, I came to the conclusion that user feedback must have weighed heavily in Caere's design of the interface—OmniPage settings are designed with easy access in mind.


To stress the OCR engine, I placed one of OmniPage's sample letters at a 5-degree angle and performed a scan with the default settings. The accuracy was only slightly less than when the letter was perfectly aligned.


As a final test, I scanned several letters using OmniPage Pro 2.12 and 7.0. As expected, OmniPage Pro 7.0 had slightly better overall accuracy than OmniPage Pro 2.12. What surprised me was that OmniPage Pro 2.12 did as well as OmniPage Pro 7.0 on fair to good quality black and white originals.


Running OmniPage on a PowerMac 8600/200 gave OCR times of 4 to 10 seconds, compared to 25 to 60 seconds on a 68040 Quadra 800. The flatbed scanner also required slightly less time to scan the original when driven by the PowerMac. A PowerMac is definitely recommended.


At university, one of my projects was to develop a very simple OCR program. It was crude and slow, but it recognized some alphanumeric characters. When I visited Caere's home page to check for updates, I was surprised to see a considerable number of highly technical papers reporting on a variety of OCR algorithms. After skimming through several papers I decided they were meant for younger and brighter minds. Definitely worth checking out if you're a Computer Science student.


Conclusion
OmniPage Pro's verification feature makes it easy to locate and correct errors, but even with the program's easy-to-use tools, the process can be time-consuming. OmniPage is a solid performer if you choose the right settings. Failure to do so may result in an unacceptably high rate of errors as well as a certain amount of frustration.


If Caere can improve the accuracy at the default settings, I would not hesitate to recommend purchasing the product immediately. If you already have an OCR program, you'll have to consider carefully whether to wait for the next version and its (hopefully) improved accuracy.


I evaluated OmniPage Pro on a Quadra 800 running System 7.5.5 and a PowerMac 8600 running Mac OS 8. The scanner was an HP IIcx with the document feeder option.


Copyright © 1997 A. T. Wong, <atwong@xpressnet.com>.
