Posted by Mark Douglas on 11/13/2014 12:29 PM

Written by ABBYY

Recognition Server

The University of Southampton Library supports a community of approximately 35,000 students and staff, providing access to an expansive collection of more than 1.5 million books and many millions of pages of archive material. The Library recently embarked on a programme to digitise a large number of its key texts through the Library Digitisation Unit (LDU). The LDU is a flagship enterprise in the academic sector and specialises in the digital capture of a range of materials for repositories or web distribution via URL links in the Library catalogue. The University Library’s approach is to provide open access to the digital material that it creates, wherever this is appropriate and permissible.

ABBYY FineReader, a software product designed for ad-hoc scanning and digitisation, had been used by the Library for a number of years to digitise small numbers of pages and occasionally full books, with the addition of text recognition, or OCR (optical character recognition). The Library quickly realised, however, that to automate the OCR process at high throughput and achieve their goal of digitising half a million pages per year, they would need a more robust product capable of automatically processing large volumes of documents.

Solution

To find a solution for processing printed documents into searchable formats, such as PDF and PDF/A, for digitising archives or creating records, the Library evaluated various options on the market. They compared a number of products against the following criteria: speed and quality of OCR; range of output formats and compression options; and the possibility of API/workflow integration. After looking at a number of products, the Library selected ABBYY Recognition Server as the best-fit solution.

The Library selected Recognition Server because it perfectly matched their requirements – delivery of high quality OCR on printed texts; a broad variety of output options; and an open API for easy integration with other programs. The Library wanted to integrate Recognition Server with the Unit’s Intranda GmbH workflow software, Goobi. The Goobi Production Workflow software is a web application that manages and tracks the Library’s digitisation projects. Additional considerations in ABBYY’s favour were the level of after-sales support and the affordable maintenance.

The LDU currently uses up to six book scanners and one high-end line scanner to digitise texts and images from their collection. ABBYY Recognition Server’s XML ticketing API was used to integrate it with the Goobi workflow. After the printed materials are scanned, Goobi manages the queuing of jobs to Recognition Server and then monitors the output. Output that includes character coordinates can be ingested into a presentation layer for indexing and access. Thus, as soon as the scanner operator completes the digitisation of a book, the files automatically move through each stage of the workflow.

Conclusion

Powerful and accurate OCR is an integral component of the Library Digitisation Unit’s activities. Thanks to the increased throughput offered by ABBYY Recognition Server, Library staff are freed from the tedious work of manual OCR, and millions of pages of documents from the University of Southampton’s collection are available online in digital formats to its students and to the wider world.

The successful completion of the digitisation can also be attributed to the product’s manageability and smooth integration with existing Library processes. “The ability to integrate ABBYY Recognition Server into our workflow was critical,” states Julian Ball, Unit Manager at the Library Digitisation Unit. “The installation of ABBYY Recognition Server was quick. Initial feedback from ABBYY and their support team regarding any queries was rapid with good follow-up. We have been very happy with the results and look forward to using the product to continue to achieve our digitisation goals.” 


