
[Recommended] Overview of Document Scanning

Posted 2010-2-17 09:11:10
An overview of Document Scanning
How to convert paper into electronic documents
A guide to document scanning, covering everything from the principles of scanning paper and how to scan, to the types of document scanners, quality options, compression options and the best way of scanning.
1            Introduction
This document provides guidelines and processes for document imaging that comply with BIP0008, and helps ensure the successful deployment of a document scanning solution. It covers the following areas:
·    Document preparation for scanning
·    Batching of documents
·    Scanning process
·    Sample set (documents used for calibration)
·    Quality control
·    Re-scanning
·    Image processing
·    Scanner recommendations
·    Design of documents for optimal scanning
1.1             Back File and Forward Scanning
Within this document, reference is made to the terms “Back File Scanning” and “Forward”/“Day Forward” scanning. Back file scanning describes the scanning of historical paper records (such as files), converting them to an electronic medium so that the paper can be removed, freeing up storage space and saving paper transfer costs. Forward or Day Forward scanning refers to the ongoing process of scanning new paper as it is received.
Reference is also made to “On Demand” scanning. This is a method of converting historical files to electronic media as they are required (as demanded). Typically, it is used where large historical files on a subject (company, person, patient, project) exist but back file scanning is not desired. With On Demand scanning, paper files that still exist are converted to electronic format the first time they are requested, so the historical paper files are gradually eliminated.
Because back file scanning quickly frees up large amounts of paper storage space, it is normally the preferred method, followed by forward scanning of new paper. Back file scanning can be performed either in the company's offices or off-site, where industrial scanners can complete it quickly as a service.
1.2             Scope and Objectives
When a company adopts a scanning solution for day forward and back file scanning, it is useful first to understand the best practices for scanning paper. This document highlights the recommended process for document preparation and scanning, and recommends the types of scanners required to implement the solution.
1.3             Background
This document has been created to answer the questions a company is likely to have about the scanning process. The answers include information on the legal admissibility of documents as defined in BIP0008 (providing copies of documents as evidence in court), and on the techniques recommended for successful scanning and document management.



2            Document scanning processes
2.1             General
This section gives recommendations for the procedures involved in document image capture. These recommendations cover procedures for:
·    preparation of documents
·    document batching
·    photocopying (to improve scanning success)
·    scanning
·    image processing
2.2             Preparation of paper documents
All paper documents need to be examined before scanning to ensure that a successful image is obtained. Attributes such as paper size and weight, physical state (thin paper, creased, stapled, etc.), binding, and print characteristics (black-and-white or colour, tonal range, etc.) can all affect the physical scanning process.
Where documents are found that are unlikely to be accepted by the scanner, a number of techniques can be used. For example, the original can be photocopied or placed in a transparent wallet.
·    When removing staples, clips, or other document bindings, ensure that no damage is caused to the original that may affect the capture of the information from the document.
·    Where a source document has physical attachments, for example stick-on notes, they must be distinguished from the document to which they are attached, and linked to the original document after scanning so that both can be viewed.
This should be achieved by capturing a separate image of the attachment as well as of the original page. The index data should record that there is an attachment and provide a link to the original page. It is suggested that the document preparer photocopies the original page and scans the copy as the first page of the attachment, with the attached note scanned as the second page (and successive pages where there are multiple notes).
·    Where a source document has physical amendments, for example white correction fluid, the workflow should ensure that the presence of such amendments is noted. This should be done by circling the amendment with a black ink pen.
·    All pages of multi-page documents should be kept together and in the appropriate order before, during, and after scanning.
·    All pages which require specialised scanning (e.g. forms, oversize pages, low-contrast pages) should be extracted for scanning on specialised scanners or with different scanner settings (colour, contrast, resolution, etc.).
2.3             Document batching
Generally, documents can be scanned using one of two methods: batch scanning, where documents are sorted into batches by type and subject, or intelligent scanning, where OCR is performed on the scanned pages and text and/or markers are used to identify the storage context.
Batch scanning of documents generally requires the production of cover sheets to be inserted between each batch of pages. These cover sheets carry identification marks (normally bar codes) which indicate the type of document and context (company, patient, person, etc). The processing of batched pages within the scanning/reading process is very fast, but requires the pre-production of cover pages.
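The cover-sheet approach amounts to splitting the stream of scanned pages wherever a cover sheet's barcode is read. The sketch below is a minimal illustration, not Metamation's implementation: it assumes the barcode reader has already decoded each page, and the `COVER|<document type>|<context>` barcode format and the `split_into_batches` helper are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Sequence, Tuple

COVER_PREFIX = "COVER|"  # hypothetical cover-sheet barcode prefix

@dataclass
class Batch:
    doc_type: str
    context: str
    pages: List[str] = field(default_factory=list)

def split_into_batches(pages: Sequence[Tuple[str, Optional[str]]]) -> List[Batch]:
    """Group scanned pages into batches delimited by cover sheets.

    pages: (page_id, decoded_barcode_or_None) in the order they were fed.
    A cover sheet starts a new batch and is not stored as a page itself.
    """
    batches: List[Batch] = []
    current: Optional[Batch] = None
    for page_id, barcode in pages:
        if barcode is not None and barcode.startswith(COVER_PREFIX):
            # Cover-sheet barcode: COVER|<document type>|<context>
            _, doc_type, context = barcode.split("|", 2)
            current = Batch(doc_type, context)
            batches.append(current)
        elif current is None:
            raise ValueError(f"page {page_id} was scanned before any cover sheet")
        else:
            current.pages.append(page_id)
    return batches

feed = [("p1", "COVER|INVOICE|ACME"), ("p2", None), ("p3", None),
        ("p4", "COVER|LETTER|SMITH"), ("p5", None)]
for batch in split_into_batches(feed):
    print(batch.doc_type, batch.context, batch.pages)
```

Raising an error for pages that precede the first cover sheet reflects the idea that, in pure batch scanning, a page with no known context should not be stored silently.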
Intelligent recognition of pages removes the need to identify the context before scanning, but requires a table of storage contexts to be defined within Metamation. Validation and identification are carried out in the background by the scanning reader engine (which performs the OCR on the documents), which means slower background throughput but less document preparation time.
Both batch and intelligent recognition can be used together for optimum performance (such as using intelligent recognition for document storage, but pre-printed cover pages for paper ‘forwarding’ to individuals within the organisation).
The choice of the preferred method of batch/scanning control will depend on the types of documents to be scanned.
2.4             Photocopying
It may be helpful for some documents to be photocopied prior to being scanned. Such documents include:
·    documents that may be adversely affected by the scanning process, such as damaged or delicate documents
·    documents where there are substantial contrast or density variations over the area of the original, and where photocopying demonstrably improves the image quality
·    documents containing paper or ink colours that do not produce legible scanned images, and where photocopying demonstrably improves the image quality
Photocopiers and scanners may respond differently to different colours; only in exceptional cases does photocopying before scanning fail to produce satisfactory results.
Photocopies should be examined to ensure that no significant information is lost in the process.
Where an image was made from a photocopy, it should be stamped as a ‘photocopy’ or ‘original photocopy’, and indexed as having been captured from a photocopy, distinguishing between photocopies made during document preparation and source documents which are known photocopies.
2.5             Scanning processes
As a general rule, it is recommended that all scanning be duplex, in colour, and at 400 dpi resolution. However, high-resolution colour images take up more storage space, so, as detailed in this document, some types of page may need to be scanned at lower resolutions, in greyscale, etc.
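To give a sense of the storage cost, the sketch below estimates the raw, uncompressed size of one side of a page at a given resolution and bit depth. It is simple arithmetic assuming an A4 page (210 x 297 mm) and no compression; real files will be considerably smaller once compressed, and the helper names are hypothetical.

```python
import math

MM_PER_INCH = 25.4

def pixel_dimensions(width_mm: float, height_mm: float, dpi: int) -> tuple:
    """Pixel width and height of a page scanned at the given resolution."""
    return (round(width_mm / MM_PER_INCH * dpi),
            round(height_mm / MM_PER_INCH * dpi))

def uncompressed_bytes(width_mm: float, height_mm: float, dpi: int,
                       bits_per_pixel: int) -> int:
    """Raw image size of one side of the page before any compression."""
    w, h = pixel_dimensions(width_mm, height_mm, dpi)
    return math.ceil(w * h * bits_per_pixel / 8)

# A4 at 400 dpi in three common modes
for label, bpp in [("black-and-white", 1), ("greyscale", 8), ("colour", 24)]:
    size_mb = uncompressed_bytes(210, 297, 400, bpp) / 1_000_000
    print(f"{label:>15}: {size_mb:6.1f} MB per side")
```

For A4 at 400 dpi this prints roughly 1.9 MB (black-and-white), 15.5 MB (greyscale) and 46.4 MB (colour) per side before compression; duplex scanning doubles these figures.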
As part of setting up and configuring the scanning process, the types of pages to be scanned can be checked and scanning ‘jobs’ created to scan them with different settings. The scanning process then needs to separate documents according to the type, size, contrast and usability of the paper concerned.
To ensure that all documents in a batch are fully scanned, the count of captured documents should be compared with the number of documents in the batch.
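A minimal sketch of this reconciliation, assuming the batch header records the number of pages and that duplex scanning produces two images per page with no blank-page suppression. The text speaks of documents; counting at page level is an assumption here, and `reconcile_batch` is a hypothetical helper.

```python
def reconcile_batch(expected_pages: int, captured_images: int, duplex: bool = True) -> int:
    """Return how many images are missing from a batch.

    Assumes each page yields two images when scanned duplex (no blank-page
    suppression). A positive result means images are missing, a negative one
    means there are more images than expected (e.g. a page fed twice), and 0
    means the batch balances.
    """
    expected_images = expected_pages * (2 if duplex else 1)
    return expected_images - captured_images

shortfall = reconcile_batch(expected_pages=50, captured_images=98)
if shortfall != 0:
    print(f"Batch does not balance (shortfall: {shortfall})")
```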
2.6             Quality control
Procedures are required which reduce the risk of scanned images being of unsatisfactory quality. The evidential weight of scanned images will be increased if it can be demonstrated that the images are of good quality, and that the scanner was working to agreed standards at the time of scanning.
·    A sample set of source documents should be assembled for evaluating scanner results against agreed quality control criteria. It should consist of representative types of the documents to be scanned, and should include duplex (front and back) content pages.
·    Documents in the sample set should be representative of the complete set of documents that is to be scanned.
·    Documents in the sample set should include examples of source documents whose quality is poor relative to those of the majority of the documents.
Quality control criteria should cover:
·    overall legibility
·    smallest detail legibly captured (e.g. smallest type size for text; clarity of punctuation marks, including decimal points)
·    completeness of detail (e.g. acceptability of broken characters, missing segments of lines)
·    dimensional accuracy compared with the original
·    scanner-generated speckle (i.e. speckle not present on the original)
·    completeness of overall image area (i.e. missing information at the edges of the image area)
·    density of solid ‘black’ areas, and colour fidelity
Quality control criteria for image quality should be realistic, given the nature of the source material and the characteristics of the scanning equipment, and should be based on the sample set of documents.
2.7             Evaluating image quality
The scanners should be set up using the sample set of documents and retested weekly to ensure the best quality of scanned images.
During operation, image quality should be evaluated on a 20-inch monitor with a resolution of 90 to 100 dpi, which allows the validation operator to view the document as a complete page so that the comparison with the original is complete. The validation system should allow the operator to print suspect documents, to verify that the image can be reproduced and that the reproduction is as good as the original.
The scanned image should be printed on a colour printer with a higher resolution than the 400 dpi scanning resolution, to ensure that all captured information is printed.
The results of all quality control checks should be stored in the Quality Control Log with the reason for rejection.
The sampling rate should be every 5th page in the first month, every 10th page in the second month, and one page per hour in the third month.
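The sampling schedule can be expressed as a small selector. This is a sketch under stated assumptions: page numbers are counted from 1, "one page every hour" is taken to mean the first page scanned at least an hour after the previously sampled one, and the hourly rate is assumed to continue after the third month. `QCSampler` is a hypothetical name.

```python
from datetime import datetime, timedelta

class QCSampler:
    """Selects pages for quality-control review.

    Month 1: every 5th page; month 2: every 10th page; month 3 onwards
    (assumed to continue): the first page scanned at least one hour after
    the previously sampled page.
    """

    def __init__(self):
        self.last_sampled_at = None

    def should_sample(self, page_number: int, month: int, scanned_at: datetime) -> bool:
        if month == 1:
            return page_number % 5 == 0
        if month == 2:
            return page_number % 10 == 0
        # Time-based sampling from the third month
        if (self.last_sampled_at is None
                or scanned_at - self.last_sampled_at >= timedelta(hours=1)):
            self.last_sampled_at = scanned_at
            return True
        return False

# Third month, one page scanned every 20 minutes
start = datetime(2010, 3, 1, 9, 0)
sampler = QCSampler()
picked = [n for n in range(1, 11)
          if sampler.should_sample(n, month=3,
                                   scanned_at=start + timedelta(minutes=20 * (n - 1)))]
print(picked)  # [1, 4, 7, 10]
```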
When the sample set is used, the whole set should be validated for accuracy.
2.8             Checking scanner performance
The optics and paper transfer rollers should be cleaned daily, or on demand when, for example, a clean original shows banding on the scanned image caused by dirt on the optical system.
The sample document should become the scanner test target and should be used to monitor scanner performance.
Scanner performance checks should be carried out weekly to ensure that scanner performance is within agreed tolerances.
Hard copy prints should be made of the scanned images of the test targets and compared with the test targets themselves to determine whether the quality criteria are met.
2.9             Rescanning
All pages marked for rescanning should be identified and rescanned, using a flatbed scanner where possible to give the operator more control over the scanning.
The operator should be able to adjust the contrast or increase the resolution of the scanner to improve the scanned image. However, de-speckling of the image should not be allowed, as it can change the content of the scanned page so that the image no longer faithfully reproduces the original.
All pages rescanned should replace the original scanned page which was marked for rescanning. The operator should ensure that the information on the page is accurately represented before replacing the image.
2.10           Image processing
The following sections (3 and 4) describe some different types of documents and the associated image processing facilities that may be used. Some of these operations are carried out during and/or after scanning.
The scanners should be set up to de-skew pages automatically. On occasion the operator may need to de-skew a page using the pull-down menus in the application. This is at the operator's discretion; the alternative is to reject the page and rescan it.
Where OCR or OMR is performed on documents, the operator should be required to verify the recognised text or marks against both the original page and the scanned page. This ensures that the content is accurately represented when free-text searches are carried out.
De-speckling and border removal are NOT acceptable. If a page requires extra processing to remove noise, it should be rejected, rescanned with a different scanner setting, and carefully validated for quality.




3            Image processing
3.1             General
For legally admissible reproduction of the original scanned documents, most image processing tools, e.g. de-speckling, cannot be used. The following sections describe the acceptable tools; those that must not be used are marked as included for information only.
3.2             Document skew
Document skew is a term used to describe poor document alignment (rotation) during the scanning process. In its most pronounced form, images can appear on a viewing screen as crooked or slanted. Even a small angle of skew is likely to affect data capture processes and thus reduce data recognition rates.
Passing images through de-skewing processes may correct this problem.
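One common way to estimate skew is a projection-profile search: try a range of candidate angles, rotate the dark pixels back by each angle, and keep the angle at which text lines collapse into the fewest rows. The sketch below illustrates that technique only and is not necessarily what a given scanner or application does; it estimates the angle but does not rotate the image, and `estimate_skew` is a hypothetical name.

```python
import math
from collections import Counter

def estimate_skew(black_pixels, candidate_angles):
    """Estimate page skew in degrees with a projection-profile search.

    black_pixels: iterable of (x, y) coordinates of dark pixels.
    The result is positive when y increases along a text line as x increases.
    """
    pixels = list(black_pixels)
    best_angle, best_score = None, -1
    for angle in candidate_angles:
        a = math.radians(angle)
        s, c = math.sin(a), math.cos(a)
        # Rotate each pixel back by the candidate angle and project onto rows
        rows = Counter(round(y * c - x * s) for x, y in pixels)
        # Sum of squared row counts is largest when lines fall into few rows
        score = sum(n * n for n in rows.values())
        if score > best_score:
            best_angle, best_score = angle, score
    return best_angle

# Synthetic test: three text lines skewed by 2 degrees
slope = math.tan(math.radians(2.0))
pixels = [(x, round(y0 + x * slope)) for y0 in (20, 40, 60) for x in range(1000)]
candidates = [i * 0.5 for i in range(-10, 11)]  # -5.0 to +5.0 degrees
print(estimate_skew(pixels, candidates))  # 2.0
```

The angular step limits precision; a practical tool would refine around the best candidate and then rotate the image by the estimated angle.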
3.3             Speckle, noise and background marks
These features should not be used. They are included for information only.
Random black marks (speckles) which appear on an image may have been generated during the scanning process, or may be present on the original document. These speckles may be removed by systems involving special algorithms. These algorithms assume that small isolated clusters of pixels contain no information, and may be deleted.
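To show why this is risky, the sketch below finds the small isolated clusters (8-connected groups of black pixels) that a speckle filter of this kind would delete. A full stop or decimal point is itself such a cluster, which is one reason the guidance above rules out automatic removal; the sketch therefore only reports candidates and leaves the image untouched. The `find_speckles` name and the `max_pixels` threshold are illustrative assumptions.

```python
from collections import deque

def find_speckles(image, max_pixels=4):
    """Return the small isolated clusters of black pixels a speckle filter would delete.

    image: list of rows, each a list of 0 (white) / 1 (black).
    Any 8-connected cluster with at most max_pixels pixels is reported.
    The image is not modified.
    """
    height, width = len(image), len(image[0])
    seen = [[False] * width for _ in range(height)]
    speckles = []
    for y in range(height):
        for x in range(width):
            if image[y][x] != 1 or seen[y][x]:
                continue
            # Breadth-first flood fill of this cluster
            cluster, queue = [], deque([(x, y)])
            seen[y][x] = True
            while queue:
                cx, cy = queue.popleft()
                cluster.append((cx, cy))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        nx, ny = cx + dx, cy + dy
                        if (0 <= nx < width and 0 <= ny < height
                                and image[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((nx, ny))
            if len(cluster) <= max_pixels:
                speckles.append(cluster)
    return speckles

# A 3x3 blot (kept) and one isolated pixel (reported)
page = [[0] * 6 for _ in range(6)]
for y in range(3):
    for x in range(3):
        page[y][x] = 1
page[5][5] = 1
print(find_speckles(page))  # [[(5, 5)]]
```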
3.4             Black border removal
This feature should not be used. It is included for information only.
When scanning documents of mixed sizes using certain scanner types (such as rotary scanners), black borders may be left around the edges of smaller documents. Black border removal entails the deletion of such large areas of black pixels.
3.5             Forms removal
The scanning of textual information on a pre-printed form is common when automated data capture processes such as OCR and OMR replace a large keyboarding operation. To increase the accuracy of the recognition rate, images can be passed through a post-scanning process that will remove boxes, lines, and pre-printed text.
Where new forms have been designed for OMR and OCR, forms removal should be used. The forms will be defined during implementation; a list of them should be given to the operators, and the scanning system should be set up to recognise the forms enabled for this method.




4            Scanning specific types of document
4.1             General
This section gives details of different types of documents, and the scanner characteristics needed to give acceptable results within the Metamation information management system. The characteristics detailed in this section are not applicable where Optical Character Recognition is to be performed on the scanned image.
4.2             Text, typed and printed
It is recommended that a resolution of 400 dpi be used as the minimum for the following reasons:
·    At lower resolutions, some detail may be lost from characters, particularly those with thin elements such as serifs, and fonts smaller than about 6 point on the original may not be captured clearly.
·    With material containing particularly small type sizes (e.g. superscripts and subscripts), a resolution of 600 dpi or more may be necessary.
·    For material that may be processed using Optical (or ‘Intelligent’) Character Recognition, it may be beneficial to scan at a higher resolution than would be satisfactory for visual legibility. For example, while for much material 200 dpi would be satisfactory for visual representation, it may be preferable to use 300 dpi resolution if OCR/ICR is to be used; similarly, where 300 dpi may be visually satisfactory, 400 dpi may be better for OCR.
·    Handwritten material is known to be difficult to read, and a resolution greater than 300 dpi may be required.
No decision should be made on the choice of resolution without conducting tests against the sample set. Careful tests should be carried out to ensure that the resulting image remains an effectively ‘true’ facsimile of the original. These tests should use the sample set of documents, and hard copies should be made of the scanned images.
There should be no anomalies introduced into the enhanced image that are visible under normal office lighting conditions.
It is important to bear in mind that the validation monitor should have an effective resolution of about 90 to 100 dpi. This is normally adequate for typed material but ‘zooming’ may be required with small sized print, and this requires that the scanning resolution should be substantially greater than the basic display resolution.
The results of these tests should be stored with other records of the scanning processes.
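The relationship between scan resolution and display resolution can be checked with a little arithmetic. The sketch below computes a monitor's pixel density from its native resolution and diagonal, and the magnification seen when a scan is shown pixel-for-pixel. The 1600 x 1200, 20-inch monitor is an assumed example, and the function names are hypothetical.

```python
import math

def monitor_ppi(h_pixels: int, v_pixels: int, diagonal_inches: float) -> float:
    """Pixel density of a display from its native resolution and diagonal size."""
    return math.hypot(h_pixels, v_pixels) / diagonal_inches

def magnification_at_full_size(scan_dpi: float, display_ppi: float) -> float:
    """How much larger than the original a scan appears when each image pixel
    is shown as one screen pixel (100% zoom)."""
    return scan_dpi / display_ppi

ppi = monitor_ppi(1600, 1200, 20)             # assumed 20-inch 4:3 monitor
print(round(ppi))                             # 100
print(magnification_at_full_size(400, ppi))   # 4.0
```

Conversely, showing a whole A4 page scanned at 400 dpi (about 4,677 pixels tall) on 1,200 screen rows means displaying it at roughly a quarter of full size, which is why zooming is needed to check small print.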
4.3             Line drawings/art
For line drawings/art which form part of otherwise text-oriented documents, the scanning resolutions applicable to text are typically satisfactory for the drawings also. With printed material, where fine lines are used in the artwork, 300 dpi may be too low, but this can only be determined via tests on sample documents.
4.4             Handwritten material
With material written with a modern pen, ball-point or pencil, 400 dpi will normally be adequate. For older material written with a steel-nibbed fountain pen, the thinness of the upstrokes will often make 400 dpi the minimum resolution that satisfactorily captures the text without significant parts of these upstrokes being lost.
Handwriting (or hand drawing) using pencils can be faint, and difficult to reproduce. Care should be taken to ensure that image brightness and contrast are appropriate for these images.
4.5             Charts, plans, and drawings
For hand-drawn charts, architectural, and engineering drawings, there may be finer lines present than would be the case with a typical ‘full-sized’ CAD drawing, and although 300 dpi will usually be a satisfactory resolution, tests should be done to ensure that the finest detail is captured. It may prove necessary to use 400 dpi.
If the scanning is to be done from copies of the originals, and if these copies have been reduced from the originals (which is quite common), then a higher resolution may be required than would otherwise have been satisfactory.
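The extra resolution needed for reduced copies follows from the reduction ratio: a copy shrunk to a fraction r of the original's linear size must be scanned at 1/r times the target resolution to keep the same detail relative to the original. A minimal sketch, with a hypothetical helper name:

```python
import math

def required_scan_dpi(target_dpi_on_original: float, reduction_ratio: float) -> float:
    """Scan resolution needed on a reduced copy so that the image still has
    target_dpi_on_original dots per inch relative to the original.

    reduction_ratio: linear size of the copy divided by that of the original
    (e.g. 0.5 for a copy reduced to 50%).
    """
    if not 0 < reduction_ratio <= 1:
        raise ValueError("reduction_ratio must be in (0, 1]")
    return target_dpi_on_original / reduction_ratio

# An A3 original copied down to A4 is reduced linearly by 1/sqrt(2), about 71%
print(round(required_scan_dpi(300, 1 / math.sqrt(2))))  # 424
```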
With drawings and Critical Care Unit (CCU) charts, dimensional accuracy may be important. Because of the large size of drawings, the paper or film may undergo dimensional change (due mainly to variations in moisture content). For working drawings it is often a requirement when scanning that dimensional inaccuracies are corrected, i.e. the scanned image may be post-processed to correct scale inaccuracies, skew or lack of orthogonality. Such corrections mean that the resulting image is not a true facsimile of the original. Where legal admissibility may become an issue, both an uncorrected version of the scanned image and the corrected version must be preserved, with appropriate links between the two.
4.6             Maps
With maps, a minimum resolution of 400 dpi will be required, but much higher resolutions (e.g. up to 1000 dpi) may be required with some material which contains fine detail.
As with drawings, scanned images of maps are frequently corrected after scanning for scale inaccuracies and lack of orthogonality in the original.
Where coloured maps are being scanned, and the colour is to be preserved, the scanner should be capable of capturing individual colours with the required discrimination. While the number of colours subjectively present may be quite small, 8-bit colour (256 colours) may be inadequate and it may be necessary to scan with 24-bit colour in order to provide the required colour discrimination. Tests should be done to determine how many ‘bits’ of colour are required.
4.7             Half-tone material
Where half-tone material (black-and-white or colour separated) is present on a page along with text and/or line art, the outcome objectives of the scanning should be considered.
If the objective is to produce a scanned image comparable in quality to a ‘normal’ black-and-white photocopy, then a scanner that produces a bi-level (i.e. ‘black-and-white’) image will suffice. The resolution may have to be higher than would be acceptable for text alone: 400 dpi will be required to capture half-tone material.
If the half-tone content has value in the application context, following the recommendations that apply to scanning text or line art may result in the capture of images of unacceptable quality from the half-tones.
Most scanners have different settings for scanning text or line art and scanning half-tones. It is a general problem when scanning mixed text or line art and half-tones with a ‘black-and-white’ scanner that the scanner settings that are optimal for text are far from optimal for the half-tones, and vice versa. When set for ‘text’, the quality of the half-tone images will generally be significantly worse than a photocopy; when set for ‘half-tone’ or ‘photographs’, the text may appear rather blurred in the scanned image, to the extent that the image would not form a good facsimile of the original text.
If the half-tone content has ‘cosmetic’ value only and does not contribute to the essential information content of the original, then the scanning should be done according to the recommendations which apply to text or line art material.
If the half-tone is to be captured to a quality level comparable to that of a typical (good quality) photocopy, then there are two options. One option is to scan the document with the scanner settings ‘normal’, at a higher resolution than would be necessary for the text alone; 400 dpi minimum is recommended. The other is to scan the document twice, to create two images, one where the text/line art is captured to satisfactory quality and the other where the half-tone material is satisfactorily captured. In the latter case a record should be kept that the production of the two images involved different scanner settings (affecting the processing performed on the images).
If the half-tone material is to be produced to a quality comparable to that of the original, then it should be processed according to the recommendations for photographs.
4.8             Continuous-tone images
Continuous-tone images include photographs, medical and industrial radiographs (X-rays), and images generated by computer as photographic style images, including, for example, ultrasound images, CT and MR images.
With material containing continuous-tone areas (greyscale or colour) where the tonal information should be preserved, scanning should be performed with a scanner capable of capturing the required number of grey levels and/or colours. The appropriate number of levels should be determined by benchmark tests on the sample set of documents.
For images from photographic material, the number of grey levels will typically be 16, 64, or 256 (i.e. 4, 6, or 8 bits per pixel). For very high quality images, 256 levels are normally used, and for X-rays, up to 1024 levels of grey (i.e. 10 bits per pixel) may be necessary.
For colour photographs, 24 bits per pixel of colour information is used in most applications, but for very high quality images up to 36 bits per pixel may be necessary. Typically, 15 or 16 bits of colour are used; for source material containing only a small palette of colours, 256 colours (8 bits per pixel) may suffice. Tests should be performed to determine how many colour levels are required.
·    With continuous-tone colour, most scanners capture 8 bits of colour information in three different regions of the colour spectrum: Red, Green, Blue (‘RGB’), resulting in 24 bits per pixel, or the ability to reproduce over 16 million colour variations.
·    With only 8 bits of colour information (256 levels), there may be a noticeable ‘blockiness’ in the image if the original contains a broad range of colours.
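The bit depths and level counts quoted above are related by powers of two: n bits give 2^n levels, and the number of bits needed for N levels is the base-2 logarithm of N rounded up. A minimal sketch with hypothetical helper names:

```python
import math

def levels(bits_per_pixel: int) -> int:
    """Number of distinct grey levels or colours a given bit depth can represent."""
    return 2 ** bits_per_pixel

def bits_needed(n_levels: int) -> int:
    """Smallest bit depth able to represent n_levels distinct values."""
    return max(1, math.ceil(math.log2(n_levels)))

for bits in (4, 6, 8, 10, 24):
    print(bits, "bits ->", levels(bits), "levels")
# 4 -> 16, 6 -> 64, 8 -> 256, 10 -> 1024, 24 -> 16777216
```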
Scanning resolution requirements for documents containing colour are normally similar to those for black-and-white material, particularly if there is text on the original. Thus scanning may be performed at 200-400 dpi, relative to the original photograph. If there is no text on the original, satisfactory images may be achieved at lower resolutions, down to television quality levels (about 350 lines per image frame); this would typically be satisfactory for identity photographs and similar applications.
To assess image quality, in general it is satisfactory to compare the screen images with the original. If there is likely to be use of high quality hard copy images then the comparison should be made between hard copies of the images, produced on a high quality colour printer, and the originals.
Care should be taken when comparing screen colours with an original that the colours were correctly balanced at the time of image capture, and that the display system has also been calibrated correctly. Otherwise the displayed colours may be significantly different from the colours on the original. The same requirement applies when comparing the original with hard copies of the captured image.
Where colour accuracy is important, a standard Colour Gamut test chart should be scanned at the same time as the original (or batch of originals scanned at the same time), and the image of this chart stored along with the original.
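Once the test chart has been scanned, each colour patch can be compared with its reference value and patches outside an agreed tolerance flagged. The sketch below uses a plain Euclidean distance in 8-bit RGB as a simplification; colour-management practice generally uses perceptual colour-difference measures (CIE ΔE) computed in a device-independent colour space. The patch names, reference values and tolerance are hypothetical.

```python
import math
from typing import Dict, List, Tuple

RGB = Tuple[int, int, int]

def patch_deviations(measured: Dict[str, RGB], reference: Dict[str, RGB]) -> Dict[str, float]:
    """Euclidean distance in 8-bit RGB between each scanned chart patch and its reference."""
    return {name: math.dist(measured[name], ref) for name, ref in reference.items()}

def out_of_tolerance(measured: Dict[str, RGB], reference: Dict[str, RGB],
                     tolerance: float) -> List[str]:
    """Names of patches whose deviation exceeds the tolerance, sorted alphabetically."""
    return sorted(name for name, d in patch_deviations(measured, reference).items()
                  if d > tolerance)

reference = {"white": (255, 255, 255), "red": (200, 0, 0)}  # hypothetical chart values
measured = {"white": (252, 251, 255), "red": (180, 10, 5)}
print(out_of_tolerance(measured, reference, tolerance=10.0))  # ['red']
```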
4.9             Mixed mode documents
Mixed mode documents contain more than one type of material within a single document (e.g. photographs and text). From a scanning perspective, the documents described above that contain half-tone material are essentially of this type, even though the original was created in a single print operation. As described in 4.7, using scanner settings optimised for one type of material can result in the loss of information in material of other types. As suggested in 4.7, one solution is to capture multiple images, with the scanner settings (or even the scanner type) selected to optimise image quality for each material type.
Another option is to use a scanning system that scans mixed mode documents automatically, detecting each type of material and optimising the settings for each type. These systems can also be set to select the most appropriate compression algorithm for each type of material. Benchmark testing should be done to ensure that the results are acceptable.
4.10           Documents with note sheets attached
Some documents may have note sheets or notelets attached, and care should be taken when scanning them. It may be necessary to remove the attachment where, for example, it obscures information on the document. If removal is required, the note should be marked or stamped as being a part or page of the document to which it was attached, and scanned and indexed separately. The original page should also be indexed to indicate that it has an attachment.
Where a system has a facility to indicate that a document has a related image, then this facility should be used.
4.11           Microform documents
Microforms should be examined carefully before deciding on the scanning approach. With multi-frame microfilm media (roll film, microfiche, microfiche jackets, multi-frame aperture cards), automated frame detection should not be used unless the inter-frame gap can be detected unambiguously.
If the gap is not detected, multiple frames may be merged into one image. Depending on the physical characteristics of the scanning system, some part(s) of the digitised image may be lost.
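The "detected unambiguously" condition can be made concrete: locate gaps in a brightness profile along the film, then refuse to trust the result if any frame comes out the wrong width (a missed gap shows up as a frame roughly twice the expected width). The sketch below is illustrative only; it assumes gaps appear brighter than frames (invert the profile for the opposite film polarity), and the function name and parameters are hypothetical.

```python
def split_frames(profile, gap_threshold, min_gap, expected_width, tolerance):
    """Locate frames along a strip of film from a 1-D brightness profile.

    profile: mean brightness of each scan line along the film.
    A run of at least min_gap values >= gap_threshold counts as an
    inter-frame gap. Returns a list of (start, end) frame extents, or None
    when any frame width differs from expected_width by more than tolerance,
    i.e. the gaps were not detected unambiguously.
    """
    values = list(profile)
    gaps, run_start = [], None
    # A -inf sentinel closes any gap run that reaches the end of the strip
    for i, value in enumerate(values + [float("-inf")]):
        if value >= gap_threshold:
            if run_start is None:
                run_start = i
        elif run_start is not None:
            if i - run_start >= min_gap:
                gaps.append((run_start, i))
            run_start = None
    frames, start = [], 0
    for gap_start, gap_end in gaps:
        if gap_start > start:
            frames.append((start, gap_start))
        start = gap_end
    if start < len(values):
        frames.append((start, len(values)))
    if any(abs((end - s) - expected_width) > tolerance for s, end in frames):
        return None  # ambiguous: fall back to manual frame selection
    return frames

strip = [10] * 50 + [200] * 5 + [10] * 50 + [200] * 5 + [10] * 50
print(split_frames(strip, gap_threshold=128, min_gap=3, expected_width=50, tolerance=2))
# [(0, 50), (55, 105), (110, 160)]
```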
With jacketed film, film strips may overlap. The processing procedures should ensure that such overlaps are detected and corrected before scanning; otherwise some page images will be missing or illegible, in whole or in part.
Where a rotary camera has been used, images on the film may not have a one-to-one correspondence with the original documents. For example, two pages may have been fed at once, so that on the film part or all of an original page may be missing.

http://knol.google.com/k/an-overview-of-document-scanning#
A comprehensive explanation of scanning.
Original poster | Posted 2010-2-17 09:11:40
5            Design of documents for optimal scanning
The following guidance has been proposed to help companies that currently use forms to capture information redesign their forms and other documents so that they are as suitable as possible for future scanning. Where forms are already in use, it is suggested that forms due for reprinting be redesigned, where possible, to optimise them for scanning.
5.1              Machine-readable metadata on forms
·     There is no technical constraint on which fields are used for barcodes or other machine-readable features. However, using too many will either require large codes or raise the recognition failure rate.
·     If a stick-on barcode label is used, it is recommended that only the context identifier be encoded in the barcode (the label is assumed to be printed from the system). If a user is given multiple codes, it is likely that the wrong label will be selected at some point, causing the document to be indexed incorrectly.
·     If such a label is not used, it is recommended that the specialty, document type and context identifier all be encoded on the stationery when it is printed. This may be done through a combination of machine-readable printed text, mark recognition and barcode reading. The company should apply a house style so that this information is presented consistently for both human and machine use.
5.2              Barcodes
·     Each organisation should use a single barcode protocol. The standard bar codes which should be used are CODE39 or CODE128.
·     The barcode should be printed clearly at high resolution on a laser printer on good quality labels with clear white space around the code to give the maximum probability of successful recognition.
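When choosing between the two symbologies, it helps to know what each can encode and how the check character works. The sketch below tests whether a value fits the 43-character Code 39 set, and computes the mandatory modulo-103 check character value for Code 128 subset B (start value 104, each data character's value is its ASCII code minus 32, weighted by position). Label-printing software normally does this itself; the function names are hypothetical.

```python
CODE39_CHARS = set("0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ-. $/+%")

def is_code39_encodable(text: str) -> bool:
    """True if every character is in the 43-character Code 39 set."""
    return all(ch in CODE39_CHARS for ch in text)

def code128b_check_value(text: str) -> int:
    """Modulo-103 check character value for Code 128 subset B."""
    total = 104  # value of the Start B character
    for position, ch in enumerate(text, start=1):
        value = ord(ch) - 32
        if not 0 <= value <= 95:
            raise ValueError(f"{ch!r} cannot be encoded in Code 128 subset B")
        total += position * value
    return total % 103

print(is_code39_encodable("PAT-00123"))  # True
print(is_code39_encodable("pat-00123"))  # False: no lower case in Code 39
print(code128b_check_value("PJJ123C"))   # 55
```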
5.3              Layout
·     During capture, the system can be instructed to look anywhere on the image for a barcode.
·     No critical information should be placed within 5mm of the edge of the sheet due to the risk of folding and information loss.
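A form design can be checked against this margin by converting it to pixels at the scanning resolution (5 mm is about 79 pixels at 400 dpi) and testing each field's bounding box. A minimal sketch; the helper names and the A4-at-400-dpi page size (3307 x 4677 pixels) are assumptions for illustration.

```python
MM_PER_INCH = 25.4

def mm_to_pixels(mm: float, dpi: int) -> float:
    """Convert a distance on the page to pixels at the scanning resolution."""
    return mm / MM_PER_INCH * dpi

def within_safe_area(box, page_width_px: int, page_height_px: int, dpi: int,
                     margin_mm: float = 5.0) -> bool:
    """True if a field's bounding box (left, top, right, bottom in pixels)
    lies entirely inside the page minus a margin_mm border."""
    m = mm_to_pixels(margin_mm, dpi)
    left, top, right, bottom = box
    return (left >= m and top >= m
            and right <= page_width_px - m and bottom <= page_height_px - m)

print(round(mm_to_pixels(5, 400), 2))                                # 78.74
print(within_safe_area((100, 100, 500, 200), 3307, 4677, dpi=400))   # True
print(within_safe_area((50, 100, 500, 200), 3307, 4677, dpi=400))    # False
```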
5.4              Positioning of barcode labels
·     The systems are tolerant of label position and orientation within the area they are instructed to search. To speed up assimilation and make it easy to check whether a label has been attached, it is recommended that a consistent location be used on forms, such as the top right corner.
·     The essential requirement is that the barcode label is placed within the searched area. As labels are placed by hand, often under pressure, the search area should allow a tolerance of at least +/- 10mm around the target position for successful recognition. This allows for the label being out of position, tilted, or both.
·     Where other marks are used (e.g. to identify the document type), these should be positioned to within +/- 1mm.
·     Barcode tilt (off the horizontal) is permissible within a general tolerance of -20 to +20 degrees. Beyond this, attached barcode labels may not be recognised.
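The position and tilt tolerances above translate directly into the search zone a capture system is given. A minimal sketch, assuming a hypothetical 50 x 20mm label whose target top-left corner is 150mm from the left and 10mm from the top of an A4 sheet (the top right area recommended above); the zone is widened by 10mm on every side and clipped to the page.

```python
MM_PER_INCH = 25.4


def mm_to_px(mm: float, dpi: int) -> int:
    """Convert a distance on paper to pixels at the given scan resolution."""
    return round(mm / MM_PER_INCH * dpi)


def search_region_px(label_xy_mm, label_size_mm, dpi, tolerance_mm=10.0,
                     page_mm=(210.0, 297.0)):
    """Pixel rectangle (left, top, right, bottom) in which to search for a label,
    widened by tolerance_mm on every side and clipped to the page."""
    x, y = label_xy_mm
    w, h = label_size_mm
    page_w, page_h = page_mm
    left = max(0.0, x - tolerance_mm)
    top = max(0.0, y - tolerance_mm)
    right = min(page_w, x + w + tolerance_mm)
    bottom = min(page_h, y + h + tolerance_mm)
    return tuple(mm_to_px(v, dpi) for v in (left, top, right, bottom))


def tilt_acceptable(tilt_deg: float, limit_deg: float = 20.0) -> bool:
    """True if the label's angle off the horizontal is within +/- limit_deg."""
    return -limit_deg <= tilt_deg <= limit_deg


if __name__ == "__main__":
    print(search_region_px((150, 10), (50, 20), dpi=300))
    print(tilt_acceptable(15), tilt_acceptable(-25))
```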
5.5              Colours, paper etc
·     The best scanning performance occurs with the use of plain white A4 paper at a weight of 80gsm (standard office paper).
·     The use of coloured stock can be useful in providing visual clues to users, reducing risk and aiding identification, as long as the use of the correct colour can be assured. The tint should be slight (pastels) to maintain legibility on both the paper and the scanned forms; dark backgrounds and shading should be avoided.
·     Black ink is preferred.
·     Where forms need to span more than one sheet or to be folded, consideration needs to be given to keeping them together as a unit that does not fall apart during use but is easily separated for scanning. Glue that remains tacky after separation causes a high rate of scanner jams and must not be used.


6            Scanner specifications
This section provides a brief introduction to the sizes and types of scanners that a company may use when introducing a scanning solution. As part of a full investigation, a review of the types and volumes of paper to be scanned would identify the preferred or recommended scanners.
Typical recommended volume scanners include:
·     Bell & Howell Spectrum Series
·     Kodak i-Series
·     Canon DR series scanners
6.1              Recommended Scanners
There are three levels of scanning that need to be considered:
·     Low volume scanners for desktop scanning of specific individual pages
·     Medium volume scanners for day forward or on demand scanning of paperwork received daily
·     High volume scanners for back file scanning of large volumes, where this is to be performed by the company
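A rough first check when matching a scanner to one of these levels is to compare the expected daily volume with the scanner's rated speed and duty cycle. The sketch below uses the Kodak i60 figures quoted below (25 pages a minute, up to 1,000 pages a day); the `efficiency` derating factor is a hypothetical allowance for loading, jams and re-scans, not a manufacturer figure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scanner:
    name: str
    pages_per_minute: float        # rated sheet throughput from the manufacturer
    duty_cycle_pages_per_day: int  # manufacturer's rated daily volume


def scanning_minutes(pages_per_day: int, scanner: Scanner, efficiency: float = 1.0) -> float:
    """Minutes of scanning needed per day at the rated speed, derated by a
    hypothetical efficiency factor (0 < efficiency <= 1)."""
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    return pages_per_day / (scanner.pages_per_minute * efficiency)


def within_duty_cycle(pages_per_day: int, scanner: Scanner) -> bool:
    """True if the daily volume does not exceed the rated duty cycle."""
    return pages_per_day <= scanner.duty_cycle_pages_per_day


if __name__ == "__main__":
    i60 = Scanner("Kodak i60", pages_per_minute=25, duty_cycle_pages_per_day=1000)
    for volume in (500, 1000, 1500):
        print(volume, scanning_minutes(volume, i60, efficiency=0.5),
              within_duty_cycle(volume, i60))
```

At an assumed 50% efficiency, 1,000 pages a day takes about 80 minutes and sits at the i60's rated limit, while 1,500 pages a day exceeds it and points towards a medium volume scanner. Duplex scanning doubles the images captured per sheet but not the sheets fed, so the duty cycle is compared against pages.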
The following information is reproduced from the manufacturers’ publications. For definitive current information, the reader is directed to the manufacturers’ web sites.
6.1.1            Recommended Scanner for small scale users: Kodak i60 Scanner
The Kodak i60 has a superb feeder and image enhancement capability, and is rated at 1,000 pages per day. It can also scan A3 documents should this be required, and this, combined with the feeder and image enhancement, will allow most other document types to be scanned. It is fitted with ultrasonic multi-feed detection, which accurately tells the operator when more than one page feeds at a time, irrespective of paper length or thickness, and can be set to stop the scanner on a multi-feed.
The i60 scans in colour, greyscale, and black and white at up to 25 pages a minute. Resolution can be set in the range 75 to 600 dots per inch.
This duplex model saves time by simultaneously capturing both sides of your documents at up to 50 images per minute.
The shorter the paper path, the less likely documents are to jam. With the proven paper path design, you can count on thousands of hours of productive scanning.
Thanks to the document feeder, you can finally use "reliable" and "automatic" in the same sentence. Load up to 75 sheets at a time, in a wide range of sizes and weights. A built-in flatbed handles your delicate onionskin and cumbersome bound pages.
The rated duty cycle is up to 1000 pages a day.
An illustration of the i60 Scanner is shown below:
Figure 12 - Kodak i60 Scanner
6.1.2            Recommended Scanner for Medium sized users: Kodak i280 Scanner
The following features describe the faster i280 Scanner, which would be suitable for medium sized scanning situations:
Exceptional speed, image quality, and flexibility. Fast scanning, automatic image processing and five output options mean you can scan up to 248 images per minute at 200 dpi. Choose from colour, bitonal, greyscale, simultaneous bitonal and greyscale or simultaneous bitonal and colour output.
Perfect Page Scanning with iThresholding delivers clean, sharp images at full speed, dramatically reducing pre-sorting, re-scans, and post-image processing.
Exclusive SurePath paper handling is designed for smooth paper transport and virtually jam-free operation.
Electronic Colour Dropout allows you to optimise forms processing by removing irrelevant red, green, or blue background colour.
Innovative options to meet more scanning needs:
Dockable flatbed handles exception documents and offers space-saving flexibility (not included in indicative price).
Post-scan imprinter lets you track documents after scanning (not included in indicative price).

Figure 13 - Kodak i280 Scanner
6.1.3            Recommended Scanner for Larger scanning situations: Bell & Howell 8080DB1-CE
The following features describe the heavy duty, high volume Bell & Howell 8080DB1-CE scanner, which would be suitable for more demanding situations:
The indicative model is 8080DBI-CE, A4/A3, 65ppm, 100 to 400 dpi, SCSI interface only, with Imprinter. Others are available within the range.
The new features on Böwe Bell + Howell’s Copiscan 8000 Spectrum make production scanning a cost-effective option, even for scanning in colour.
Here’s how: with Spectrum’s superior paper handling and image enhancement with VirtualReScan™ (VRS) from Kofax, documents can now be scanned the way they come to you: mixed together in every shape, size, and paper type.
And now with Auto Colour Detect, colour documents scan in colour, bitonal in bitonal, all simultaneously on one system. If you scan in colour today, this can reduce the time needed for scanning by up to 60% over other methods. Even file sizes for your largest colour images, those with colour backgrounds, can be reduced by up to 40% with the Colour Background Saturation and Dropout feature. Imagine scanning in colour when you want it, bitonal when you don’t, all at maximum speed, minimal cost, and the quality to handle between 6,000 and 60,000 documents per day. Now you can.
All Spectrum models handle up to 500 sheets of paper at a time, from 2.60" x 2.60" (66 mm x 66 mm) up to 11.70" x 40" (297 mm x 1016 mm), with optical resolution ranging from 100 to 400 dpi (dots per inch).

Figure 14 - B & H 8080DB-1 Scanner



7            Compression Techniques and Recommendations
7.1              Introduction to Compression
When pages are scanned, the format in which they are stored directly affects the storage space required and the reproduction quality available. As a guide to compression and storage options, a test was carried out on scans of a variety of sample documents to understand the impact of colour scanning on storage requirements and the impact of compression schemes on the legibility of different content types.
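Because an uncompressed scan stores every pixel, its size depends only on page size, resolution and bit depth, not on what is printed on the page. A minimal sketch of that arithmetic, assuming an A4 page (210 x 297mm) and ignoring TIFF header and tag overhead; rows are padded to a whole byte, as TIFF does for 1-bit (bilevel) images.

```python
MM_PER_INCH = 25.4


def raw_tiff_bytes(width_mm: float, height_mm: float, dpi: int, bits_per_pixel: int) -> int:
    """Approximate size of uncompressed pixel data for a scanned page."""
    width_px = round(width_mm / MM_PER_INCH * dpi)
    height_px = round(height_mm / MM_PER_INCH * dpi)
    bytes_per_row = (width_px * bits_per_pixel + 7) // 8  # pad each row to a byte
    return bytes_per_row * height_px


if __name__ == "__main__":
    for dpi in (200, 300, 400):
        bw = raw_tiff_bytes(210, 297, dpi, 1) / 1024        # black and white
        colour = raw_tiff_bytes(210, 297, dpi, 24) / 1024   # 24-bit colour
        print(f"{dpi} dpi: B&W {bw:,.0f} KiB, colour {colour:,.0f} KiB")
```

This predicts about 473, 1,062 and 1,891 KiB for black and white at 200, 300 and 400dpi, and about 11,334, 25,488 and 45,313 KiB for 24-bit colour, close to the RAW sizes measured in section 7.5.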
7.2              Background
·     Date of tests: February 9th 2006
·     Scanner: Bell & Howell Spectrum 8080
·     Scanning software: Kofax Ascent Capture 7.0
·     Internal processing format: RAW (uncompressed) TIFF (Batch Properties)
·     JPEG compression quality: set to 100 for scan sources (KSM Panel)
·     Group IV and JPEG compressed TIFFs generated directly from the scan software
·     LZW and PDF outputs generated from the RAW 24-bit colour TIFF scans (post-processed)
·     All colour scans at 24-bit (16.8 million colours)
7.3              Samples
Five sample documents were chosen to represent a range of content types. These are displayed below with brief explanations for the choice made.
The first document chosen is a health diagnostics form, but it could in fact be any blank A4 form that needs to be filled out manually and stored. It has been completed in pen, and was selected as an example of a detailed document chart with (arguably) cosmetic colour: the NHS logo (blue), the warning message (red) and the different pen inks used do not necessarily convey any additional information. If storage cost becomes an issue and consideration is given to scanning more content in black and white, this type of content is a candidate for black and white scanning. This image has the short code “Cos” (Cosmetic) in the final results table.
The second document chosen is an A4 computer form printed on coloured paper. It was selected as an example of content that is essentially black and white on a coloured background. A variety of forms with different coloured backgrounds exist, and this sample demonstrates the potential impact of scanning a colour document in black and white where the colour elements are not deemed meaningful or significant. This image has the short code “BGC” (Coloured Background) in the final results table.
The third document selected is the same form as the second, but the page has been annotated using a variety of coloured pens. It represents a situation where a black and white document has been annotated in coloured ink and that ink is considered significant in some way. This ‘extreme’ sample was created on a blank form for the purposes of this exercise. In addition to blue, red and green pens, a yellow highlighter was used on various areas of the form to test the ability to scan this very light, low-contrast colouring. This image has the short code “Ano” (Annotation) in the final results table.
The fourth sample document is an A4 full colour page taken from a magazine. It was chosen because it mixes rich photographic colour detail and fine colour gradients with text, a combination that presents a more significant challenge for a compression scheme. It represents the photographic images often found in brochures and other printed literature. This image has the short code “Mag” (Magazine) in the final results table.
The fifth and final document selected is a health ECG chart (a detailed graph of a patient’s heart activity), chosen to represent the fine detail of a clinical chart. It has a red chart scale, with black ink recording the values on the chart. This document particularly tests the ability of the compression schemes to retain fine detail while reducing content to small file sizes. This image has the short code “ECG” in the final results table.


7.4              Notes
·     In the table below, some test combinations were not performed, as they would not produce useful information or were inappropriate to the content type.
·     Scans were performed at 200, 300 and 400dpi for comparison.
·     Black and white scans of some colour documents were also performed for comparison, to inform discussion of the cost difference between colour and black and white scanning.
·     DjVu images are provided for comparison only and do not conform to the government e-GIF standards for image file formats. A DjVu viewer is required to open these sample images.
7.5              Results
The table below shows the file sizes of the scanned output for each of the sample documents described above, across various file formats and compression schemes. Sizes are given in kilobytes (K).
| Output Format | Compression Scheme | Resolution | Colour / B&W | Office Doc Cosmetic Colour (Cos) | Office Doc Colour Annotation (Ano) | Office Doc Coloured Background (BGC) | Clinical Doc Fine Detail (ECG) | Colour Photograph (Mag) | Average Size |
|---|---|---|---|---|---|---|---|---|---|
| TIFF | None (RAW) | 200dpi | B&W | 474K | 471K | 474K | - | - | 473K |
| TIFF | None (RAW) | 300dpi | B&W | 1064K | 1058K | 1066K | - | 1040K | 1057K |
| TIFF | None (RAW) | 400dpi | B&W | 1901K | 1880K | 1894K | - | - | 1888K |
| TIFF | Group IV | 200dpi | B&W | 68K | 101K | 34K | - | - | 68K |
| TIFF | Group IV | 300dpi | B&W | 210K | 176K | 52K | - | 152K | 148K |
| TIFF | Group IV | 400dpi | B&W | 398K | 255K | 70K | - | - | 241K |
| TIFF | None (RAW) | 200dpi | Colour | 11241K | 11201K | 11280K | 10623K | 11009K | 11071K |
| TIFF | None (RAW) | 300dpi | Colour | 24502K | 25286K | 25460K | 23949K | 24821K | 24804K |
| TIFF | None (RAW) | 400dpi | Colour | 45179K | 44946K | 45257K | 42617K | 44170K | 44434K |
| TIFF | JPEG 100% | 200dpi | Colour | 634K | 547K | 529K | 1063K | 604K | 675K |
| TIFF | JPEG 100% | 300dpi | Colour | 1326K | 1053K | 1040K | 2188K | 1157K | 1353K |
| TIFF | JPEG 100% | 400dpi | Colour | 1990K | 1576K | 1649K | 3343K | 1901K | 2092K |
| TIFF | LZW | 200dpi | Colour | 4532K | 3322K | 4653K | 6603K | 1943K | 4211K |
| TIFF | LZW | 300dpi | Colour | 8725K | 6427K | 9946K | 18401K | 10469K | 10794K |
| TIFF | LZW | 400dpi | Colour | 14252K | 10142K | 17012K | 30166K | 17706K | 17856K |
| PDF | LuraTech | 300dpi | Colour | 88K | 128K | 73K | 282K | 101K | 134K |
| PDF | LuraTech | 400dpi | Colour | - | - | - | 408K | - | 408K |
| DjVu | DjVu | 300dpi | Colour | 72K | 96K | 52K | 254K | 65K | 108K |
| DjVu | DjVu | 400dpi | Colour | - | - | - | 396K | - | 396K |

A dash (-) marks a combination that was not tested.


7.6              Findings
·     The type of content scanned makes almost no difference to the size of the uncompressed scan files (colour or black and white), with 200, 300 and 400dpi colour scans requiring about 11, 24 and 44MB per image respectively.
·     LZW compression gave an average 57% reduction in file size: good but not excellent, since a 300dpi colour A4 scan still requires about 10MB on average. This reflects the age and generality of this compression scheme; the other schemes tested here are specifically designed to compress images.
·     JPEG compression gave an average 95% reduction in file size: excellent, but still a significant increase over a black and white Group IV scan, with the JPEG-compressed colour TIFF requiring 1353K on average versus 148K for black and white. Clearly more information is stored, but this is roughly nine times the size (an increase of about 800%), with a corresponding increase in storage cost.
·     The LuraTech PDF compressor achieved an even greater reduction, averaging 99.5%, giving file sizes comparable to a traditional Group IV black and white TIFF scan. The implication is that at this level of compression there is no storage cost premium for scanning in colour, though the cost of the compression software must of course be factored in. For this scheme PDF must be used as the container format: it uses a variety of recent compression techniques (JBIG2, wavelet etc.) which are valid within PDF (and produce perfectly standard PDF output) but are not part of the (now ageing) TIFF standard.
·     The DjVu format provides the greatest compression of all, achieving a slightly better ratio (99.6%) than even the LuraTech compressor. However, this comes at the cost of universal compatibility: very few desktops are likely to have the software required to view this file format, and a gain of 0.1 percentage points does not seem significant enough to justify overcoming this hurdle. In addition, DjVu is not an acceptable format within the government e-GIF technical standards.
·     Lower colour bit depths (e.g. 8-bit) were not available for testing, as most scanner drivers do not support them; post-processing and compression reduce the palette range as appropriate. The goal should always be to capture as much information as possible during the physical paper scan, as this is the most expensive part of the process to repeat.
These tests say nothing about response times for displaying images; they compare colour compression only. Lower specification PCs may struggle to manipulate uncompressed colour TIFF images.
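The percentage reductions above follow directly from the 300dpi averages in the results table in section 7.5. A minimal sketch of that calculation, using only the averages from that table; the dictionary labels are descriptive names chosen for illustration.

```python
# Average 300dpi file sizes (K) taken from the results table in section 7.5.
RAW_COLOUR_300DPI = 24804
GROUP_IV_BW_300DPI = 148
AVERAGE_300DPI = {
    "TIFF LZW (colour)": 10794,
    "TIFF JPEG 100% (colour)": 1353,
    "PDF LuraTech (colour)": 134,
    "DjVu (colour)": 108,
}


def reduction_pct(compressed: float, uncompressed: float) -> float:
    """Percentage reduction in size relative to the uncompressed scan."""
    return 100.0 * (1.0 - compressed / uncompressed)


if __name__ == "__main__":
    for scheme, size in AVERAGE_300DPI.items():
        print(f"{scheme}: {reduction_pct(size, RAW_COLOUR_300DPI):.1f}% smaller, "
              f"{size / GROUP_IV_BW_300DPI:.1f}x the Group IV B&W average")
```

This gives reductions of about 56.5%, 94.5%, 99.5% and 99.6%, and shows the JPEG colour TIFF at roughly 9.1 times the Group IV black and white average.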