Dynamic Programming Algorithm Pdf

Optical Character Recognition

OCR software

OCR and ICR software are analytical artificial intelligence systems that consider only sequences of characters rather than whole words or phrases, and do not cross-validate data during the recognition process (see, for example, ExperVision, ABBYY, OmniPage or CuneiForm). Based on the analysis of sequential lines and curves, OCR and ICR make ‘best guesses’ at characters, using database look-up tables to closely associate or match the strings of characters that form words. For these systems to recognize hand-printed or machine-printed forms effectively, words must be separated into individual characters. That is why most administrative forms require people either to hand print into neatly spaced boxes or to use combs (tick marks) at the bottom of input lines, which force spaces between the letters entered on the form. Without combs or boxes, conventional technologies reject fields when people do not follow the structure while filling out forms, resulting in significant administrative overhead and costs for forms-processing organizations.

History

In 1929 Gustav Tauschek obtained a patent on OCR in Germany, followed by Handel, who obtained a US patent on OCR in 1933 (U.S. Patent 1,915,993). In 1935 Tauschek was also granted a US patent on his method (U.S. Patent 2,026,329).

Tauschek’s machine was a mechanical device that used templates. A photodetector was placed so that when the template and the character to be recognized were lined up for an exact match and a light was directed towards them, no light would reach the photodetector.

In 1950, David H. Shepard, a cryptanalyst at the Armed Forces Security Agency in the United States, was asked by Frank Rowlett, who had broken the Japanese PURPLE diplomatic code, to work with Dr. Louis Tordella to recommend data automation procedures for the Agency. This included the problem of converting printed messages into machine language for computer processing. Shepard decided it must be possible to build a machine to do this, and, with the help of Harvey Cook, a friend, built “Gismo” in his attic during evenings and weekends. This was reported in the Washington Daily News on 27 April 1951 and in the New York Times on 26 December 1953 after his U.S. Patent 2,663,758 was issued. Shepard then founded Intelligent Machines Research Corporation (IMR), which went on to deliver the world’s first several OCR systems used in commercial operation. While both Gismo and the later IMR systems used image analysis, as opposed to character matching, and could accept some font variation, Gismo was limited to reasonably close vertical registration, whereas the following commercial IMR scanners analyzed characters anywhere in the scanned field, a practical necessity on real world documents.

The first commercial system was installed at Reader’s Digest in 1955; many years later, Reader’s Digest donated it to the Smithsonian, where it was put on display. The second system was sold to the Standard Oil Company of California for reading credit card imprints for billing purposes, with many more systems sold to other oil companies. Other systems sold by IMR during the late 1950s included a bill stub reader for the Ohio Bell Telephone Company and a page scanner for the United States Air Force, used for reading typewritten messages and transmitting them by teletype. IBM and others were later licensed on Shepard’s OCR patents.

In about 1965 Reader’s Digest and RCA collaborated to build an OCR document reader designed to digitize the serial numbers on Reader’s Digest coupons returned from advertisements. The documents were printed by an RCA drum printer using the OCR-A font. The reader was connected directly to an RCA 301 computer (one of the first solid-state computers). This reader was followed by a specialized document reader installed at TWA, where it processed airline ticket stock (a task made more difficult by the carbonized backing on the ticket stock). The readers processed documents at a rate of 1500 per minute and checked each document, rejecting those they were not able to process correctly. The product became part of the RCA product line as a reader designed to process “turn-around documents” such as the utility and insurance bills returned with payments.

The United States Postal Service has been using OCR machines to sort mail since 1965, based on technology devised primarily by the prolific inventor Jacob Rabinow. The first use of OCR in Europe was by the British General Post Office (GPO). In 1965 it began planning an entire banking system, the National Giro, using OCR technology, a process that revolutionized bill payment systems in the UK. Canada Post has been using OCR systems since 1971. OCR systems read the name and address of the addressee at the first mechanized sorting center and print a routing bar code on the envelope based on the postal code. To avoid confusion with the human-readable address field, which can be located anywhere on the letter, the bar code is printed in a special ink (orange in visible light) that is clearly visible under ultraviolet light. Envelopes may then be processed with equipment based on simple barcode readers.

In 1974 Ray Kurzweil started the company Kurzweil Computer Products, Inc. and led development of the first omni-font optical character recognition system, a computer program capable of recognizing text printed in any normal font. He decided that the best application of this technology would be a reading machine for the blind, which would allow blind people to have a computer read text to them out loud. This device required the invention of two enabling technologies: the CCD flatbed scanner and the text-to-speech synthesizer. On January 13, 1976 the finished product was unveiled during a widely reported news conference headed by Kurzweil and the leaders of the National Federation of the Blind. Called the Kurzweil Reading Machine, the device covered an entire tabletop. On the day of the machine’s unveiling, Walter Cronkite used the machine to give his signature sign-off, “And that’s the way it was, January 13, 1976.” While listening to The Today Show, musician Stevie Wonder heard a demonstration of the device and personally purchased the first production version of the Kurzweil Reading Machine.

In 1978 Kurzweil Computer Products began selling a commercial version of the optical character recognition computer program. LexisNexis was one of the first customers, and bought the program to upload paper legal and news documents onto its nascent online databases. Two years later, Kurzweil sold his company to Xerox, which had an interest in further commercializing paper-to-computer text conversion. Kurzweil Computer Products became a subsidiary of Xerox known as Scansoft, now Nuance Communications.

Current state of OCR technology

The accurate recognition of Latin-script, typewritten text is now considered largely a solved problem in applications where clear imaging is available, such as the scanning of printed documents. Typical accuracy rates on these exceed 99%; total accuracy can only be achieved by human review. Other areas, including recognition of hand printing, cursive handwriting, and printed text in other scripts (especially those with a very large number of characters), are still the subject of active research.

Accuracy rates can be measured in several ways, and how they are measured can greatly affect the reported accuracy rate. For example, if word context (basically a lexicon of words) is not used to correct software finding non-existent words, a character error rate of 1% (99% accuracy) may result in an error rate of 5% (95% accuracy) or worse if the measurement is based on whether each whole word was recognized with no incorrect letters.
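The gap between character-level and word-level accuracy can be illustrated with a little arithmetic. The sketch below is only a rough model: it assumes that character errors are independent of one another and that a word counts as correct only if every one of its characters is correct. The function name word_accuracy and the chosen word lengths are illustrative, not taken from any OCR product.

```python
def word_accuracy(char_accuracy: float, word_length: int) -> float:
    """Chance that every character of a word is recognized correctly.

    Assumes character errors are independent, which is a simplification.
    """
    return char_accuracy ** word_length


# 99% character accuracy, measured against words of different lengths.
for length in (3, 5, 8):
    acc = word_accuracy(0.99, length)
    print(f"{length}-letter words: {acc:.1%} correct, {1 - acc:.1%} in error")
```

Under these assumptions a five-letter word is read correctly about 95% of the time, which matches the 5% word error rate quoted above, and longer words fare worse.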

On-line character recognition is sometimes confused with Optical Character Recognition (see Handwriting recognition). OCR is an instance of off-line character recognition, where the system recognizes the fixed static shape of the character, while on-line character recognition instead recognizes the dynamic motion during handwriting. For example, on-line recognition, such as that used for gestures in the Penpoint OS or the Tablet PC can tell whether a horizontal mark was drawn right-to-left, or left-to-right. On-line character recognition is also referred to by other terms such as dynamic character recognition, real-time character recognition, and Intelligent Character Recognition or ICR.

On-line systems for recognizing hand-printed text on the fly have become well known as commercial products in recent years (see Tablet PC history). Among these are the input devices for personal digital assistants such as those running Palm OS. The Apple Newton pioneered this kind of product. The algorithms used in these devices take advantage of the fact that the order, speed, and direction of individual line segments at input are known. Also, the user can be retrained to use only specific letter shapes. These methods cannot be used in software that scans paper documents, so accurate recognition of hand-printed documents is still largely an open problem. Accuracy rates of 80% to 90% on neat, clean hand-printed characters can be achieved, but that accuracy rate still translates to dozens of errors per page, making the technology useful only in very limited applications.

Recognition of cursive text is an active area of research, with recognition rates even lower than that of hand-printed text. Higher rates of recognition of general cursive script will likely not be possible without the use of contextual or grammatical information. For example, recognizing entire words from a dictionary is easier than trying to parse individual characters from script. Reading the Amount line of a cheque (which is always a written-out number) is an example where using a smaller dictionary can increase recognition rates greatly. Knowledge of the grammar of the language being scanned can also help determine if a word is likely to be a verb or a noun, for example, allowing greater accuracy. The shapes of individual cursive characters themselves simply do not contain enough information to accurately (greater than 98%) recognize all handwritten cursive script.
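To make the dictionary idea concrete, here is a minimal sketch of lexicon-based correction: a noisy recognizer output is replaced by the closest word in a small vocabulary, using edit distance computed by dynamic programming. This is an illustration only, not how any particular OCR product works; the names edit_distance, closest_word and AMOUNT_WORDS, and the sample vocabulary, are assumptions made for the example.

```python
from typing import Sequence


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty prefix of b
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def closest_word(raw: str, lexicon: Sequence[str]) -> str:
    """Return the lexicon entry with the smallest edit distance to raw."""
    return min(lexicon, key=lambda word: edit_distance(raw, word))


# Hypothetical small vocabulary for the amount line of a cheque.
AMOUNT_WORDS = ["one", "two", "three", "four", "five", "six", "seven",
                "eight", "nine", "ten", "twenty", "thirty", "forty",
                "fifty", "hundred", "thousand", "dollars"]

print(closest_word("thausand", AMOUNT_WORDS))  # thousand
print(closest_word("f1fty", AMOUNT_WORDS))     # fifty
```

The smaller the vocabulary, the fewer near neighbours each word has, which is one way to see why restricting the dictionary raises recognition rates.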

OCR is a basic building block that is also used in advanced scanning applications. As a result, an advanced scanning solution can be unique, patented, and hard to copy even though it is built on basic OCR technology.

For more complex recognition problems, intelligent character recognition systems are generally used, as artificial neural networks can be made indifferent to both affine and non-linear transformations.

A technique which is having considerable success in recognising difficult words and character groups within documents generally amenable to computer OCR is to submit them automatically to humans in the reCAPTCHA system.

OCR software language support

Each entry below lists, in order: name, latest version, release year, recognition languages, and dictionaries. Not every field is given for every product.

ExperVision TypeReader & OpenRTK, version 8.0 (2010)

English, French, German, Italian, Spanish, Portuguese, Danish, Dutch, Swedish, Norwegian, Hungarian, Polish, Simplified Chinese, Traditional Chinese, Russian, Finnish and Polynesian

ABBYY FineReader, version 10 (2009)

Abkhaz, Adyghian, Afrikaans, Agul, Albanian, Altai, Armenian (Eastern, Western, Grabar), Avar, Aymara, Azerbaijani (Cyrillic), Azerbaijani (Latin), Bashkir, Basic, Basque, Byelorussian, Bemba, Blackfoot, Breton, Bugotu, Bulgarian, Buryat, C/C++, Catalan, Cebuano, Chamorro, Chechen, Chinese (Simplified and Traditional), Chukchee, Chuvash, COBOL, Corsican, Crimean Tatar, Croatian, Crow, Czech, Dakota, Danish, Dargwa, Dungan, Dutch (Netherlands and Belgium), English, Eskimo (Cyrillic and Latin), Esperanto, Estonian, Even, Evenki, Faroese, Fijian, Finnish, Fortran, French, Frisian, Friulian, Gagauz, Galician, Ganda, German (Luxemburg), German (new and old spelling), Greek, Guarani, Hani, Hausa, Hawaiian, Hebrew, Hungarian, Icelandic, Ido, Indonesian, Ingush, Interlingua, Irish, Italian, Japanese, JAVA, Jingpo, Kabardian, Kalmyk, Karachay-balkar, Karakalpak, Kasub, Kawa, Kazakh, Khakass, Khanty, Kikuyu, Kirghiz, Kongo, Korean, Koryak, Kpelle, Kumyk, Kurdish, Lak, Latin, Latvian, Lezgi, Lithuanian, Luba, Macedonian, Malagasy, Malay, Malinke, Maltese, Mansy, Maori, Mari, Maya, Miao, Minangkabau, Mohawk, Moldavian, Mongol, Mordvin, Nahuatl, Nenets, Nivkh, Nogay, Norwegian (Nynorsk and Bokmål), Nyanja, Occidental, Ojibway, Ossetian, Papiamento, Pascal, Polish, Portuguese (Portugal and Brazil), Provencal, Quechua, Rhaeto-romanic, Romanian, Romany, Rundi, Russian, Russian (old spelling), Rwanda, Sami (Lappish), Samoan, Scottish Gaelic, Selkup, Serbian (Cyrillic and Latin), Shona, Simple chemical formulas, Slovak, Slovenian, Somali, Sorbian, Sotho, Spanish, Sunda, Swahili, Swazi, Swedish, Tabasaran, Tagalog, Tahitian, Tajik, Tatar, Thai, Tok Pisin, Tongan, Tswana, Tun, Turkish, Turkmen, Tuvinian, Udmurt, Uighur (Cyrillic and Latin), Ukrainian, Uzbek (Cyrillic and Latin), Welsh, Wolof, Xhosa, Yakut, Yiddish, Zapotec, Zulu

Armenian (Eastern, Western, Grabar), Bashkir, Bulgarian, Catalan, Croatian, Czech, Danish, Dutch (Netherlands and Belgium), English, Estonian, Finnish, French, German (new and old spelling), Greek, Hebrew, Hungarian, Indonesian, Italian, Latvian, Lithuanian, Norwegian (Nynorsk and Bokmål), Polish, Portuguese (Portugal and Brazil), Romanian, Russian, Slovak, Slovenian, Spanish, Swedish, Tatar, Thai, Turkish, Ukrainian

OmniPage, version 17 (2009)

Afrikaans, Albanian, Aymara, Basque, Bemba, Blackfoot, Breton, Bugotu, Bulgarian, Byelorussian, Catalan, Chamorro, Chechen, Corsican, Croatian, Crow, Czech, Danish, Dutch, English, Esperanto, Estonian, Faroese, Fijian, Finnish, French, Frisian, Friulian, Gaelic (Irish), Gaelic (Scottish), Galician, Ganda/Luganda, German, Greek, Guarani, Hani, Hawaiian, Hungarian, Icelandic, Ido, Indonesian, Interlingua, Italian, Inuit, Kabardian, Kasub, Kawa, Kikuyu, Kongo, Kpelle, Kurdish, Latin, Latvian, Lithuanian, Luba, Luxembourgian, Macedonian, Malagasy, Malay, Malinke, Maltese, Maori, Mayan, Miao, Minankabaw, Mohawk, Moldavian, Nahuatl, Norwegian, Nyanja, Occidental, Ojibway, Papiamento, Pidgin English, Polish, Portuguese (Brazilian), Portuguese, Provencal, Quechua, Rhaetic, Romanian, Romany, Ruanda, Rundi, Russian, Sami Lule, Sami Northern, Sami Southern, Sami, Samoan, Sardinian, Serbian (Cyrillic), Serbian (Latin), Shona, Sioux, Slovak, Slovenian, Somali, Sorbian, Sotho, Spanish, Sundanese, Swahili, Swazi, Swedish, Tagalog, Tahitian, Tinpo, Tongan, Tswana, Tun, Turkish, Ukrainian, Visayan, Welsh, Wolof, Xhosa, Zapotec, Zulu

PDF OCR X, version 1.4 (2010)

Bulgarian, Catalan, Czech, Chinese Simplified, Chinese Traditional, Danish, German, Greek, English, Finnish, French, Hungarian, Indonesian, Italian, Japanese, Latvian, Lithuanian, Dutch, Norwegian, Polish, Portuguese, Romanian, Slovak, Slovenian, Spanish, Serbian, Swedish, Tagalog, Turkish, Ukrainian, Vietnamese

Readiris 12 Pro & Corporate (2009)

American English, British English, Afrikaans, Albanian, Aymara, Balinese, Basque, Bemba, Bikol, Bislama, Brazilian, Breton, Bulgarian, Byelorussian, Catalan, Cebuano, Chamorro, Corsican, Croatian, Czech, Danish, Dutch, Esperanto, Estonian, Faroese, Fijian, Finnish, French, Frisian, Friulian, Galician, Ganda, German, Greek, Greenlandic, Haitian (Creole), Hani, Hiligaynon, Hungarian, Icelandic, Ido, Ilocano, Indonesian, Interlingua, Irish (Gaelic), Italian, Javanese, Kapampangan, Kicongo, Kinyarwanda, Kurdish, Latin, Latvian, Lithuanian, Luxemburgh, Macedonian, Madurese, Malagasy, Malay, Maltese, Manx (Gaelic), Maori, Mayan, Minangkabau, Nahuatl, Norwegian, Numeric, Nyanja, Nynorsk, Occitan, Pidgin English, Polish, Portuguese, Quechua, Rhaeto-Roman, Romanian, Rundi, Russian, Samoan, Sardinian, Scottish (Gaelic), Serbian, Serbian (Latin), Shona, Slovak, Slovenian, Somali, Sotho, Spanish, Sundanese, Swahili, Swedish, Tagalog, Tahitian, Tok Pisin, Tonga, Tswana, Turkish, Ukrainian, Waray, Wolof, Xhosa, Zapotec, Zulu, Bulgarian – English, Byelorussian – English, Greek – English, Macedonian – English, Russian – English, Serbian – English, Ukrainian – English + Moldovan, Bosnian (Cyrillic and Latin), Tetum, Swiss-German and Kazak

Readiris 12 Pro & Corporate Middle-East (2009)

Arabic, Farsi and Hebrew

Readiris 12 Pro & Corporate Asian (2009)

Simplified Chinese, Traditional Chinese, Japanese and Korean

CuneiForm, version 12 (2007)

English, German, Croatian, Polish, Danish, Portuguese, Dutch, Digits, Czech, French, Romanian, Hungarian, Bulgarian, Slovenian, Latvian, Lithuanian, Estonian, Turkish, Russian, Swedish, Spanish, Italian, Russian-English (mixed), Ukrainian, Serbian

GOCR, version 0.47 (2009)

Kirtas Technologies Arabic OCR (2009)

15 left-to-right languages, including English, French, German, and Dutch; Arabic, Farsi, Jawi, Pashto, and Urdu; and bilingual Arabic/English, Arabic/French, and Farsi/English.

MoreData, version 1.0 (2008)

Freeware OCR software that uses Tesseract (from Google) as its OCR engine; it scans multiple documents per run, supports text search within the results, and has a Windows interface.

English, French, Italian

Microsoft Office Document Imaging, Office 2007 version (2007)

Language availability is tied to the installed proofing tools. For languages not included in your version of MS Office you’d need the corresponding Proofing Tools kit (separate purchase).

NEOPTEC DATA-SCAN, version 5.7 (2009)

French, Spanish, English.

Microsoft Office OneNote 2007

NovoDynamics VERUS Middle East Professional (2005)

Arabic, Persian (Farsi, Dari), Pashto, Urdu, including embedded English and French. It also recognizes the Hebrew language, including embedded English.

NovoDynamics VERUS Asia Professional (2009)

Simplified and Traditional Chinese, Korean and Russian languages, including embedded English

Ocrad

Brainware

HOCR, version 0.10.13 (2008)

Hebrew

OCRopus, version 0.3.1 (2008)

All the languages and scripts that Tesseract supports through the Tesseract plugin, and it supports Latin script and English for its native recognizers

ReadSoft

European characters, simplified and traditional Chinese, Korean, Japanese characters

Alt-N Technologies’ RelayFax Network Fax Manager

Sakhr OCR (2009)

Arabic, English, French and 16 other languages. Farsi, Jawi, Dari, Pashto and Urdu are available optionally in an extra language pack.

Bilingual documents in Arabic/English, Farsi/English and Arabic/French

Scantron Cognition

SimpleOCR, version 3.5 (2008)

English and French

SmartScore

Tesseract, version 2.03 (2008)

Can recognize 6 languages, is fully UTF8 capable, and is fully trainable

Transym TOCR, version 3.0 (2008)

11 languages: English, French, Italian, German, Dutch, Swedish, Norwegian, Finnish, Danish, Spanish and Portuguese

See also

Automatic number plate recognition

CAPTCHA

Computational linguistics

Computer vision

Machine learning

Music OCR

OCR SDK

OCR Software

Optical mark recognition

Raster to vector

Raymond Kurzweil

Speech recognition

Book scanning

Institutional Repository

Digital Library

OCR-B

About the Author

I am a professional writer from China Manufacturers, which contains a great deal of information about plastic tea strainers and elegant plastic dinnerware. Welcome to visit!
