Algorithms for Off-line Recognition of Chinese Characters
PhD Thesis, 1998
Manling Ren
Abstract
Computer recognition of Chinese characters is a challenging and important research area. It is relevant to documentation, publishing, language translation, and the handwriting of Chinese characters and Japanese ‘Kanji’ in industry, business, diplomacy and daily life. Development of the recognition process has typically focused on printed, on-line hand-written and off-line hand-written characters, using techniques that include two-layer hierarchy, four-corner, radical and whole-character recognition. Although existing recognition methods have achieved some success, the lack of fundamental algorithms for representing the structure of Chinese characters has prevented the recognition of characters that belong to a large vocabulary and have a complicated topological structure embedded within the 2-D pictorial format. The current project develops a new structural representation to remedy the lack of an effective recognition process for such characters. The research also investigates methods of dealing with the variable size, position and shape of a character, and with its vagueness and ambiguity. A manually operated key-input method for characters, called the ‘Cang-Jie’ method, is applied as an effective tool for verification of a Chinese character.
A novel method is developed to represent the structure of Chinese characters: a three-layer hierarchy of character-radical-stroke, with a corresponding process of character-radical-code, which is specially suited to 2-D objects with topological features. The character is decomposed into radicals according to their shape, position and extraction order. Radicals are classified into 26 categories in terms of their shape structure and meanings. Recognition of a radical yields the code of the category to which it belongs. The chain code method is applied to restructure these category codes into a 1-D chain code, which is then verified by matching it against a code database. To further enhance the method, a fuzzy neural network system has been designed and implemented to recognize characters in printed form and in standard writing, using uncertainty and topology analysis, fuzzy possibilistic reasoning, neocognitron and associative memory neural networks, the chain code method and the error probability method. A software system has been written in the C programming language using XView functions. Test results of the system have been obtained. The system has been improved at several stages to deal with vagueness and ambiguity (two separate characteristics) during recognition, and the recognition rate has been increased to 96%.
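As a rough illustration of the character-radical-code process described above, the following Python sketch concatenates radical category codes, in extraction order, into a 1-D chain code and looks the result up in a code database. It is not the thesis software, which was written in C. The letter labels for the 26 categories, the function names `build_chain_code` and `verify_chain_code`, and the database entries are all hypothetical placeholders chosen for illustration.

```python
from typing import Dict, List, Optional

# Assumption: the 26 radical categories are labeled 'A'..'Z'.
# The thesis states only that there are 26 categories, not how they are named.
CATEGORY_LABELS = [chr(ord("A") + i) for i in range(26)]

# Hypothetical code database: chain code -> character identifier.
# The entries are placeholders, not real character codes.
CODE_DATABASE: Dict[str, str] = {
    "AB": "char_001",
    "ABC": "char_002",
    "KQ": "char_003",
}


def build_chain_code(radical_categories: List[int]) -> str:
    """Join category codes, in radical extraction order, into a 1-D chain code."""
    for category in radical_categories:
        if not 0 <= category < len(CATEGORY_LABELS):
            raise ValueError(f"unknown radical category: {category}")
    return "".join(CATEGORY_LABELS[c] for c in radical_categories)


def verify_chain_code(chain_code: str, database: Dict[str, str]) -> Optional[str]:
    """Return the matching character identifier, or None if there is no match."""
    return database.get(chain_code)


# Example: a character whose radicals were recognized as categories 0 and 1.
chain = build_chain_code([0, 1])
match = verify_chain_code(chain, CODE_DATABASE)
```

An exact dictionary lookup is the simplest possible form of verification. The fuzzy possibilistic reasoning and error probability analysis mentioned above are not modeled in this sketch.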
The main achievements include the structural representation of Chinese characters, the extraction of radicals, the recognition and verification of characters, and the simplification of the recognition process.
Key words: Analysis of error probability, associative memory neural network, chain code method, fuzzy possibilistic inference rules, recognition of radicals, restructuring of chain codes, three-layer hierarchy of Chinese characters, 2-D topological structure.