 

A Study of Chinese Writing Systems

In this section we focus on Chinese orthography, limiting our study to only one of the six categories of logographs: the phonograms (Wang, 1981). Logographs are identifying symbols showing a system of relationships between things. One principle divides the logographs into six categories according to the way they are formed (see Section B). Phonograms are generally formed from two components: one suggests the meaning, while the other indicates the pronunciation (see Section D.4). The category of phonograms is the most important because the majority of the logographs are of this type (about 90%; Tsien, 1962), and because this ingenious scheme simultaneously presents cues to both "sound" and "meaning," providing the most rational foundation for most orthographies (Wang, 1981). We also wish to explore this unique concept of development because this type of orthography maps the written symbol directly onto meaning (Wang, 1973; Hung and Tzeng, 1981). Chinese characters are two-dimensional abstract images, or patterns, which makes them an ideal vehicle for investigating problems related to pattern recognition, communication of information, and abstraction.

The research methodology consisted of (1) hierarchical analysis and (2) synthesis. The former is used for studying the relational behavior of the "formal elements" of a given character, while the latter is applied to studying the "relatedness" among characters within a given family.

A. ELEMENTS OF A CHARACTER

The elements of the Chinese characters may be viewed at two distinctive levels: (i) In terms of the modern script, the most basic elements are called the "strokes", referred to as the "material elements" (Fig. 3.1), and (ii) in terms of the most basic meaningful units, called the "formal elements," consisting of significs, phonetics, and primitives (Wieger, 1965).

From the calligraphic standpoint, a character is reducible to simple strokes, which are the material elements of modern writing. The number of basic strokes in Chinese character formation is only nine (Fig.3.1). If we include variations, the number of basic units increases to seventeen. At any rate, this is a very small number of distinctive items. This simplicity makes the strokes easy for a beginner to remember and to work with.

On the other hand, being relatively few in number, they must necessarily appear more often across different characters. That reduces their effectiveness as distinctive and specific symbols. Redundancy is an inherent problem when working at the "stroke" level.

From the logical, etymological viewpoint, however, a compound is composed, not of strokes, but of simpler characters having their own meaning and use. In contrast to the material elements, these informational units are called "formal elements". These basic, meaningful units, then, constitute the essential items for hierarchical analysis (Chu, 1981).

.DISCUSSION

But, at which level should one deal with the Chinese characters? Should we work with the material elements (i.e., strokes) only, or should we work with the formal elements exclusively? This can be quite a controversial matter if one insists on a given point of view because it is related to the more fundamental question: Is language descriptive or prescriptive, in essence? Maybe needs exist for both, namely: The material elements are useful to the beginners who can only recognize strokes in a new character. So, on this ground alone, strokes fulfill a need. There are many existing dictionaries that use index systems employing stroke counts as a means of locating a character.
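The limits of a stroke-based index can be seen in a small sketch. The sample characters, their glosses, and the function name `build_stroke_index` are illustrative assumptions and are not taken from the figures; the stroke counts are those of the modern script.

```python
from collections import defaultdict

# Illustrative sample: (character, gloss, stroke count in modern script).
# These entries are assumptions chosen for the sketch, not data from the study.
SAMPLE = [
    ("一", "ONE", 1),
    ("二", "TWO", 2),
    ("人", "MAN", 2),
    ("十", "TEN", 2),
    ("三", "THREE", 3),
    ("工", "WORK", 3),
    ("大", "BIG", 3),
]


def build_stroke_index(entries):
    """Group characters by stroke count, as a stroke-count dictionary index does."""
    index = defaultdict(list)
    for char, gloss, strokes in entries:
        index[strokes].append((char, gloss))
    return dict(index)


index = build_stroke_index(SAMPLE)
for strokes in sorted(index):
    print(strokes, index[strokes])
```

Even in this tiny sample, a stroke count of 2 or 3 selects three candidates each, so the count narrows the search for a beginner but does not by itself identify the character.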

On the other hand, the justifications for using the formal elements are: (a) In the human mind, information is recognized as perceptual patterns, not as un-analyzed templates, because smaller features can be recognized more economically (Anderson, 1980; Reed, 1974) (Fig.3.2). (b) The human mind can process information better if it is in larger "chunks" (Miller, 1960). (c) In his learning theory, Ausubel (1978) stresses the importance of "meaningful learning."

"Meaningful learning" is a process whereby new information is related to relevant concepts (i.e., meaningful entities) already existing in the learner's mind. Therefore, a new character should be learned, not as a word, but as a relevant concept related to other concepts already known to the learner.

B. LOGOGRAPHS

The earliest Chinese graphs were found on shells and bones dating back some 3,400 years. Judging by the extent to which these shell-and-bone logographs are already conventionalized, it is reasonable to infer that true writing emerged considerably earlier, although we do not know exactly when (Wang, 1981).

The earliest dictionary is Shuowen Jiezi (121 A.D.), with a compilation of 9,353 logographs. Its author, Xu Shen, applied two principles of organization. One divided the logographs into six categories according to the way they were formed. These are: (1) Xiangxing (pictographs), (2) Zhishi (simple ideograms), (3) Huiyi (complex ideograms), (4) Jiajie (phonetic loans), (5) Xingsheng (phonograms), (6) Zhuanzhu (derivatives) (Appendix iv). The pictographs are iconic. Ideograms are formed with pictographs to suggest an idea. Phonograms are also made up of two or more components: the signific suggests the meaning, while the phonetic indicates the pronunciation (Wang, 1981).

The most productive category is that of the phonograms, estimated in a Qing dynasty (1644-1911 A.D.) study to comprise 82% of the logographs, and in a later study (Tsien, 1962; Wang, 1981) about 90%.

C. METHODOLOGY

REVIEW

Before introducing the methodology, we shall first review several other dissertations on Chinese characters so that we may benefit from these earlier research efforts.

The first to be cited is that of Rankin (1965), entitled "A Linguistic Study of the Formation of Chinese Characters", in which the process for forming Chinese characters was described. Two grammars resulted: (1) a "generative grammar" with a set of rules for component combination, and (2) a "decomposition grammar" for forming components.

The generative grammar generates single-component characters, and also complex characters by means of rules of combination, of which there are three: (1) horizontal combination, (2) vertical combination, and (3) surrounding combination. More complex characters are generated by successive applications of these combination rules.
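The three combination rules, and their successive application, can be illustrated with a small recursive data structure. This is only a sketch and not Rankin's notation: the class and function names are hypothetical, the example characters are chosen purely for illustration, and the rules are rendered with the Unicode Ideographic Description Characters ⿰ (left to right), ⿱ (above to below), and ⿴ (full surround), which postdate the study.

```python
from dataclasses import dataclass
from typing import Union

# Unicode Ideographic Description Characters stand in for the three rules.
# Using them is an assumption of this sketch, not Rankin's own notation.
HORIZONTAL = "\u2ff0"  # left-to-right combination
VERTICAL = "\u2ff1"    # above-to-below combination
SURROUND = "\u2ff4"    # full-surround combination


@dataclass(frozen=True)
class Component:
    """A single-component character."""
    glyph: str


@dataclass(frozen=True)
class Combination:
    """Two parts joined by one rule; either part may itself be a combination."""
    rule: str
    first: "Node"
    second: "Node"


Node = Union[Component, Combination]


def describe(node: Node) -> str:
    """Return a prefix-notation description of how the character is built."""
    if isinstance(node, Component):
        return node.glyph
    return node.rule + describe(node.first) + describe(node.second)


def depth(node: Node) -> int:
    """Count the successive combination steps needed to build the character."""
    if isinstance(node, Component):
        return 0
    return 1 + max(depth(node.first), depth(node.second))


# Illustrative characters (assumed examples, not those of the original figures).
bright = Combination(HORIZONTAL, Component("日"), Component("月"))   # 明
early = Combination(VERTICAL, Component("日"), Component("十"))      # 早
returning = Combination(SURROUND, Component("囗"), Component("口"))  # 回
ally = Combination(VERTICAL, bright, Component("皿"))                # 盟

print(describe(bright), describe(early), describe(returning))
print(describe(ally), depth(ally))
```

The last example applies two rules in succession, a horizontal combination nested inside a vertical one, which is the sense in which more complex characters arise from repeated application of the same three rules.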

Measured against the previously stated definitions of "material elements" and "formal elements" of a character, Rankin's decomposition sometimes worked with the "formal elements," and at other times with the "material elements." This came about because of arbitrariness in interpreting how a modern script should be partitioned. This can be a serious problem because the same character can be seen differently by different people.

The second one is that of Chou (1972), entitled "A New Alphameric Code for Chinese Ideographs: Its Evaluations and Applications". This coding system is based on the meaningful components of a Chinese character. This system, though well designed, is partition-dependent in its coding scheme. Generally speaking, it is not efficient.

The third study is that of Leide (1975), entitled "Classification Development Effect from Graphical Hierarchies". It analyzes the graphical structure of the Small Seal characters and develops a classification scheme for translating the graphical elements into a numerical notation. His classification by the number of pieces, vertices, and faces produced a data reduction of 99.49% with no class containing more than 32 characters.

Alphabetization and numerical ordering are the two basic methods traditionally used, but Leide's aim was to develop a non-alphanumeric notation of classification that excludes syntactic and semantic considerations, resting solely upon the structural elements. His study differed from other classifications in that he attempted to develop a position-independent classification. A linear notation (i.e., a series of digits), was developed for representing Small Seal characters.

As an input code for the keyboard of a computer system, this system would be too time-consuming and inconvenient, and its problem of redundancy too great. However, his idea of a position-independent system is definitely attractive.

1. HIERARCHICAL ANALYSIS

This hierarchical analysis is different from that of any of the previous work reviewed. It functions like the Fourier or Laplace transforms, with which one generally starts in the "time" domain, in which one seeks a solution but cannot have it directly. Therefore, the problem is transformed into the "frequency" domain, solved there, and then transformed back to the "time" domain.

In a similar manner, we find it too difficult to work in the domain of the "modern script"; therefore, we transform the problem into the domain of the "Small Seal". Then, after having worked out the hierarchical structure, we transform it back into the "modern script" domain.

Now, let us proceed with the hierarchical analysis. This is a methodology to study the logical and etymological nature of Chinese characters. A Chinese character is analyzed by decomposing it into its "formal elements". In so doing, one discovers how a character is formed, why it is so formed, and the relationships among its immediate constituents (I.C.'s) (Fig.1.10).

The "formal elements" are significs, phonetics, and primitives as defined in Wieger (1965). We have refined and redefined them so that the operational definitions are more concise and specific. For instance, the term "significs" is redefined into two separate terms: "primary significs" and "derived significs." The term "primitives" has been redefined too: we define it solely as symbols that are absolute "ideographic minimums" (Wieger, 1965).

.DEFINITIONS

Signific : The symbol which represents the "meaning" aspects of a character, of which there are two sub-groups, namely:

.Primary signific: The simplest of all significs that cannot be further decomposed into meaningful parts, that has an associated "sound" part, and from which other significs, or phonetics are derived. A primary signific is the "head" of a family of characters, or the "root" in the hierarchical tree of related characters. For example,

This is a primary signific denoting ONE, ONENESS, or HEAVEN (Wieger, 1965).

.Derived Signific: A signific derived from the duplication of a "primary signific," or from a primary signific in conjunction with other formal elements. For example,

This is a "derived signific" denoting the number, TWO. In this case, it is derived by the duplication of the primary signific, ONE.

.Phonetic: A symbol that represents the "sound" aspect of a character, generally formed by combining a signific (primary, or derived) with other formal elements. For example,

This phonetic is formed by combining two significs: RAIN (derived signific) and BIRD (primary signific). It, SUDDEN RAIN, is then used in conjunction with other significs (generally primary significs) to form a new character. For example,

This character is formed by combining the phonetic with a primary signific, its "sound" aspect being governed by the phonetic.

.Primitive: A symbol with only a simple meaning and structure, so simple that it cannot be further decomposed into meaningful parts, and may not even possess any "sound" (i.e., uttered independently), and so primitive that generally it does not exist as an independent character. For example,

This is a primitive denoting "clouds", and with no "sound" part. A primitive might be thought of as a special case of the primary signific that possesses few or no descendents.

.DECOMPOSITION RULES

Rule #1: The primary signific, since it is the "head" of a family, is given full membership, and is placed at level 1 (Fig.3.2).

Rule #2: A primitive is also given full membership since it is the head of its own family (Fig.3.2).

Rule #3: Whenever a character is formed by putting together more than one disjoint element, the membership of this new character is located one level lower than the lowest of the constituent elements present (Fig.3.3).

.THE MEMBERSHIP FUNCTION

Based on our direct experience with hierarchical trees for all of the 214 significs, more than 800 phonetics, and thousands of phonograms, we have found that the hierarchical trees mostly go as deep as five levels, and a few as deep as seven. Therefore, it seems quite adequate to define a simple "membership function" that takes values in the interval [0,1] (Zadeh, 1965) such that the value of the "grade of membership" is equal to 1.0 for full membership at level 1 of the hierarchical tree (Zadeh, 1965; Bellman, Kalaba, Zadeh, 1964; Kaufmann, 1975). The value of the grade of membership equals 0.9 at level 2 of the hierarchical tree (i.e., 0.1 lower than that of the full membership), and so on (Fig.3.3). Such a subjectively defined, monotonically decreasing function, dropping by 0.1 in value per level of the hierarchical tree (see Fig.3.3), was found satisfactory for the problem at hand. It is also good practice to keep the membership function as simple as possible.
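Written out, the membership function described here is linear in the level. As a sketch, with $\ell$ (an assumed symbol) denoting the level of a character in the hierarchical tree and $\mu$ its grade of membership:

```latex
\mu(\ell) = 1.0 - 0.1\,(\ell - 1), \qquad \ell = 1, 2, \dots, 7
```

This gives $\mu(1) = 1.0$, $\mu(2) = 0.9$, and $\mu(3) = 0.8$. For the deepest trees reported, at level 7, it gives $\mu(7) = 0.4$, so the function stays well inside $[0,1]$ over the whole observed range.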

By applying the above rules of character decomposition, one may determine at which level a character should be assigned, and what are its immediate constituents (I.C.'s).

Consider the character TWO: since it is derived from two primary significs, which happen to be the same, the new character has a grade of membership of 0.9 and is placed at level 2 of the hierarchical tree (Decomposition Rule #3). The character ONE is assigned a value of 1.0 for its grade of membership by virtue of the fact that it is a primary signific (Fig.3.3; Decomposition Rule #1). Since TWO is derived directly from the primary signific ONE, it is a "derived signific" (Figs.3.2 & 3.3).

Consider the character, THREE: Since it is derived from the combination of the primary signific ONE and the derived signific TWO, it thus has the value of 0.8 and is placed at level 3 hierarchically (Decomposition Rule #3; Fig.3.3).
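The decomposition rules and the membership function can be applied mechanically, as the following minimal sketch suggests. The function names `level` and `grade` and the dictionary layout are assumptions made for illustration; only ONE, TWO, and THREE, as decomposed in the text, are encoded.

```python
# Immediate constituents (I.C.'s) of each character, following the text.
# An empty tuple marks a primary signific or primitive (Rules #1 and #2).
CONSTITUENTS = {
    "ONE": (),
    "TWO": ("ONE", "ONE"),
    "THREE": ("ONE", "TWO"),
}


def level(name, table=CONSTITUENTS):
    """Level in the hierarchical tree under Decomposition Rules #1 to #3."""
    parts = table[name]
    if not parts:
        return 1  # heads of families sit at level 1
    # Rule #3: one level below the lowest (numerically greatest) constituent.
    return 1 + max(level(part, table) for part in parts)


def grade(name, table=CONSTITUENTS):
    """Grade of membership: 1.0 at level 1, dropping by 0.1 per level."""
    return round(1.0 - 0.1 * (level(name, table) - 1), 1)


for name in CONSTITUENTS:
    print(name, level(name), grade(name))
```

The sketch reproduces the assignments worked out above: ONE at level 1 with grade 1.0, TWO at level 2 with grade 0.9, and THREE at level 3 with grade 0.8.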

DISCUSSION

Regarding the three dissertations reviewed, their objectives were good, but unfortunately their results fall short. Nevertheless, we have learned much from their experiences. Specifically, we shall incorporate these desirable features into our design considerations:

.A description of the process for the decomposition of Chinese characters at the "formal elements" level and in terms of the Small Seal.

.A position-independent classification scheme.

.A convenient notation for encoding the characters, with the minimum amount of redundancy and better than any of the existing schemes.

.An efficient coding equal to or better than that for the English language.

.A Chinese keyboard that is of about the same physical size as the existing computer keyboard.

Distinguishing Features & Benefits

The distinguishing features and benefits of the hierarchical analysis are:

  1. By applying the hierarchical analysis, an organizational (hierarchical) structure is placed upon the etymological data (characters) such that one can specify at which level a given character resides in the information chain, both vertically and horizontally, of the hierarchical tree.
  2. The "root" of the tree of a group of "related characters" is a "primary signific," which is always identifiable. This makes data conversion from the conventional dictionary easy, as most of them use significs for indexing.
  3. In the hierarchical structure, the entities can be un-ambiguously defined as primary signific, derived signific, or phonetic. Some kind of color-coded scheme may be employed to further enhance their distinctiveness and importance.
  4. A clear connection (i.e., an information chain) is established from the primary signific, through derived signific, phonetic, to phonograms. This is important because about 90% of the characters are phonograms.
  5. The link between significs and phonetics is especially important because it thus unifies the "meaning" system and the "sound" system into one complete, logical system.
  6. It reduces the number of basic building blocks for Chinese characters to a more manageable size (i.e., from 214 to 117).
  7. It further separates the primary significs into two groups: those with derived significs and those without. The ones with derived significs are more important.
  8. It provides an efficient organizational structure for the large body of more than 800 phonetics, which previously was almost unmanageable.
  9. This new organizational structure gives optimal partitioning of characters such that redundancy is reduced to a practical minimum.
  10. It is a process for decomposition of Chinese characters at the "formal elements" level using the Small Seal, which satisfies item #1 of the desirable features listed under "Discussion."

2. SYNTHESIS

Historically, the repertoire of Chinese characters was said to contain over 80,000 characters, while the Kang-xi dictionary contained over 40,000 (Wieger, 1965). Indeed, this rather large quantity presents a problem both in teaching and in learning. It is a paramount problem not only for the human mind, but also for an information system using Chinese characters.

Specifically, a bottleneck exists in the input/output system of Chinese-language computers. This problem is becoming more acute as more Asian business and governmental institutions become computerized and as, with the advent of micro-computers, automation comes within reach of small businesses and individuals. The market demand exists now, but the technology lacks an efficient keyboard (Wang, 1981).

At present, no efficient keyboard design exists for Chinese computing systems. This implies a lack of in-depth research in coding theory for Chinese characters, which demands a good understanding of Chinese orthography.

Thus far, the designs have been rather poor, using expedient approaches that lack sound engineering development as well as theoretical research. They are neither effective nor efficient. They all, to some degree, share the same problem of redundancy in their encoding schemes. Generally, their attempts at alleviating the redundancy problem result only in longer codes; therefore, the net gain is nil. The critical question is how we are going to deal with this large body of characters. Are they really manageable?

Contrary to the popular belief that Chinese characters are un-related, individual entities, we have found that they are related, and it is a matter of degree of relatedness. Specifically, Chinese characters can be organized into related groups, and described in terms of a model, which will be treated later.

From the hierarchical analysis, we see that certain characters are related to others because they have one or more formal elements in common. The relatedness of characters can be expressed hierarchically in terms of their relative degree of complexity. That is, they can be ordered in such a way that the more basic ones are closer to the top of the tree.

Thus, a classification scheme is evolved, of which the notation may be viewed as a self-contained language. Classification is the translation of the (distinctive) features of an item into that language, and the ordering is achieved in steps, as follows: "The aspects of an item in the universe are abstracted into features, which are then translated into the notation of classification" (Bremermann, 1971; Fig.3.4).

Referring back to Fig.3.3, notice that the extreme left-hand column indicates the levels: L1 for level one, L2 for level two, and so forth. On the far right is the column of u values denoting the value of the membership function (Zadeh, 1965). We shall postpone our discussion of these u values for now.

Figs.3.5 and 3.6 show more examples of the analysis techniques. For example, the phonetic SPIRIT of Fig.3.6j is located at level four, L4. This is so because it is derived from the two independent characters LARGE RAIN and MAGIC INCANTATION, both at level three, L3. The former is derived from the derived signific RAIN at L2 and from the primitive BIG DROPS at L1; it is located at L3 by applying Decomposition Rule #3. The latter is derived from the primary signific WORK at L1 and TWO MEN at L2, and is therefore also located at L3 by Decomposition Rule #3. The symbol TWO MEN is itself derived by transforming the primary signific MAN into two men dancing face-to-face.

In Fig.3.6, the hierarchical analysis reveals that RAIN is the common element among all the phonetics shown; therefore, it may be factored out. This establishes the relationship that all seven phonetics belong to the same set (i.e., within the same family headed by RAIN). Therefore, we may re-assemble them into a new hierarchical tree, located on the right half of Fig.3.7a. Note that under RAIN there are four phonetics at L3, two at L4, and one phonetic at L5.

This process of integrating the various phonetics into one meaningful tree is called the process of synthesis.
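Synthesis can be sketched as collecting every character whose decomposition, followed down to its formal elements, contains the common element, and then ordering the members by level. The sketch below encodes only the characters whose constituents are named in the text (three of the seven phonetics under RAIN). The function names and the `FIXED_LEVEL` table are assumptions: RAIN is entered at L2 directly because its own decomposition is not given here, and TWO MEN is entered with MAN as its single constituent so that it lands at L2 as stated.

```python
# Immediate constituents as named in the text; empty tuple = head of a family.
CONSTITUENTS = {
    "BIRD": (),
    "BIG DROPS": (),
    "WORK": (),
    "MAN": (),
    "RAIN": (),  # derived signific; its decomposition is not given in the text
    "TWO MEN": ("MAN",),
    "SUDDEN RAIN": ("RAIN", "BIRD"),
    "LARGE RAIN": ("RAIN", "BIG DROPS"),
    "MAGIC INCANTATION": ("WORK", "TWO MEN"),
    "SPIRIT": ("LARGE RAIN", "MAGIC INCANTATION"),
}
FIXED_LEVEL = {"RAIN": 2}  # level stated in the text, assumed as given


def level(name):
    """Level under the decomposition rules, honoring any stated fixed level."""
    if name in FIXED_LEVEL:
        return FIXED_LEVEL[name]
    parts = CONSTITUENTS[name]
    return 1 if not parts else 1 + max(level(p) for p in parts)


def contains(name, element):
    """True if `element` occurs anywhere in the decomposition of `name`."""
    return any(p == element or contains(p, element) for p in CONSTITUENTS[name])


def synthesize(element):
    """Group all characters built on `element` by their level in the tree."""
    family = {}
    for name in CONSTITUENTS:
        if contains(name, element):
            family.setdefault(level(name), []).append(name)
    return dict(sorted(family.items()))


print(synthesize("RAIN"))
```

On this partial table, `synthesize("RAIN")` places SUDDEN RAIN and LARGE RAIN at L3 and SPIRIT at L4, consistent with the levels given in the analysis; MAGIC INCANTATION is excluded because RAIN does not occur in its decomposition.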

The left half of Fig.3.7a shows the family of related characters pertaining to the derived signific, TWO. Fig.3.7b shows the hierarchical tree corresponding to Fig.3.7a, in modern script.


 
 
© Copyright 2004 Chinese Software Guide. All rights reserved.