The Character of Characters
In the last post, Chinese and Latin Scripts Compared, I introduced the Chinese script by comparing it to our own Latin script, based on Professor Peter A. Boodberg’s essay, “Some Basic Grammatonomic Characteristics of the Chinese Script” in his hand-printed Cedules from a Berkeley Workshop in Asiatic Philology, 1954, and my notes from his classes at UC Berkeley in 1969-70. But before showing how Chinese characters are composed, a few words about the use of the English word “character” to refer to them. The common and longstanding practice of calling Chinese wénzì 文字 ‘characters’ is good enough for general use. But if you want to understand how the Chinese script actually works, it is necessary to describe the role an individual so-called ‘character’ plays in it. The term ‘character’ simply does not mean the same thing when applied to Chinese as it does in the context of an alphabetic script such as the one used for English.
A full discussion will follow in subsequent posts, where I will take up wén 文 and zì 字, classifiers (bùshǒu 部首), hemigrams, and so forth. The point here is to realize that a limited number of graphemes make up all Chinese graphs. It is these graphemes that more properly correspond to ‘characters’, not the fully composed graphs that result from the various combinations of the graphemes.
The fact that each Chinese graph is confined in an identically sized box is probably part of the reason that Chinese graphs have been mistakenly equated with ‘letters’, i.e. ‘characters’. Imagine for a moment that all English words were required to be the same length. It would be easy then to draw a parallel between a Chinese graph and an English ‘word’. In such an imaginary case, each English word would, like each Chinese graph, occupy an enclosed space with exactly the same dimensions as any other. Peter A. Boodberg has suggested that this spatial restriction (each graph, no matter how complex, occupies the same sized square on the page) may have been partly responsible for the fact that the phonetic aspect of Chinese writing remained rather underdeveloped, unlike, for example, Egyptian hieroglyphs, which were not confined to boxes of the same size.
Thinking of each Chinese graph or kanji as analogous to an English word is better than thinking of each kanji as analogous to a letter or character of an alphabet. But the ‘a kanji = a word’ view is still not accurate. Individual kanji do not always neatly correspond to ‘words’. They can correspond to English words, but often they do not. When they do not, kanji more closely resemble the roots, suffixes, and prefixes of words. Phonetically, each kanji is a single syllable, and that syllable carries meaning both alone, as a one-syllable word, and together with one or more other syllables in a multisyllabic word. Keep in mind that the vast majority of modern Chinese words consist of two syllables, represented in writing by two kanji.
But somehow, and this may be purely subjective, these bisyllabic, two-graph units, called ‘binoms’ or ‘kanji compounds’, often seem to represent a bit more than a single word, unless the English word is a compound word, in which case they are probably about equivalent. This may be partially suggested by the often-felt need to use several English words to translate them. The percentage of words that are binoms is lower in the classical language than in the modern language, and this percentage drops the farther back in time you go. Nevertheless, bisyllabic words seem to be a characteristic of Chinese as far back as the earliest records. [[[Q.: Are binoms found on the bronze inscriptions?]]]
In the long run, it is this limited set of Chinese graphemes, rather than the complex graphs they make up, that most deserves a place in any character set that makes a claim to universality. Assuming, at the high end, that it was desirable to encode all of the approximately 2000 graphemes, about 19,000 code points would have been freed from the Han repertoire of CJK Unified Ideographs in Unicode 2.0, for example, with the added advantage that any Chinese graph could be written on a computer by combining those approximately 2000 graphemes. An additional 60,000 [[[Check exact number.]]] code points would have been freed in the Han repertoire of Unicode 3.1. It was indeed practical to select a specific subset of Chinese graphs and provide a code for each “character” therein. But I think it would have been more in keeping with Unicode’s goal, which I understood to be representing the greatest possible number of the world’s scripts in one universal character set, if a grapheme-based approach to the Han repertoire had been used instead of just listing kanji. It would have been more fruitful to try to understand how Chinese was written than to remain fixated on merely listing the results of its use, a method that always falls short. When we want to use a really rare kanji, an unlikely variant, or a newly invented kanji, oops, it’s not yet on the list. But that’s ‘wouldabeen’, ‘couldabeen’… water under the bridge now.
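To make the arithmetic and the idea of ‘spelling’ a graph a little more concrete, here is a minimal Python sketch. The block range U+4E00..U+9FA5 is the Han repertoire of Unicode 2.0; the figure of 2000 graphemes is simply the high-end estimate used above. The decomposition table, the function names `spell` and `inventory`, and the choice of components are my own illustrative assumptions, not a real grapheme list and not anything Unicode defines. The two layout operators are Unicode’s Ideographic Description Characters, borrowed here only as a convenient notation for ‘side by side’ and ‘one above the other’.

```python
# Han repertoire of Unicode 2.0: the CJK Unified Ideographs block.
URO_FIRST, URO_LAST = 0x4E00, 0x9FA5
han_in_unicode_2_0 = URO_LAST - URO_FIRST + 1

# Assumption taken from the text: about 2000 graphemes at the high end.
ASSUMED_GRAPHEMES = 2000
freed = han_in_unicode_2_0 - ASSUMED_GRAPHEMES

# Layout operators (Ideographic Description Characters).
LEFT_RIGHT = "\u2FF0"   # components side by side
ABOVE_BELOW = "\u2FF1"  # components one above the other

# Hypothetical, hand-made table: each graph is 'spelled' as a layout
# operator followed by its components. Illustrative only.
DECOMPOSITIONS = {
    "好": (LEFT_RIGHT, "女", "子"),
    "明": (LEFT_RIGHT, "日", "月"),
    "字": (ABOVE_BELOW, "宀", "子"),
}

def spell(graph: str) -> str:
    """Return the component spelling of a graph, or the graph itself if unknown."""
    parts = DECOMPOSITIONS.get(graph)
    return "".join(parts) if parts else graph

def inventory(graphs) -> set:
    """Collect the distinct components needed to spell the given graphs."""
    needed = set()
    for g in graphs:
        # Skip the layout operator; an unknown graph counts as its own component.
        needed.update(DECOMPOSITIONS.get(g, (None, g))[1:])
    return needed

if __name__ == "__main__":
    print(han_in_unicode_2_0, freed)
    print(spell("好"), sorted(inventory("好明字")))
```

The block holds 20,902 code points, so setting aside 2000 for graphemes leaves 18,902, which is the ‘about 19,000’ mentioned above. With only three sample graphs the component inventory is of course larger than the list of graphs; the saving only appears once the same components are reused across thousands of graphs.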
It is impractical now for Unicode to change its approach to representing the Chinese script, and it is virtually impossible to imagine any region in the realm of kanji culture giving up its mandated or de facto character set of Chinese graphs in favor of a ‘universal’ grapheme-based approach.
But as students of the script, in order to understand how kanji are composed, we must investigate the graphemes, just as we would look at the letters and syllables of English to see how English words are spelled. I’ll talk about the graphemes of the Chinese script in my next post.
