KanjiCulture

the common language of China, Japan and Korea

The Character of Characters

Written By: Bai - Feb• 15•11

In the last post, Chinese and Latin Scripts Compared, I introduced the Chinese script by comparing it to our own Latin script, based on Professor Peter A. Boodberg’s essay, “Some Basic Grammatonomic Characteristics of the Chinese Script” in his hand-printed Cedules from a Berkeley Workshop in Asiatic Philology, 1954, and my notes from his classes at UC Berkeley in 1969-70. But before showing how Chinese characters are composed, a few words are in order about the use of the English word “character” to refer to them. The common and longstanding practice of calling Chinese wénzì 文字 ‘characters’ is good enough for general use. But if you want to understand how the Chinese script actually works, it is necessary to describe the role an individual so-called ‘character’ plays in it. The meaning of the term ‘character’ in the context of alphabetic scripts, such as the one used for English, is simply not the same when it is applied to Chinese.

A full discussion will follow in subsequent posts where I will discuss wén 文 and zì 字, classifiers bùshǒu 部首, hemigrams, and so forth. The point here is to realize that a limited number of graphemes make up all Chinese graphs. It is these graphemes that more properly correspond to ‘characters’, not the fully composed graphs that result from the various combinations of the graphemes.

The fact that each Chinese graph is confined in an identically sized box is probably part of the reason that Chinese graphs have been mistakenly equated with ‘letters’, i.e. ‘characters’. Imagine for a moment that all English words were required to be the same length. It would be easy then to draw a parallel between a Chinese graph and an English ‘word’. In such an imaginary case, each English word would, like each Chinese graph, occupy an enclosed space with exactly the same dimensions as any other. Peter A. Boodberg has suggested that this spatial restriction (each graph, no matter how complex, occupies the same-sized square on the page) may have been partly responsible for the fact that the phonetic aspect of Chinese writing remained rather underdeveloped, unlike, for example, Egyptian hieroglyphics, which were not confined to boxes of the same size.

Thinking of each Chinese graph or kanji as analogous to an English word is better than thinking of each kanji as analogous to a letter or character of an alphabet. But the ‘a kanji = a word’ view is still not accurate. Individual kanji do not always neatly correspond to ‘words’. They can correspond to English words, but often they do not. When they do not, kanji more closely resemble the roots, suffixes and prefixes of words. Phonetically, each kanji is a single syllable, and that syllable carries meaning both alone, as a one-syllable word, and together with one or more other syllables in a multisyllabic word. Keep in mind that the vast majority of modern Chinese words consist of two syllables, represented in writing by two kanji.

But somehow (and this may be purely subjective) these bisyllabic, two-graph units, called ‘binoms’ or ‘kanji compounds’, often seem to represent a bit more than a single word, unless the English word is a compound word, in which case they are probably about equivalent. This may be partially suggested by the often-felt need to use several English words to translate them. The percentage of words that are binoms is lower in the classical language than in the modern language, and this percentage drops the farther back in time you go. Nevertheless, bisyllabic words seem to be a characteristic of Chinese as far back as the earliest records. [[[Q.: Are binoms found on the bronze inscriptions?]]]

In the long run, it is this limited set of Chinese graphemes, rather than the complex graphs they make up, that most deserves a place in any character set that makes a claim to universality. Assuming, at the high end, that it was desirable to encode all of the approximately 2000 graphemes, about 19,000 code points would have been freed from the Han repertoire in CJK Unified Ideographs in Unicode 2.0, for example, with the added advantage that any Chinese graph could be written on a computer by combining those approximately 2000 graphemes. An additional 60,000 [[[Check exact number.]]] code points would have been freed in the Han repertoire of Unicode 3.1. It was indeed practical to select a specific subset of Chinese graphs and provide a code for each “character” therein. But I think it would have been more in keeping with Unicode’s goal, which I understood as being able to represent the greatest possible number of the world’s scripts in one universal character set, if a grapheme-based approach to the Han repertoire had been used instead of just listing kanji. It would have been more fruitful to try to understand how Chinese is written than to remain fixated on merely listing the results of its use, a method that always falls short. When we want to use a really rare kanji, an unlikely variant, or a newly invented kanji, oops, it’s not yet on the list. But that’s ‘wouldabeen’, ‘couldabeen’… water under the bridge now.
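The arithmetic behind the 19,000 figure can be sketched roughly as follows. This is only an illustration: it assumes the Unicode 2.0 CJK Unified Ideographs block ran from U+4E00 to U+9FA5 and takes the high-end estimate of about 2000 graphemes from the text. The function and constant names are made up for the example.

```python
# Rough code point arithmetic for the Han repertoire of Unicode 2.0.
# Assumption: the CJK Unified Ideographs block then ran from U+4E00 to U+9FA5.
FIRST_HAN = 0x4E00
LAST_HAN = 0x9FA5

# High-end grapheme estimate taken from the text (an approximation).
GRAPHEME_ESTIMATE = 2000


def han_code_points(first: int = FIRST_HAN, last: int = LAST_HAN) -> int:
    """Count the code points in an inclusive range."""
    return last - first + 1


def code_points_freed(graphs: int, graphemes: int = GRAPHEME_ESTIMATE) -> int:
    """Code points saved if graphemes were encoded instead of whole graphs."""
    return graphs - graphemes


if __name__ == "__main__":
    total = han_code_points()
    print(total)                     # 20902
    print(code_points_freed(total))  # 18902, i.e. roughly 19,000
```

The same subtraction applies to any later, larger repertoire once its exact size is known.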

It is impractical now for Unicode to change its approach to representing the Chinese script and it is virtually impossible to imagine any region in the realm of kanji culture giving up their mandated or de facto character sets of Chinese graphs in favor of a ‘universal’ grapheme-based approach.

But as students of the script, in order to understand how kanji are composed, we must investigate the graphemes, just as we would look at the letters and syllables of English to see how English words are spelled. I’ll talk about the graphemes of the Chinese script in my next post.

Chinese and Latin Scripts Compared

Written By: Bai - Jan• 08•11

Following Professor Peter A. Boodberg, let’s compare the Chinese script with our familiar Latin script. What follows is a restatement of Peter A. Boodberg’s succinct one-page description entitled “Some Basic Grammatonomic Characteristics of the Chinese Script”, which appeared as 015-541120 in his Cedules from a Berkeley Workshop in Asiatic Philology. He personally mimeographed and distributed the Cedules in the mid-1950s, and this is still the best description of the Chinese script in comparison to Latin script that I have seen.

Alignment

Although the vertical alignment of the script first strikes the ordinary Westerner as most characteristic of Chinese writing, the isometry (See Metrics below.) of the graphs would better fulfill this function. In fact, Chinese is flexible in its alignment. Text aligned horizontally is quite as common nowadays as the traditional vertical alignment. Vertically aligned columns of Chinese are read right-to-left. Most modern horizontally aligned Chinese is read, like English, left-to-right. But during the 20th century right-to-left horizontal alignment has also been used. (I recall places in Taipei as recently as the 1990s where all three alignments were represented on adjacent signs affixed to building facades.) English, by contrast, is quite fixed in its horizontal dextrorsal (left-to-right) alignment.

Here is an example of two lines of Chinese text laid out in all three ways. First is a) horizontal alignment, read left to right, like English, then b) vertical alignment, read from top to bottom, and finally c) horizontal alignment, read right to left, like Arabic. All three alignments can be easily read by any literate Chinese reader although c), horizontal alignment, read right to left, is rarely used.

Two lines of Chinese in

a) horizontal alignment, like English:

L1: 道可道非常道

L2: 名可名非常名

The same two lines of Chinese in

b) vertical alignment, the traditional alignment:

L2 L1

名 道

可 可

名 道

非 非

常 常

名 道

And the same two lines in

c) horizontal alignment, read right to left, like Arabic:

道 常 非 道 可 道 :1L

名 常 非 名 可 名 :2L
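The three layouts above can be reproduced with a few lines of Python. This is a minimal sketch for plain text output only. The function names are invented for the example, and reversing a string is a display trick, not how real right-to-left typesetting works.

```python
# The two sample lines, stored in reading order.
LINES = ["道可道非常道", "名可名非常名"]


def horizontal_ltr(lines):
    """a) Horizontal, read left to right: the lines exactly as stored."""
    return list(lines)


def vertical_rtl(lines):
    """b) Vertical columns read top to bottom, first line in the rightmost column."""
    columns = list(reversed(lines))  # line 1 ends up on the right
    height = max(len(column) for column in columns)
    rows = []
    for i in range(height):
        # Pad short columns with an ideographic space so the grid stays square.
        cells = [column[i] if i < len(column) else "\u3000" for column in columns]
        rows.append(" ".join(cells))
    return rows


def horizontal_rtl(lines):
    """c) Horizontal, read right to left: each line with its graphs reversed."""
    return [line[::-1] for line in lines]


if __name__ == "__main__":
    for layout in (horizontal_ltr, vertical_rtl, horizontal_rtl):
        print(layout.__name__)
        print("\n".join(layout(LINES)))
        print()
```

Because every graph occupies one cell, all three layouts are simple rearrangements of the same grid of graphs.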

Metrics

As mentioned, the isometry of the graphs, rather than the alignment of the lines of script, is probably the most characteristic feature of the Chinese script. Each graph stakes out the same size square on the page. Any graph, from the single stroke 一 meaning ‘one’ to the 29-stroke graph 鬱 meaning ‘rampant’ or ‘depressed’, and even more complex graphs, having fifty strokes or more (fortunately, these beasts are exceedingly rare), claims the same size square area on the page. Furthermore, whereas the graphemes of Latin script concatenate along one dimension, Chinese graphemes may be added in two dimensions.
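Two-dimensional composition can be illustrated with Unicode's Ideographic Description Characters, which include an operator for side-by-side placement (U+2FF0) and one for top-over-bottom placement (U+2FF1). The sketch below is only an illustration. The small decomposition table is written by hand for this example and is not taken from Boodberg or from any standard data file, and the function names are hypothetical.

```python
import unicodedata

LEFT_TO_RIGHT = "\u2ff0"   # operator: first component left, second right
ABOVE_TO_BELOW = "\u2ff1"  # operator: first component above, second below

# Hand-written decompositions, for illustration only (an assumption,
# not data from Boodberg or from a Unicode data file).
DECOMPOSITIONS = {
    "好": LEFT_TO_RIGHT + "女子",
    "明": LEFT_TO_RIGHT + "日月",
    "字": ABOVE_TO_BELOW + "宀子",
}


def layout_of(graph):
    """Return the Unicode name of the operator describing the graph's layout."""
    operator = DECOMPOSITIONS[graph][0]
    return unicodedata.name(operator)


def components_of(graph):
    """Return the components that follow the layout operator."""
    return list(DECOMPOSITIONS[graph][1:])


if __name__ == "__main__":
    for graph in DECOMPOSITIONS:
        print(graph, layout_of(graph), components_of(graph))
```

Latin letters need no such operators, since they only ever follow one another along the line.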

Number of Strokes Per Kanji

In practice, 29 strokes is about the maximum number of individual strokes used. About 19,000 of the approximately 21,000 kanji included in the middle-sized, four-volume Morohashi Dictionary, 廣漢和辭典, a typical example of a practical but large repertoire, are written using 8 to 26 strokes. About 1,100 are written with 7 strokes or fewer and about the same number are written with 23 strokes or more. The most common stroke count, the one shared by the largest number of kanji, is 12 strokes. In the case of smaller repertoires, like the one in a Japanese kanwa dictionary intended for middle school students, with fewer than 4,500 kanji in total, the proportion of kanji with high stroke counts is smaller. The most common stroke count in that case is 11 strokes.

Here is a table showing the isometry of Chinese graphs. The table shows that minimalist kanji of 1, 2, 3, 4, 5, or 6 strokes, kanji of the very popular 11, 12, or 13 strokes, and complex kanji of 24, 25, 26, 27, 28, and 29 strokes each stake out the same size piece of real estate on the page. In other words, each kanji, regardless of how many strokes it has, is given a box within which to display itself that is the same size as that given to any other kanji at the same font size. Specific font design will determine how much of each box is actually used. This is like the concept of non-proportional fonts for alphabetic scripts such as Latin or English; each letter occupies a box of the same size as any other letter. But quite unlike Latin or English, each kanji box holds exactly one syllable and many of these are one syllable words. (But keep in mind that the majority of Chinese words, even in the classical language, are bisyllabic, i.e., composed of two syllables.)

Less than 1% use 1 – 6 strokes | Most popular # of strokes
# strokes: 1 2 3 4 5 6 | 10 11 12 13
Kai style:
Mandarin: shí kǒu wén bái jiè wèn dào wàn

Less than 1% use 24 – 29 or more strokes
# strokes: 24 25 26 27 28 29
Kai style:
Mandarin: líng guān zàn zuān záo
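The same isometry shows up in plain text environments such as terminals, where each kanji conventionally takes two columns no matter how many strokes it has. The sketch below uses the East Asian Width property exposed by Python's standard `unicodedata` module. Treating wide and fullwidth characters as two columns and everything else as one is a common terminal convention, taken here as an assumption, and `display_columns` is a made-up helper name.

```python
import unicodedata


def display_columns(text):
    """Count terminal columns, treating wide ('W') and fullwidth ('F') characters as two."""
    return sum(
        2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
        for ch in text
    )


if __name__ == "__main__":
    # One stroke or twenty-nine strokes, the box is the same size.
    print(display_columns("一"), display_columns("鬱"))  # 2 2
    print(display_columns("道可道非常道"))                # 12
    print(display_columns("abc"))                        # 3
```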

Minor stylistic differences in fonts can add strokes

Although the kai style is the basis for writing virtually all kanji nowadays, there are stylistic variations within that style that affect the stroke count. In the table below, the kanji in the second row are the same as the kanji in the third row. But the kanji in the third row are displayed in a font style that uses one additional stroke for each kanji. That additional stroke is found in the classifier part of the kanji. More about classifiers shortly.
Note: the differences between these two styles will not display in your browser. This kind of stylistic difference can be brought out by using a special font. [To do.]

dào wàn chán
12 strokes 12 strokes 16 strokes
13 strokes 13 strokes 17 strokes

Number of Graphemes

English can be written with as few as about 82 graphemes (26 x 2 letters + about 30 marks and figures). For Chinese, “the number of graphemes runs from 500 to 800, estimated on a purely graphic basis, and to over 2000, if reckoned on an organic-structural, historical, and phonosemantic basis. These form in bidimensional combinations a graphicon of some 50,000 graphs or lexigrams (of which only about 10,000 are in common use.)” [Boodberg's Cedules 015-541120]