Why Chinese Is [Not] so Damn Hard

David Moser:

… But I still feel reasonably confident in asserting that, for an average American, Chinese is significantly harder to learn than any of the other thirty or so major world languages that are usually studied formally at the university level (though Japanese in many ways comes close).

[Footnote] 3. Incidentally, I’m aware that much of what I’ve said above applies to Japanese as well, but it seems clear that the burden placed on a learner of Japanese is much lighter because (a) the number of Chinese characters used in Japanese is “only” about 2,000 – fewer by a factor of two or three compared to the number needed by the average literate Chinese reader; and (b) the Japanese have phonetic syllabaries (the hiragana and katakana characters), which are nearly 100% phonetically reliable and are in many ways easier to master than chaotic English orthography is.

I understand where Moser is coming from. I took Spanish for one year in middle school, and two in high school. Even with my total indifference to academic achievement at the time, the frankly terrible teaching style (one teacher had us repitan en Español for most of the period nearly every day), and no previous experience in learning a foreign language, I reached a basic level of conversational ability that was occasionally useful living in California. Nearly a decade later, I went to Spain for a 10-day vacation (with my then-girlfriend, now-wife) and found I’d retained enough to follow the plotline of TV dramas, get the gist of overheard conversations, and ask simple questions and understand the answers.

It took me over 3 years of constant studying and immersion on my own — following a combined year of university instruction — before I reached that level of competence in Japanese. That’s just the spoken part. In reading and writing, since I’ve had to learn mostly on the job and have rarely had much time for real study past those first few years, I still don’t function at a particularly high level. I can effectively read for content, but most nuance is beyond me, and since I came to it so late, I’ll probably never progress to where I can dispense with references when reading or writing.

I occasionally joke to my wife that my next language will be something easy, like Arabic. If I’d been studying virtually any Indo-European language for this length of time, I’d probably be both fluent and literate by now. I would probably have had time to acquire two or three languages at a decent level of ability given the same time investment.

Most of what I know about Chinese, outside of some material covered in university linguistics classes, comes by way of Japanese. Chinese had a huge influence on the way Japanese has developed since at least the first wave of adoption of Chinese writing. Around 60% of the words in a modern Japanese dictionary and nearly 20% of the spoken vocabulary are kango (漢語): words either borrowed directly or derived from Chinese. That means that Chinese had roughly the same influence on Japanese that Latin had on English.

Moser asserts that Chinese is possibly the hardest language to acquire, and in the two parts I quoted above, he directly compares the difficulty of Chinese to Japanese. I think he’s wrong. Japanese is even more difficult than Chinese. Everything he complains about with regard to Chinese is even more difficult and complicated in Japanese.

Readings

No matter how bad the sound-symbol complexity is in Chinese, it’s worse in Japanese. There are multiple readings for the same character, some of them occurring only in a single context. Japanese didn’t borrow Chinese characters just once; there were several waves of importation over the course of centuries. Layering the imported Chinese readings on top of existing native Japanese words left some characters with a painfully large number of readings.

Virtually every kanji has at least two readings. Then there are the combinations. Which reading do you use in combination with other kanji, in which context? There are kun’yomi (native Japanese readings), on’yomi (Chinese-derived readings), and mixtures, along with “difficult” readings that even native Japanese speakers can’t reliably produce without special study or a dictionary for reference. And this is completely ignoring the euphonic changes in reading that happen in combination with preceding or following sounds, which are just a small part of what makes even simple counting such an utter bitch.

According to anything I can find, a Chinese character generally has a single reading within a given dialect, with some fairly predictable variations. Compare that to bastards like 生, which might (who the #^¢* knows?) have 150 readings in Japanese. Even Japanese people don’t know all these readings without intensive study, which is why particularly difficult kanji or unusual readings will sometimes have furigana pronunciation guides above them. Notice that I said “sometimes.” Place names are often idiosyncratic, which makes getting directions very interesting sometimes, like when even Japanese who aren’t from that particular area don’t know the proper reading for a sign or landmark name.

The syllabaries are (mostly) unambiguous to sound out, but virtually useless for determining meaning. Moser mentioned the difficulty he had in figuring out where the word boundaries are in Chinese. If you encounter a passage written in hiragana, which is also written with no spaces between words, it will also be stripped of all the useful context provided by kanji. Here is just a taste of why the large number of homophones in Japanese make the “friendly” regularity of the katakana and hiragana syllabaries meaningless.

In addition, there are a growing number of loan words that are often (but not always) rendered in katakana. Trying to learn them as cognates, thinking that you can guess from the root language (if you can even recognize the original word) is just asking for trouble. Sticking just with a couple of English examples, a manshon in Japan is about 50 m² and maika is not my car. One truism of Japanese loan words is that the meaning has no more than a passing connection with whatever fragment of vocabulary someone latched onto and twisted into Japanese form.

Tones

I’ll give him half of a point on this one, since Chinese tones are more complicated than Japanese. Mandarin has 4 (plus a “neutral” flat tone), and Cantonese has 6 or 7, depending on dialect. But Japanese also has pitch accent patterns somewhat similar to tones, and some of the variations produce a different meaning if you screw up, not just a muddled accent. The relative paucity of sound combinations means that there are a large number of homophones that have to be parsed according to context. It might actually be helpful if Japanese had more tones. Japanese is probably the only language to rely heavily on writing for oral communication. Japanese people will sometimes draw kanji on their hands with their finger to disambiguate identical-sounding words. May the gods help you if you don’t know the distinguishing character.

Japanese has a complexity that is not a significant factor in either major dialect of Chinese: vowel length matters. The sound units of Japanese are morae, so each mora (or what would be roughly termed a syllable in English) gets a beat. English is a stress-timed language. For an English speaker, vowel length between stress points is mostly disregarded, while the length of time between stress points is seen as important. An English speaker’s tendency to elide vowels, disregard their absence in listening for meaning, and create diphthongs sometimes leads to serious difficulties in understanding and communicating verbally in Japanese.

I still can’t reliably distinguish 病院 (byouin, hospital) and 美容院 (biyouin, beauty parlor) without a decent amount of context to help me guess, and I’m told I have a really good ear for sounds compared to most non-native speakers. Even the simpler distinction of vowel length trips me up sometimes, particularly when I’m transcribing unfamiliar vocabulary encountered in meetings or conversations for later study.
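
The difference is easier to see on paper than to hear. Here is a minimal Python sketch of how the beats can be counted from a kana spelling. It assumes a simplified rule: every kana gets one beat, except the small glide and vowel kana (ゃ, ゅ, ょ and friends), which merge with the kana before them. The names SMALL_KANA, split_morae, and count_morae are made up for this illustration and don't come from any library.

```python
import unicodedata

# Small kana that merge with the preceding kana into a single mora.
# Small っ, ん and the long-vowel mark ー are deliberately NOT listed here,
# because each of them takes a full beat of its own.
SMALL_KANA = set("ゃゅょぁぃぅぇぉゎャュョァィゥェォヮ")


def split_morae(kana: str) -> list[str]:
    """Split a kana string into morae using the simplified rule above."""
    # Normalize so that voiced kana such as び are single code points.
    kana = unicodedata.normalize("NFC", kana)
    morae: list[str] = []
    for ch in kana:
        if ch in SMALL_KANA and morae:
            morae[-1] += ch  # attach the glide to the previous kana
        else:
            morae.append(ch)
    return morae


def count_morae(kana: str) -> int:
    """Number of beats in a kana string."""
    return len(split_morae(kana))


print(split_morae("びょういん"), count_morae("びょういん"))  # byouin: 4 beats
print(split_morae("びよういん"), count_morae("びよういん"))  # biyouin: 5 beats
print(count_morae("おばさん"), count_morae("おばあさん"))    # 4 vs. 5
```

On paper, byouin comes out as four beats (byo-u-i-n) and biyouin as five (bi-yo-u-i-n). The last line shows plain vowel length doing the same thing: obasan (aunt) is four beats and obaasan (grandmother) is five. One extra beat is the entire difference, and at conversational speed that is what an English-trained ear has to catch.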

Romanization

There are two major and three minor romanization systems, as well as wapuro character input methods for computers that differ in some ways from both. The least ambiguous for non-Japanese speakers is the Hepburn system, or Hebon-shiki in Japanese.

As you can tell from the name, Hebon-shiki was designed by a non-Japanese. Revised versions of the original are the most widespread forms of romanization outside Japan, but the home-grown Kunreisiki currently has official standing in the education ministry and most of the other government bureaus. Kunreisiki makes learning English spelling even more difficult for this generation of kids because the Hepburn system that was taught to at least the previous two generations post-War maps much better to English spelling. Kunreisiki also confuses the crap out of nearly anyone whose native language uses a Roman-based alphabet and who doesn’t know much about Japanese, like any non-native learner. It’s basically a romanization system created by Japanese, for Japanese, which is the epitome of uselessness in my not-so-humble opinion.

To make things more confusing, various bureaus use different systems. Road signs, passports, and rail stations all use different variations on Hepburn, while other official documents typically use Nihonsiki or Kunreisiki, depending on the bureau or department.

I’ve also encountered very different ways of rendering things into romaji, or roman letters. Some sources combine the particles with the preceding word, some combine everything joined by particles into one big mess, while the clearest and least ambiguous offset the particle from the lexical component. The original title of the film translated as Spirited Away in English could be rendered as Sento Chihirono Kamikakushi, Sentochihironokamikakushi, or Sen to Chihiro no Kamikakushi, respectively. Note that these were all written in bog-standard Hepburn romanization. Using a different flavor of Hepburn, or the Nihonsiki or Kunreisiki systems would produce even more variations.
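
To show how quickly the variants pile up, here is a rough Python sketch. It is not a real transliteration library: the function names (render, hepburn_to_kunrei), the style labels, and the token list are all made up for this example, and the Hepburn-to-Kunreisiki table only covers the syllables where the two systems spell things differently. Long vowels and other edge cases are ignored.

```python
# Each token is (romaji, is_particle). The segmentation is done by hand.
TITLE = [("Sen", False), ("to", True), ("Chihiro", False),
         ("no", True), ("Kamikakushi", False)]

# Hepburn spelling -> Kunreisiki spelling, for the syllables that differ.
# The three-letter "sha"-type combinations are listed before "shi"-type
# ones so the order of replacement never matters for these pairs.
HEPBURN_TO_KUNREI = [
    ("sha", "sya"), ("shu", "syu"), ("sho", "syo"), ("shi", "si"),
    ("cha", "tya"), ("chu", "tyu"), ("cho", "tyo"), ("chi", "ti"),
    ("tsu", "tu"), ("fu", "hu"),
    ("ja", "zya"), ("ju", "zyu"), ("jo", "zyo"), ("ji", "zi"),
]


def hepburn_to_kunrei(word: str) -> str:
    """Very simplified conversion; returns lowercase Kunreisiki spelling."""
    out = word.lower()
    for hepburn, kunrei in HEPBURN_TO_KUNREI:
        out = out.replace(hepburn, kunrei)
    return out


def render(tokens, style: str) -> str:
    """Join tokens using one of three spacing conventions."""
    if style == "spaced":          # particles stand alone
        return " ".join(text for text, _ in tokens)
    if style == "attached":        # particles glued to the preceding word
        words = []
        for text, is_particle in tokens:
            if is_particle and words:
                words[-1] += text
            else:
                words.append(text)
        return " ".join(words)
    if style == "run_together":    # everything in one lump
        return "".join(text for text, _ in tokens).capitalize()
    raise ValueError(f"unknown style: {style}")


for style in ("attached", "run_together", "spaced"):
    print(render(TITLE, style))
print(" ".join(hepburn_to_kunrei(text) for text, _ in TITLE))
```

The first three lines of output are the three Hepburn spacings of the film title, and the last is the spaced version respelled in Kunreisiki (sen to tihiro no kamikakusi). Three spacing habits times several spelling systems is a lot of ways to write one title.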

(By the way, the title 千と千尋の神隠し is a clever kanji wordplay related to the storyline of the movie. The girl Chihiro [千尋] has part of her name stolen by the witch, Yubaba [湯婆婆, lit: bath witch] leaving her with only the 千 character, which can be read as sen or chi.)

What makes Japanese harder than Chinese

In addition to all of the writing complexity of Chinese, plus the extra readings I’ve already talked about, there are made-in-Japan kanji and specialty characters to spackle over the cracks left by jamming a non-native writing system into Japanese. The syllabaries are used in addition to kanji, not instead of kanji. Hiragana are used for okurigana, to add inflections and other grammatical information that couldn’t be adequately expressed with the Chinese system alone. Okurigana also often provide some indication of which of the possible readings is intended. When you read or write something in Japanese, you have to use at least 2 or 3 writing systems (sometimes up to 4, counting romanization and foreign terms) simultaneously.
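
If you want to see the mixing for yourself, here is a small Python sketch that labels each character of a string by script, using the main Unicode blocks for hiragana (U+3040 to U+309F), katakana (U+30A0 to U+30FF), and the basic CJK ideographs (U+4E00 to U+9FFF). The function names script_of and scripts_used are my own, and the check is deliberately crude: it ignores the CJK extension blocks, half-width katakana, and the repetition mark 々.

```python
def script_of(ch: str) -> str:
    """Classify a single character by its Unicode block (simplified)."""
    cp = ord(ch)
    if 0x3041 <= cp <= 0x309F:
        return "hiragana"
    if 0x30A0 <= cp <= 0x30FF:
        return "katakana"
    if 0x4E00 <= cp <= 0x9FFF:
        return "kanji"
    if ch.isascii() and ch.isalpha():
        return "romaji"
    return "other"


def scripts_used(text: str) -> set[str]:
    """Set of scripts appearing in the text, ignoring 'other' characters."""
    return {script_of(ch) for ch in text} - {"other"}


# The original title of Spirited Away: kanji plus hiragana particles/okurigana.
print(sorted(scripts_used("千と千尋の神隠し")))
# "(I) bought a T-shirt": a six-character sentence that needs all four.
print(sorted(scripts_used("Tシャツを買った")))
```

The second example is an ordinary everyday sentence, not a contrived one, and it already uses a roman letter, katakana, hiragana, and a kanji.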

The older forms of Japanese kanji are usually identical to the older forms of Chinese characters, but both Japan and China simplified their characters in different ways after the War. Japanese literary history is intimately connected to Chinese, so Japanese study classical Chinese texts (漢文, Kanbun) as well as Japanese (国語, Kokugo) ones. All the complaints he has about learning the ancient incomprehensible texts in classical Chinese along with modern forms are multiplied in Japanese. There are specialized dictionaries for studying 古語 (kogo, ancient words) or classical Japanese, as well as references for ancient Chinese literature.

Every complaint he has about Chinese calligraphic poetry is worse in Japanese, since the literary elite in Japan developed their own distinctive styles of calligraphy, and have both ancient Chinese texts and Japanese texts with which to make obscure allusions. Haiku and tanka seem simple until you learn that there are layers of meaning attached to each and every character choice and nuance of writing. And no, most Japanese people don’t understand all of it either.

The 2,000 characters often cited as necessary for Japanese are inadequate for full literacy. The recently revised list of 常用漢字 (Jouyou Kanji) currently taught in schools consists of 2,136 characters (with a combined 4,388 on-kun [音訓] readings: 2,352 Chinese-derived on’yomi and 2,036 native Japanese kun’yomi), but most educated adults are probably familiar with between 2 and 3 times that number. Even native speakers, however, may not be able to read aloud every character they recognize from context or whose meaning they understand.

There is a standardized test, the 漢字検定 (Kanji Kentei), which tests comprehensive knowledge of meaning, usage, and readings of kanji. Level 1 (the highest level) covers 6,355 kanji, and reportedly only has about a 2% pass rate. Let me be clear: this test is meant for native Japanese speakers, and 98% of those who take it fail. For a frame of reference, about 80–85% of Navy personnel who make it to BUD/S wash out. Someone with a Level 1 Kan-ken certification is the kanji reading-writing equivalent of a high-rank Navy SEAL.

The 大漢和辞典 (Dai Kan-Wa Jiten) is cited as being the most comprehensive Japanese kanji dictionary, and has over 50,000 head entries. It’s a multi-volume set, not a single book. I mentioned this dictionary at the end of my earlier piece on a TED Talk about a method for learning to read Chinese. So, again, any complaint he has about Chinese is expanded for Japanese learners because they’re essentially compelled to learn two difficult languages — both Chinese and Japanese.

The grammar is ass-backwards from an English-speaker’s perspective. Japanese is an SOV language, while both Chinese and English are SVO. Japanese word order is so fluid that subjects can occur at the end of a sentence, and are often completely left out, usually being implied by context and/or the verb. Big deal, Spanish drops subjects too, right? How bad could it be? I’ve overheard conversations where native Japanese speakers lose track of what the hell they’re talking about.

I mentioned counting complexity earlier. In my early basic dictionary (which, contrary to what they say on their site, is nowhere near comprehensive) there are over 120 different counters for various objects. There are exceptions to the patterns in many of the counting sets, and the domains of some counters overlap. When ordering something, it’s common for the person to repeat your order using a different — but also appropriate — counter to confirm the number of items. Just giving a date in Japanese means navigating a minefield of two different number systems, plus counter words, plus exceptions and special readings for certain days.

Quick, how do you count rabbits? Like animals (匹) right? Nope, like birds. How do you count squid? The same way you count cups of saké, of course! 杯 (Wait … what?) Here’s a decent starting guide and a large compendium of counters, if you feel like diving down that particular pit of despair.
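
To give a taste of the euphonic changes involved, here is a minimal Python sketch of the usual textbook pattern for counters that start with "h" in romaji, such as 本 (hon, long thin objects), 匹 (hiki, small animals), and 杯 (hai, cupfuls, and squid). The function name count_with_h_counter and the two tables are made up for this example. It only covers 1 through 10, and it ignores accepted variants such as jippon alongside juppon.

```python
# Plain number readings as used in front of counters.
BASE = {1: "ichi", 2: "ni", 3: "san", 4: "yon", 5: "go",
        6: "roku", 7: "nana", 8: "hachi", 9: "kyuu", 10: "juu"}

# Numbers whose ending collapses into a doubled consonant,
# turning the counter's initial h into p (ichi + hon -> ippon).
GEMINATING = {1: "ip", 6: "rop", 8: "hap", 10: "jup"}


def count_with_h_counter(n: int, counter: str) -> str:
    """Reading for n + an h-initial counter, following the textbook pattern."""
    if n not in BASE:
        raise ValueError("this sketch only handles 1 through 10")
    if not counter.startswith("h"):
        raise ValueError("this sketch only handles h-initial counters")
    rest = counter[1:]
    if n in GEMINATING:
        return GEMINATING[n] + "p" + rest   # h -> pp
    if n == 3:
        return BASE[n] + "b" + rest         # h -> b after san
    return BASE[n] + counter                # no change


for counter in ("hon", "hiki", "hai"):
    print([count_with_h_counter(n, counter) for n in range(1, 11)])
```

That is three different initial sounds for one counter across ten numbers, and this is one of the regular patterns. Counters that start with other sounds follow their own rules, and a table like this tells you nothing about which counter a given object takes in the first place.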

There are also a large number of words, 擬音語 (giongo) and 擬態語 (gitaigo), that function basically like adverbs. I’ve found entries for so few of these in any bilingual dictionary that I’ve actually thought about compiling one of my own, but I’d probably blow my own brains out from frustration long before I finished the damn thing. (Well, maybe not, since guns are so hard to come by in Japan.)

Unless I’m missing several somethings in regard to Chinese study that push the degree of difficulty way beyond anything presented in Moser’s diatribe, from my own admittedly tangential knowledge of the subject I’m pretty comfortable saying that Japanese is substantially more difficult than Chinese. In order to learn Japanese even at a non-comprehensive, non-academic level, you have to learn the major part of the Chinese writing system, and then you still have to tackle Japanese.

Chineasy Book Project on Kickstarter

Chineasy is now a project on Kickstarter. At this writing, they've got 31 hours to go, and have already blown past their initial funding goal. I wrote about this previously after watching the TED Talk by ShaoLan where she outlined her approach to teaching Chinese characters.

I like the bold design elements, but I'm not so sure it will be useful for learning to actually read Chinese. It'll make a great coffee-table book though, and it might just introduce the characters to some people who would otherwise have had a much shallower interaction with them. If you'd like a cool-looking book featuring 200 basic Chinese characters, it might be worth it to snag a copy before their campaign closes.

Learn to Read Chinese with Ease (TED Talk)

古代文字ゲーム kanji game screenshot

There probably were many people in the audience who didn’t know anything about Chinese writing, and so this talk was a brief introduction that went a long way toward demystifying kanji / hanzi for them. However, ShaoLan is not even close to the first person to introduce this concept. Michael Rowley’s Kanji Pict-O-Graphix (1992) did a fantastic job of creating memorable and coherent pictures to match the core meaning of the characters, and even won awards for its graphic design. Even earlier, James Heisig published Remembering the Kanji in 1977. Both Rowley and Heisig built on older etymological systems already in place in Japan and China.

The constituent elements are usually called radicals in English (部首; bushu in Japanese) and there’s a decent online introduction at About.com including the traditional patterns used for dividing the characters. (Incidentally, radicals are how you can explain exactly which character you mean when there might be dozens of characters with the same phonetic reading. This is similar to how we disambiguate letter sounds by saying “M as in Mike” in English.) Heisig called these “primitives” instead, but that’s just terminology. There are various methods for dividing and categorizing characters for use in dictionaries, but the most straightforward system for non-native learners is probably the newly-developed SKIP system used in the Kanji Learner’s Dictionary.

Chinese characters started as pictograms (representing the thing directly) and were soon adapted into ideograms (representing an associated or more abstract idea). Their meanings have sometimes mutated a bit from the earliest usages, and the forms have changed, sometimes drastically, since then. I have an iPhone game that can give even someone completely unfamiliar with the kanji a taste of how the characters have mutated from ancient Chinese to modern Japanese.

There is some instruction given in Japanese schools about combinations of the early characters. ShaoLan’s eight examples are among the first taught, since they are so easy to grasp. But oddly enough, once the basic concept of radical combinations is introduced children aren’t given explicit instruction in using the radical characters as a tool for guessing at the meaning of unfamiliar characters, or to enhance the retention of new characters. Most instruction heavily features lists and repetitive writing exercises, nothing more sophisticated than brute-force memorization. My wife (who is Japanese) was fascinated when I told her how I studied kanji. She’d never thought of learning characters through mnemonic stories. Instead, everyone just learned writing patterns and practiced the strokes, over and over and over.

In looking for links for this article, I ran across an excerpt on a blog called Nihongocentral that explains the traditional thinking pretty well.

Kanji textbooks for foreigners are based on Japanese books that use the following logic:

The smallest orthographic units in Chinese are strokes. There are six basic strokes, a dot (丶), a horizontal line (一), a vertical line (丨), a diagonal line falling from right to left (丿), a diagonal line falling from left to right (乀), and stroke with a change in direction (乚). These basic strokes have varied formats, such as different dots (e.g. in 氵, 丷, and 灬), different diagonal lines, (e.g. in 爪, 彡), lines with different curves (e.g. in 乙, 阝), and lines with hooks in different positions (e.g. in 乛, 勹).… [Et cetera, ad nauseam.]

Japanese children must learn the Jôyô Kanji, a list of (currently) 2,136 characters, by the end of high school. Roughly half of this number (1,006) are introduced gradually throughout elementary school, and most of the rest are learned in middle school, with the last few hundred and supplemental characters in high school. That means that it takes 9–12 years with the traditional approach to attain normal basic literacy. In contrast, I know non-native Japanese learners who were able to memorize around 1000 characters in just a couple of years using mnemonic approaches. When I was studying regularly, I would learn about 15–20 a day in an hour of dedicated study, though I would of course have to reinforce that study with associated vocabulary and reading practice if I wanted to retain what I’d learned for more than a few days to a week or so.
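
For a sense of scale, here is the back-of-the-envelope arithmetic as a short Python sketch. It only measures a first pass through the list at my 15–20 characters a day, which is not the same thing as retention or literacy. The six-year length of elementary school is the one figure I'm adding that isn't stated above, and the names are just labels for this calculation.

```python
import math

JOYO_TOTAL = 2136         # characters on the current list
ELEMENTARY_KANJI = 1006   # characters introduced in elementary school
ELEMENTARY_YEARS = 6      # assumption: six years of Japanese elementary school


def first_pass_days(total: int, per_day: int) -> int:
    """Days needed to see every character once at a fixed daily pace."""
    return math.ceil(total / per_day)


fast = first_pass_days(JOYO_TOTAL, 20)   # 107 days
slow = first_pass_days(JOYO_TOTAL, 15)   # 143 days
school_pace = round(ELEMENTARY_KANJI / ELEMENTARY_YEARS)  # about 168 per year

print(f"First pass through the list: {fast} to {slow} study days")
print(f"Elementary school pace: about {school_pace} characters per year")
```

So a first pass through the whole list is a matter of months of daily study, while the school system introduces fewer than 200 characters a year. The reinforcement afterwards is where the real time goes, but the gap is still large.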

If Japanese literacy instruction were reformed to use mnemonic storytelling techniques, I think the time spent learning kanji could be considerably reduced. After all, non-native speakers have to learn a large amount of vocabulary and seemingly arbitrary readings along with the core concepts of the characters, while native Japanese kids already know the spoken language. All they need to learn is the idea, not the readings. If a dedicated non-native learner can achieve basic literacy in about 3–5 years, then native speakers should be able to use those techniques to achieve that level of reading ability even faster.

For those who are inclined to learn everything about a subject, I’d advise against “mastering” the writing systems of China or Japan. Kanji/Hanzi are a nearly bottomless black hole of study. I’ve personally seen a set of kanji dictionaries with about 50,000 entries. Yes, that’s a five followed by four zeroes — as in fifty-thousand — not a mistyping of five-thousand. There are an additional 30,000 or more characters that have been found in various ancient Chinese writings, many of which are one-off personal or place names.