Kanjilicious

Kanjilicious is a game/study tool looking for funding on Kickstarter. The developer will use the funds to buy rights to assets like music and illustrations, as well as to pay people for app-related work. I’m personally not a big fan of gamifying study. I get better results by working alone most of the time (shocker), but I’m not most people.

The huge popularity of games like Words with Friends and Letterpress shows that a lot of people like competitive word/puzzle games. I think he’s going about it in the right way, by setting up a base that he can build on for future games and kanji sets.

Kanjilicious has a learning tool at its base, and even anti-social humbugs like me may find that the social gaming features get us to open the app and review more often, whether we actually engage in game matches or not. My son will soon reach the point where he could really use bilingual learning tools (he’ll be turning 4 in a couple of weeks), and this might be an app he uses for learning to read in Japanese.

I’m in for $25 because I think this is an interesting approach, and because having only the first kanji set unlocked would make the app really, really pointless for someone at my level. The game aspect will probably make it appealing even to people who have only a casual interest in learning Japanese, which I think is a Very Good Thing. Demystifying a subject is nearly always positive, in my experience. If you have even a slight interest in learning the Japanese writing systems, backing this project would probably pay off for you.

I’ve used iFlash on my Mac since around 2005 for learning Japanese vocabulary and kanji. I’ve also used the free Japanese dictionary imiwa? since I got my first iPhone, back when it was called Kotoba!

I’ve tried several other apps for making study more fun or for ingraining habits, like 30Day Japanese Words and similar structured study apps, but the ones I’ve found most “sticky” are the ones that are useful in more than one way. imiwa? has a Favorites list that I can export to make a personalized deck of words I’ve had to look up recently. iFlash is a venerable tool that I’ve used for memorization practice for years, and the creator set up a deck-sharing feature that I’ve found interesting; I recently downloaded a Russian Cyrillic alphabet deck, for example.

I won’t know until I actually use it, but it looks like the kind of app I’ll find useful enough to launch on a regular basis.

“Study Shows Blah-Blah-Blah”

Popular science articles almost always go with this kind of headline. The problem is that it’s almost always inaccurate. The vast majority of the time, when you check the primary source, you find that the reporter hyped it, misinterpreted it, or made some other error, like citing a tiny study as if it were conclusive. Even worse is when they phrase the headline as a question.

John Gruber at Daring Fireball has commented that if a tech article asks a question in the headline, you can safely assume that the answer to that question is “no”. “Is [insert ridiculous product name here] an iPad Killer?” Nope. The same applies to pop-sci articles.

I’m being charitable to the researchers by assuming that their research has simply been misinterpreted, but sometimes they’re overstating the case for their own research, as I’ve pointed out in the past. I deliberately phrased my headline in that piece as a question, because I was aping the usual pop-sci junk headline. It was still better than the original Atlantic title, since I removed the inflammatory “fad” and showed that the assertion was just flat wrong.

If you see an article with a headline like “Study Shows Foo and Also Bar!” you can safely assume that one of the following is true:

  • No it doesn’t.
  • It might, but the connection is tenuous.
  • It might, but needs confirmation.

You should always check the original research — which should be cited in even a semi-crappy article — to see for yourself what was actually published.

Why Chinese Is [Not] so Damn Hard

David Moser:

… But I still feel reasonably confident in asserting that, for an average American, Chinese is significantly harder to learn than any of the other thirty or so major world languages that are usually studied formally at the university level (though Japanese in many ways comes close).

[Footnote] 3. Incidentally, I’m aware that much of what I’ve said above applies to Japanese as well, but it seems clear that the burden placed on a learner of Japanese is much lighter because (a) the number of Chinese characters used in Japanese is “only” about 2,000 – fewer by a factor of two or three compared to the number needed by the average literate Chinese reader; and (b) the Japanese have phonetic syllabaries (the hiragana and katakana characters), which are nearly 100% phonetically reliable and are in many ways easier to master than chaotic English orthography is.

I understand where Moser is coming from. I took Spanish for one year in middle school and two in high school. Even with my total indifference to academic achievement at the time, the frankly terrible teaching (one teacher had us repitan en español for most of the period nearly every day), and no previous experience learning a foreign language, I reached a basic level of conversational ability that was occasionally useful living in California. Nearly a decade later, I went to Spain for a 10-day vacation (with my then-girlfriend, now-wife) and found I’d retained enough to follow the plots of TV dramas, get the gist of overheard conversations, and ask simple questions and understand the answers.

It took me over 3 years of constant studying and immersion on my own — following a combined year of university instruction — before I reached that level of competence in Japanese. That’s just the spoken part. In reading and writing, since I’ve had to learn mostly on the job and have rarely had much time for real study past those first few years, I still don’t function at a particularly high level. I can effectively read for content, but most nuance is beyond me, and since I came to it so late, I’ll probably never progress to where I can dispense with references when reading or writing.

I occasionally joke to my wife that my next language will be something easy, like Arabic. If I’d been studying virtually any Indo-European language for this length of time, I’d probably be both fluent and literate by now. I would probably have had time to acquire two or three languages at a decent level of ability given the same time investment.

Most of what I know about Chinese, outside of some material covered in university linguistics classes, is in connection with Japanese. Chinese has had a huge influence on the way Japanese has developed since at least the first wave of adoption of Chinese writing. Around 60% of the words in a modern Japanese dictionary and nearly 20% of the spoken vocabulary are kango (漢語): words either borrowed directly from Chinese or derived from it. Chinese has had roughly the same influence on Japanese that Latin has had on English.

Moser asserts that Chinese is possibly the hardest language to acquire, and in the two parts I quoted above, he directly compares the difficulty of Chinese to that of Japanese. I think he’s wrong. Japanese is even more difficult than Chinese. Everything he complains about with regard to Chinese is even more difficult and complicated in Japanese.

Readings

No matter how bad the sound-symbol complexity is in Chinese, it’s worse in Japanese. There are multiple readings for the same character, some of them occurring in only a single context. Japanese didn’t borrow Chinese characters just once; there were several waves of importation over the course of centuries. Layering the imported Chinese readings on top of existing native Japanese words left some characters with a painfully large number of readings.

Virtually every kanji has at least two readings. Then there are the combinations. Which reading do you use in combination with other kanji, in which context? There are kun’yomi (native Japanese readings), on’yomi (Chinese-derived readings), and mixtures of the two, along with “difficult” readings that even native Japanese speakers can’t reliably produce without special study or a dictionary for reference. And that completely ignores the euphonic changes triggered by preceding or following sounds, which are just a small part of what makes even simple counting such an utter bitch.

According to everything I can find, Chinese characters generally have a single reading within a given dialect, with some fairly predictable variations. Compare that to bastards like 生, which might (who the #^¢* knows?) have 150 readings in Japanese. Even native Japanese speakers don’t know all of these readings without intensive study, which is why particularly difficult kanji or unusual readings will sometimes have furigana pronunciation guides above them. Notice that I said “sometimes.” Place names are often idiosyncratic, which makes getting directions very interesting at times, as when even Japanese people who aren’t from the area don’t know the proper reading for a sign or landmark name.

The syllabaries are (mostly) unambiguous to sound out, but virtually useless for determining meaning. Moser mentioned the difficulty he had in figuring out where the word boundaries are in Chinese. A passage written entirely in hiragana likewise has no spaces between words, and it is stripped of all the useful context provided by kanji. Here is just a taste of why the large number of homophones in Japanese makes the “friendly” regularity of the katakana and hiragana syllabaries meaningless.
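
A classic illustration (a textbook example, not one from the original post) is the garden-path sentence 庭には二羽鶏がいる, “there are two chickens in the garden,” which dissolves into an undifferentiated string when written in kana alone:

```python
# A well-known textbook example (not from Moser or the linked piece):
# the same sentence written with and without kanji.
with_kanji = "庭には二羽鶏がいる"          # "There are two chickens in the garden."
kana_only = "にわにはにわにわとりがいる"   # niwa ni wa niwa niwatori ga iru

# With kanji, 庭 (garden), 二羽 (two birds), and 鶏 (chicken) are
# visually distinct. In kana, "niwa" appears three times with three
# different meanings and no word boundaries to separate them.
print(with_kanji)
print(kana_only)
```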

In addition, there is a growing number of loan words that are often, but not always, rendered in katakana. Trying to learn them as cognates, thinking you can guess the meaning from the root language (if you can even recognize the original word), is just asking for trouble. Sticking with a couple of English examples: a manshon (from “mansion”) in Japan is a condo of about 50 m², and maika (“my car”) is not necessarily my car; it’s anyone’s privately owned car. One truism of Japanese loan words is that the meaning has no more than a passing connection with whatever fragment of vocabulary someone latched onto and twisted into Japanese form.

Tones

I’ll give him half a point on this one, since Chinese tones are more complicated than anything in Japanese. Mandarin has four (plus a “neutral” tone), and Cantonese has six or seven, depending on the dialect. But Japanese has pitch accent patterns somewhat similar to tones, and some of the variations produce a different word if you screw up, not just a muddled accent. The relative paucity of sound combinations means there are a large number of homophones that have to be parsed according to context. It might actually be helpful if Japanese had more tones. Japanese is probably the only language that relies heavily on writing for oral communication: Japanese people will sometimes draw kanji on their palms with a finger to disambiguate identical-sounding words. May the gods help you if you don’t know the distinguishing character.

Japanese also has a complexity that is not a significant factor in either major dialect of Chinese: vowel length matters. The sound units of Japanese are morae, and each mora (roughly what would be termed a syllable in English) gets one beat. English, by contrast, is a stress-timed language: vowel length between stress points is mostly disregarded, while the timing between stress points carries the rhythm. An English speaker’s tendency to elide vowels, to ignore vowel length when listening for meaning, and to turn pure vowels into diphthongs sometimes leads to serious difficulties in understanding and communicating verbally in Japanese.

I still can’t reliably distinguish 病院 (byouin, hospital) from 美容院 (biyouin, beauty parlor) without a decent amount of context to help me guess, and I’m told I have a really good ear for sounds compared to most non-native speakers. Even the simpler distinction of vowel length trips me up sometimes, particularly when I’m transcribing unfamiliar vocabulary encountered in meetings or conversations for later study.
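
A minimal sketch of the mora-timing point, using the word pair above (the mora breakdowns are standard Japanese phonology):

```python
# 病院 (hospital) vs. 美容院 (beauty parlor): nearly identical to an
# English-trained ear, but one mora apart, and every mora gets a beat.
byouin = ["びょ", "う", "い", "ん"]        # byo-u-i-n: 4 morae
biyouin = ["び", "よ", "う", "い", "ん"]   # bi-yo-u-i-n: 5 morae

for word, morae in (("病院", byouin), ("美容院", biyouin)):
    print(f"{word}: {'・'.join(morae)} ({len(morae)} morae)")
```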

Romanization

There are two major and three minor romanization systems, as well as wapuro character-input methods for computers that differ in some ways from all of them. The least ambiguous for non-Japanese speakers is the Hepburn system, or Hebon-shiki in Japanese.

As you can tell from the name, Hebon-shiki was designed by a non-Japanese. Revised versions of the original are the most widespread forms of romanization outside Japan, but the home-grown Kunreisiki currently has official standing in the education ministry and most of the other government bureaus. Kunreisiki makes learning English spelling even more difficult for this generation of kids, because the Hepburn system taught to at least the previous two post-War generations maps much better onto English spelling. Kunreisiki also confuses the crap out of nearly anyone whose native language uses a Roman alphabet and who doesn’t know much about Japanese, which describes pretty much every non-native learner. It’s basically a romanization system created by Japanese, for Japanese, which is the epitome of uselessness in my not-so-humble opinion.

To make things more confusing, various bureaus use different systems. Road signs, passports, and rail stations all use different variations on Hepburn, while other official documents typically use Nihonsiki or Kunreisiki, depending on the bureau or department.

I’ve also encountered very different conventions for rendering things into romaji, or Roman letters. Some sources combine particles with the preceding word, some run all the connecting particles together into one big mess, while the clearest and least ambiguous offset each particle from the lexical components. The original title of the film translated as Spirited Away in English could be rendered as Sento Chihirono Kamikakushi, Sentochihironokamikakushi, or Sen to Chihiro no Kamikakushi, respectively. Note that these are all written in bog-standard Hepburn romanization; using a different flavor of Hepburn, or the Nihonsiki or Kunreisiki systems, would produce even more variations.

(By the way, the title 千と千尋の神隠し is a clever bit of kanji wordplay related to the storyline of the movie. The girl Chihiro [千尋] has part of her name stolen by the witch Yubaba [湯婆婆, lit. “bath witch”], leaving her with only the 千 character, which can be read as sen or chi.)

What makes Japanese harder than Chinese

In addition to all of the writing complexity of Chinese, plus the extra readings I’ve already talked about, there are made-in-Japan kanji (kokuji) and specialty characters to spackle over the cracks left by jamming a non-native writing system into Japanese. The syllabaries are used in addition to kanji, not instead of them. Hiragana are used as okurigana, to add inflections and other grammatical information that couldn’t be adequately expressed with the Chinese system alone. Okurigana also often give some indication of which of the possible readings is intended. When you read or write something in Japanese, you have to use at least two or three writing systems simultaneously (sometimes up to four, counting romanization and foreign terms).

The older forms of Japanese kanji are usually identical to the older forms of Chinese characters, but Japan and China each simplified their characters in different ways after the War. Japanese literary history is intimately connected to Chinese, so Japanese students study classical Chinese texts (漢文, Kanbun) as well as Japanese (国語, Kokugo) ones. Every complaint he has about learning ancient, incomprehensible texts in classical Chinese along with the modern forms is multiplied in Japanese. There are specialized dictionaries for studying 古語 (kogo, ancient words) or classical Japanese, as well as references for ancient Chinese literature.

Every complaint he has about Chinese calligraphic poetry is worse in Japanese, since the literary elite in Japan developed their own distinctive styles of calligraphy, and have both ancient Chinese texts and Japanese texts with which to make obscure allusions. Haiku and tanka seem simple until you learn that there are layers of meaning attached to each and every character choice and nuance of writing. And no, most Japanese people don’t understand all of it either.

The 2,000 characters often cited as necessary for Japanese are inadequate for full literacy. The recently revised list of 常用漢字 (Jouyou Kanji) currently taught in schools consists of 2,136 characters, with a combined 4,388 on-kun (音訓) readings: 2,352 Chinese-derived on readings and 2,036 native Japanese kun readings. Most educated adults, though, are probably familiar with two to three times that number of characters. Even native speakers, however, may recognize a character from context or understand its meaning without being able to read it aloud.

There is a standardized test, the 漢字検定 (Kanji Kentei), which tests comprehensive knowledge of the meanings, usage, and readings of kanji. Level 1 (the highest) covers 6,355 kanji and reportedly has only about a 2% pass rate. Let me be clear: this test is meant for native Japanese speakers, and 98% of those who take it fail. For a frame of reference, about 80–85% of Navy personnel who make it to BUD/S wash out. Someone with a Level 1 Kan-ken certification is the kanji reading-and-writing equivalent of a high-ranking Navy SEAL.

The 大漢和辞典 (Dai Kan-Wa Jiten) is cited as the most comprehensive Japanese kanji dictionary, with over 50,000 head entries. It’s a multi-volume set, not a single book. I mentioned this dictionary at the end of my earlier piece on a TED Talk about a method for learning to read Chinese. So, again, any complaint he has about Chinese is amplified for Japanese learners, because they’re essentially compelled to learn large parts of two difficult languages: both Chinese and Japanese.

The grammar is ass-backwards from an English speaker’s perspective. Japanese is an SOV language, while both Chinese and English are SVO. Japanese word order is so fluid that subjects can occur at the end of a sentence, and they are often left out entirely, implied by context and/or the verb. Big deal, Spanish drops subjects too, right? How bad could it be? I’ve overheard conversations in which native Japanese speakers lose track of what the hell they’re talking about.

I mentioned counting complexity earlier. The basic dictionary I used early on (which, contrary to what they say on their site, is nowhere near comprehensive) lists over 120 different counters for various objects. There are exceptions to the patterns in many of the counting sets, and the domains of some counters overlap. When you order something, it’s common for the person taking the order to repeat it back using a different, but also appropriate, counter to confirm the number of items. Just giving a date in Japanese means navigating a minefield of two different number systems, plus counter words, plus exceptions and special readings for certain days.

Quick, how do you count rabbits? Like animals (匹), right? Nope: like birds (羽). How do you count squid? The same way you count cups of saké, of course: 杯. (Wait … what?) Here’s a decent starting guide and a large compendium of counters, if you feel like diving down that particular pit of despair.
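
As a rough sketch of what a learner is up against, here is a toy counter lookup. The pairings come from the examples above; the structure, names, and fallback are my own hypothetical illustration, and a real system would need hundreds of entries:

```python
# Toy noun-to-counter lookup. A realistic version would also need the
# euphonic reading changes mentioned earlier (一匹 ippiki, 二匹 nihiki,
# 三匹 sanbiki, ...), overlapping domains, and per-noun exceptions.
COUNTERS = {
    "dog": "匹",          # small animals
    "rabbit": "羽",       # counted like birds, not like animals
    "squid": "杯",        # the same counter used for cups of saké
    "cup_of_sake": "杯",
}

def counter_for(noun: str) -> str:
    # Fall back to the all-purpose counter つ when nothing matches.
    return COUNTERS.get(noun, "つ")

print(counter_for("rabbit"))  # 羽
print(counter_for("squid"))   # 杯
```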

There is also a large class of words, 擬音語 (giongo, sound-mimicking words) and 擬態語 (gitaigo, mimetic words), that function basically like adverbs. I’ve found entries for so few of these in any bilingual dictionary that I’ve actually thought about compiling one of my own, but I’d probably blow my own brains out from frustration long before I finished the damn thing. (Well, maybe not, since guns are so hard to come by in Japan.)

Unless I’m missing several somethings about Chinese study that push the degree of difficulty way beyond anything presented in Moser’s diatribe, from my own admittedly tangential knowledge of the subject I’m pretty comfortable saying that Japanese is substantially more difficult than Chinese. To learn Japanese even at a non-comprehensive, non-academic level, you have to learn the major part of the Chinese writing system, and then you still have to tackle Japanese.

Insomnia doubles the risk of prostate cancer, study claims

John von Radowitz at The Independent:

Insomnia can double the risk of prostate cancer in older men, a study suggests. The risk rises proportionately with the severity of sleep problems, increasing from 1.6 to 2.1 times the usual level. The reason is not known. But a link has also been seen between insomnia and breast cancer in women.

You might want to get a good night's sleep tonight.

Time to Retire the Low-Carb Diet?

Time to Retire the Low-Carb Diet Fad

There was an Atlantic article published recently that punched a lot of my annoyance buttons. The article itself is not significantly worse than a lot of pop-sci stories; I’m just tired of reading the same kind of article hyping “new” studies as if they held real answers for nutrition. Here are a few of the problems I have with it.

1) The study is an epidemiological study. These studies can sometimes find some interesting correlations, but they can never give conclusive results about causes and effects because of the enormous number of independent variables, even assuming the information used is impeccable.

The information is not impeccable. The way information is gathered in studies like these, through self-reported periodic questionnaires, is horribly prone to error, subjectivity, forgetfulness, and bias on the part of both the participants and the researchers. While questionnaires may be one of the few practical ways to gather information on a large cohort, whatever you obtain is probably so muddled as to be worthless. Even if everyone were scrupulously honest and as exact as possible in answering, the fact of the matter is that there just aren’t that many data points to work with, and the questionnaire categories are unavoidably no more than very crude estimates of intake.

To make things worse, there are confounding variables, which is why researchers sometimes have to simply throw out data from huge chunks of the cohort. If the researchers are being honest, they do this to omit flawed data. If they’re not being honest, because they are biased either consciously or unconsciously, they do it because the data don’t match the conclusions they’d like to reach.

“Correlation is not causation.” If you read any kind of debate about science, you’ll hear this phrase. A lot. It means that one of the data sets you gathered can correlate with another without being a cause of it. For example, if you find that more of the suicides in your sample were committed by brown-eyed people, it doesn’t follow that having brown eyes makes you want to blow your brains out. Since brown eyes are more common than blue, it may just mean that there were more brown-eyed people in that particular sample to begin with. If your statistical models are good, you’re being conscientious, and you’re free of conscious or unconscious bias, you’ve adjusted for this possibility. Probably. Unless you found a false correlation and didn’t realize it.
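
A toy calculation makes the base-rate trap concrete. Every number here is hypothetical, chosen only to illustrate the point:

```python
# Two groups with an identical per-person rate: the more common trait
# dominates the raw counts anyway.
population = 1_000_000
shares = {"brown": 0.70, "blue": 0.30}  # brown eyes are more common
rate = 0.00015                          # same rate in both groups

for eyes, share in shares.items():
    group = population * share
    cases = group * rate
    print(f"{eyes}-eyed: {cases:.0f} cases in {group:.0f} people "
          f"(rate {cases / group:.3%})")

# brown-eyed: 105 cases in 700000 people (rate 0.015%)
# blue-eyed: 45 cases in 300000 people (rate 0.015%)
# More brown-eyed cases in raw counts, identical individual risk.
```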

In longitudinal studies like this, there are also the effects of other factors to wonder about. How much of the change is due to age? How much to diet? How much to lifestyle? When you factor those things in, can you actually draw any useful conclusions?

Looking at the discussion section of the study, I think the authors actually did try to control for as many variables as they could. The problem is that there are so many things to take into account, and so many factors that might affect the outcomes but are impossible to measure objectively.

Considering the number of problems involved in data gathering, the inherent subjectivity of the process, and the unreliability of the statistics involved in epidemiological studies, I would go so far as to say that most are junk science. They are a waste of money that could go toward something more worthwhile, like variable-controlled ward studies, where you control the subjects’ environment and measure exactly what goes in and what comes out. You don’t have to guess, estimate, or question; you measure objectively.

2) The headline is a sensationalized version of what is suggested in the “Conclusions” section of the study. The actual title of the study is: “Associations among 25-year trends in diet, cholesterol and BMI from 140,000 observations in men and women in Northern Sweden”. In other words, “we looked at a bunch of stuff, including some basic biomarkers, and tried to find patterns in it.”

3) From this point, I’m going to just take a look at what was actually in the study, without quibbling about whether the information is accurate or not.

Their conclusion states that an increase in fat intake coincided with popular support for low-carb diets, but the data they actually report don’t show that anyone in the cohort was following a low-carb diet. The foods they associate with high fat intake include many carbohydrate sources: bread, grain-based snacks, potato chips. The greatest increases in fat intake they report came from spreads for bread and from oils (presumably vegetable oils) used for cooking.

Carbohydrate intake in 1986 was 45.9% of energy for men and 49.2% for women. They obscure the carb intake by reporting only fat and protein in the “Changing intake patterns for fat and carbohydrate 1986 to 2010” section. Interestingly, the increase in fat intake they report between 1986 and 2010 is 0.7 percentage points in men and 2.2 in women. We are looking at aggregate data here, but that small a change is hardly something to point at as a smoking gun.

If you run the numbers for 2010, you get the following fat/protein/carbohydrate intakes. Men: 39.95% fat, 14.3% protein, 54.2% carbohydrate. Women: 37.7% fat, 14.3% protein, 52% carbohydrate. The carbohydrate values are calculated using their figures for fat (reported separately for men and women) and protein (no separate figures for men and women reported). So according to their own numbers, carbohydrate intake actually increased from 1986 to 2010.

However, their reported figures for 1986 don’t total 100% (the total reported is 98.7%), so this may be an arithmetic anomaly. The problem is that they don’t show the actual values for carbohydrate intake in the paper, and the graphs are too rough to tease that information out. Even if you subtract the 1.3-point shortfall from the calculated carbohydrate values to compensate, you still have men at 52.9% and women at 50.7% carbohydrate intake. Either way you look at it, what the people in this study were eating was not a “low-carb” diet. Depending on your nutritional outlook, you could perhaps characterize it as moderate carb; I would call it a moderately high carbohydrate diet.

For reference, an actual low-carb diet would look more like 30–40% protein and 10–20% carbohydrate, with the remaining 40–60% of calories coming from fat. A ketogenic or cyclic ketogenic diet would limit carbohydrates even further, to roughly 10% of energy intake or less.
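
To put those percentages in more tangible terms, here is a back-of-the-envelope conversion to grams per day. The 2,000 kcal intake is a hypothetical figure chosen for illustration, and the 4/4/9 kcal-per-gram conversions are the standard Atwater factors:

```python
# Convert macro energy percentages into daily gram targets.
KCAL_PER_GRAM = {"protein": 4, "carbohydrate": 4, "fat": 9}

def gram_targets(total_kcal: float, split: dict) -> dict:
    """Energy split (fractions summing to 1.0) -> grams per macro."""
    return {macro: round(total_kcal * frac / KCAL_PER_GRAM[macro])
            for macro, frac in split.items()}

# Mid-range points picked from the ranges quoted above.
low_carb = {"protein": 0.35, "carbohydrate": 0.15, "fat": 0.50}
ketogenic = {"protein": 0.30, "carbohydrate": 0.10, "fat": 0.60}

print(gram_targets(2000, low_carb))
# {'protein': 175, 'carbohydrate': 75, 'fat': 111}
print(gram_targets(2000, ketogenic))
# {'protein': 150, 'carbohydrate': 50, 'fat': 133}
```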

The type of fat they report is interesting too. The highest intake values are for a butter and “raps seed” (sic) blend; I assume they’re referring to rapeseed, or canola, oil. They seem to assume a priori that rapeseed oil is healthy, but it actually appears to be somewhat problematic. The main culprit may be erucic acid, which is limited to 5% of the oil’s content by weight for food-grade rapeseed oil in the EU, a higher ceiling than the 2% considered safe for human consumption in the US. Interestingly, including saturated fat in the diet provides some protection from the undesirable effects of rapeseed oil, notably fibrotic lesions of the heart and vitamin E deficiency. So the finger-wagging at saturated fat in this report is completely misaimed.

Two things stood out to me: BMI for the cohort steadily increased, and there was a three-year lag between the reported “sharp” increase in fat intake in 2004 and the increased cholesterol values after 2007. They do acknowledge in the “Discussion” section that their study doesn’t allow them to draw any conclusions about a causal relationship between the higher fat intake and blood cholesterol levels, but that doesn’t stop them from strongly implying that there is one.

In their figure 7, it’s notable that the lowest cholesterol levels were in 2002, and, even while trending upward, the current levels are still lower than they were in 1990, when levels showed a dramatic drop as dietary interventions were first attempted. If increased fat intake were such a bad thing for cholesterol levels (and I don’t necessarily think that’s the case), the cohort still has better outcomes now than when the advice was to cut fat.

I think a more reasonable explanation for an increase in total cholesterol is the steady increase in BMI and the increasing age of the cohort. It wouldn’t matter much what you fed them (high fat, low fat, or anything in between); if they were fatter on average, their blood lipids would trend negatively. Interestingly, cholesterol can temporarily increase when you start to lose weight, especially when losing visceral fat. If anyone was actually on a low-carb diet (something impossible to determine from the data presented in the paper), weight loss could be skewing the blood lipid data, and we might then see a downward trend in body fat with a trailing reduction in total cholesterol as the lipids are cleared.

If I were looking to use the information from this study to make recommendations, I would look more closely at the sources of carbohydrate intake. What you eat, not just how much of it, does matter. The people in the study shifted their carbohydrate sources from boiled potatoes and crisp bread (presumably knäckebröd made from rye) to soft bread (made from wheat), rice, and pasta. The total carbohydrate intake is fairly high too, especially considering that the sources are mostly bread and junk food, not vegetables or fruit.

I don’t think the total fat intake is much of an issue, but I do think the fat sources are a problem. Rapeseed oil isn’t a particularly healthy oil, despite lobbyists’ insistence that it’s perfectly okay. The people in this study are probably consuming the butter/rapeseed blend out of a misplaced concern about the saturated fat in plain butter, and they may be causing themselves more health problems by including an adulterant oil.

The researchers’ pre-existing biases may be what lead them to recommend reducing fat consumption when, according to their own data, fat does not actually seem to be causing problems. Their report shows that fat intake has increased only very slightly over the 1986 baseline, while cholesterol levels decreased steadily until 2004 and are still lower than the post-intervention levels of the 1990s, when the medical community was presumably recommending a low-fat diet.