Estimating difficulty of anime from performance logs

Learning Japanese is a long and arduous task, and over the years of practice it can feel like there is no real progress. To keep myself from getting too depressed about spending hours every day on a task I’m clearly not suited for, it’s become pretty important to me to measure how well I’m doing and see whether I’m progressing or not.

To that end, I rewatch the same anime episode months apart and measure how well I understand it (time spent on pauses, number of unknown words, etc…). From there I can deduce how much I’ve progressed, and infer a rate of progress per day. Over the last few years, it looks like this:

But the progress rate compounds, so to look at my actual level I start at an arbitrary 1 and apply the nearest measured rate as daily interest, which gives something like:

Which immediately brings to mind an exponential charging curve, so I modelled it to get the formula for Japanese knowledge:

160 represents my Japanese level at the beginning of this ordeal (I’ve been learning for longer than I’ve been measuring), 500 represents my natural aptitude for learning Japanese. Oh, and d is the number of days I’ve been studying.
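The compounding step described above (start at an arbitrary 1, then apply the nearest measured rate as daily interest) can be sketched roughly like this. The function names and the sample measurements are made up for illustration, they are not my actual data:

```python
def nearest_rate(day, measurements):
    """Return the daily rate of the measurement closest in time to `day`."""
    return min(measurements, key=lambda m: abs(m[0] - day))[1]


def compound_level(measurements, n_days, start=1.0):
    """Compound the nearest measured daily rate, day after day, from `start`."""
    level = start
    history = [level]
    for day in range(n_days):
        level *= 1.0 + nearest_rate(day, measurements)
        history.append(level)
    return history


# Hypothetical (day, daily progress rate) pairs, for illustration only.
example = [(0, 0.002), (200, 0.001), (600, 0.0002)]
levels = compound_level(example, 800)
print(round(levels[-1], 3))
```

Plotting `levels` against the day index gives the kind of curve that flattens out once the measured rates get small.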

So where does that leave me? It seems pretty clear that I’ve reached the “plateau” and all the progress I might make is going to be very slow. Yippy.

But with that data, I can solve a problem that has long troubled me: if I’m doing better with this anime than the previous one, is it because I’ve progressed or because I’ve picked an easier anime? By dividing the score by my modelled level, I can get some sort of normalized anime difficulty that can compare anime across time!

Well it seems that all this time, I’ve been too close to the plateau for it to make any real difference. It would appear that my progress is super negligible compared to the difficulty difference between various series.
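For the curious, the normalization above is just a division. Here is a minimal sketch, where `placeholder_level` is a hypothetical saturating curve that I made up for the example (it is NOT the fitted formula from this post) and the scores are invented:

```python
import math


def normalized_difficulty(score, day, level_at):
    """Divide a raw score by the modelled level on the day it was measured."""
    return score / level_at(day)


def placeholder_level(day):
    # Hypothetical stand-in for the modelled level, NOT the post's formula.
    return 1.0 - math.exp(-(day + 1) / 365.0)


# Invented scores for two series studied 500 days apart.
for name, score, day in [("series A", 30.0, 1000), ("series B", 26.0, 1500)]:
    print(name, round(normalized_difficulty(score, day, placeholder_level), 2))
```

When the level curve is nearly flat between the two dates, the normalized values end up almost proportional to the raw scores, which is exactly the plateau effect described above.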

The positive view would be that hard series are just really hard and easy series are very easy, the negative view is that my progress is damn slow, the very negative view is that it’s slow and plateau-ing. I guess the next step would be to correlate this data with other learners. But at the very least I got a cute equation, so there’s that.

EDIT: To assess this hypothesis I’ve studied a full anime that I had already studied years ago (09/2020 to 11/2022). It would appear that I still made progress, from 27.3 mins per episode to 26.4 mins per episode. It does concur with my estimate of a fraction of a percent of progress per day. However, it did strike me as relatively hard, and on par with the other hard anime I’m currently studying. It would tend to confirm that progress is tiny compared to the difficulty difference between anime 😦
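As a quick sanity check on that "fraction of a percent" claim, here is the arithmetic as a small sketch. It assumes both dates fall on the first of the month (I only noted the months) and that the gain compounds daily, which is an assumption rather than something measured:

```python
from datetime import date


def daily_rate(before, after, days):
    """Compound per-day improvement implied by going from `before` to `after`."""
    return (before / after) ** (1.0 / days) - 1.0


# Day of month is assumed, only 09/2020 and 11/2022 are known.
days = (date(2022, 11, 1) - date(2020, 9, 1)).days
rate = daily_rate(27.3, 26.4, days)
print(days, f"{rate:.4%} per day")
```

Under those assumptions this comes out at roughly 0.004% per day, so tiny indeed.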

Estimating anime difficulty from subtitles

Hi! I have a lot of other things to do so as procrastination I suppose I did a little analysis. I’m learning Japanese by watching raw anime and writing down what I don’t know, so I have a pretty great dataset of which anime I actually found hard vs easy. For the best experience, I’d like to watch them in increasing order of difficulty, but it’s hard to figure out the difficulty before actually watching :/

I had briefly looked at using subtitle files to do that: the idea was that the bigger the subtitle file, the more text (and therefore difficult text) there would be, and therefore the harder it would be to understand. But a quick glance seemed to show no correlation between subtitle size and difficulty.

So today I thought it wouldn’t be too hard to write a little script that goes over the subtitles in a folder, removes all the junk boilerplate and counts the kanjis. The script is here. The upside of this is that I can remove the .srt or .ass boilerplate, but also look at the text to see if it has difficult kanjis.
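In case the link goes away, here is a minimal sketch of the idea (not the actual script). It assumes UTF-8 files, standard .srt timing lines, standard .ass `Dialogue:` lines with the text in the 10th field, and it only counts the main CJK Unified Ideographs block as kanji:

```python
import re
from collections import Counter
from pathlib import Path

KANJI = re.compile(r"[\u4e00-\u9fff]")  # main CJK Unified Ideographs block only
SRT_TIMING = re.compile(r"^\d{2}:\d{2}:\d{2}[,.]\d{3} --> ")
ASS_TAGS = re.compile(r"\{[^}]*\}")  # inline override tags such as {\an8}


def dialogue_lines(text):
    """Yield only the spoken text from .srt or .ass content."""
    lines = [line.strip() for line in text.splitlines()]
    if any(line.startswith("Dialogue:") for line in lines):  # looks like .ass
        for line in lines:
            if line.startswith("Dialogue:"):
                fields = line.split(",", 9)  # the text is the 10th field
                if len(fields) == 10:
                    yield ASS_TAGS.sub("", fields[9])
    else:  # treat everything else as .srt
        for line in lines:
            if line and not line.isdigit() and not SRT_TIMING.match(line):
                yield line


def count_kanji(text):
    """Count each kanji appearing in the dialogue of one subtitle file."""
    counts = Counter()
    for line in dialogue_lines(text):
        counts.update(KANJI.findall(line))
    return counts


def count_folder(folder):
    """Sum kanji counts over every .srt/.ass file under `folder`."""
    total = Counter()
    for path in Path(folder).rglob("*"):
        if path.suffix.lower() in {".srt", ".ass"}:
            total += count_kanji(path.read_text(encoding="utf-8", errors="ignore"))
    return total


sample_srt = "1\n00:00:01,000 --> 00:00:03,000\n今日は日曜日です\n"
print(count_kanji(sample_srt).most_common())
```

From the per-kanji counts it is then easy to bucket by JLPT level with whatever kanji list you have at hand.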

Without further ado, this is the average kanji distribution per subtitle file for 10 anime annotated with my expert ground truth:

And on a log scale:

So… this doesn’t look very helpful. The result of this analysis is a complete failure. JLPT kanji level doesn’t correlate with anime difficulty. But maybe I can pick myself a list of kanjis that are correlated with difficult anime (like military vocabulary, this always gets me). I guess ideally I’d assign a weight to each kanji by machine learning but this requires more effort than I’m willing to put in. If you know a dead easy way to do that, though, I’m interested.
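One low-effort option short of real machine learning would be a plain correlation per kanji. This is only a sketch under assumptions: `freqs_per_anime` is a hypothetical list of per-anime relative kanji frequencies, `difficulties` is my own rating for each anime, and the numbers below are invented. With only 10 anime this will overfit badly, so treat the output as a hint at best:

```python
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if one is constant)."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5


def kanji_weights(freqs_per_anime, difficulties):
    """Weight each kanji by how its frequency correlates with rated difficulty."""
    all_kanji = set().union(*freqs_per_anime)
    return {
        k: pearson([f.get(k, 0.0) for f in freqs_per_anime], difficulties)
        for k in all_kanji
    }


# Invented frequencies and ratings, for illustration only.
freqs = [{"軍": 0.0, "日": 0.5}, {"軍": 0.1, "日": 0.5}, {"軍": 0.2, "日": 0.5}]
ratings = [1, 2, 3]
print(kanji_weights(freqs, ratings))
```

Kanji with a weight close to 1 show up more in the anime I rated hard, and those close to 0 carry no signal.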

Cheers

Endless learning

It is the height of summer. I’m still struggling with learning Japanese, slow and steady, probably forever… But I’m also learning a lot about language learning itself, and it’s no secret that repetition is an important part of the process. And what better example to study than Haruhi Suzumiya’s Endless Eight (for people who don’t know, it’s a series of 8 episodes displaying more or less the same events and it’s one of the greatest pieces of art in anime history).

When I study Japanese, I write down how many new words I encounter, and how much of the dialog I understand for each episode (one per day). The structure of Endless Eight is a great source to showcase how much repetition helps with understanding.

As an extra datapoint, I’ve also revisited the same episode (the 6th, so that it’s in the “middle”) to see how much the learning sticks/how fast I forget after a week, a month, 3 months and 6 months (of continuous unrelated learning practice).

I don’t think this is solid enough to conclude anything, but it displays nicely how repetition yields diminishing returns, with stronger benefits at the start, and how learning fades but a core does stick. I don’t think I was trying to get anything else out of this other than a pretty graph xD

Hidden Japanese conjugations

Hi!

I’m no expert at Japanese, though it’s probably what I spend most of my time on, so don’t take this piece as an actual lesson, but I wanted to list somewhere some things that have given me a hard time in Japanese. These are suffixes that I really consider to be conjugations, but that other people apparently don’t. Those in red are especially tricky in my experience because they are extremely common and not really easy to find in textbooks. I’ll keep updating this as I learn.

Root form:

These go after things like 話(はな), 見(み)

-ず/ぬ: while not doing

て forms:

These go after things like やって、喋(しゃべ)って、飲(の)んで、見(み)て、出(で)て、あって

-てみ(見)る: try to do something
-てある: something has been done
-ている: is doing something
-ておく: do something with a purpose
-てく(来)る: come to do something, do and come back
-てください: please do
-てい(行)く: go and do something
-てあげる: give something (do it for someone)
-てもらう: receive something (have it done for you)
-てしまう: finish accidentally
-てしまいました: finish to completion
-ています: do something after a motion
-てす(済)ませる: make do with, finish with
-てなお: consequently, still, even though

Stem form:

These go after things like やり、話(はな)し、飲(の)み、喋(しゃべ)り、あり

-いえる: to be possible
-いがる: to seem like
-いなお(直)す: to do again
-いあ(会)う: do to each other
-いき(切)る: do to completion
-いとお(通)す: do thoroughly
-いこ(込)む: do for a while, do inwards
-いぬ(抜)く: make an effort to do something to the end
-いだ(出)す: start doing
-いす(過)ごす: do for too long
-いのこ(残)す: to leave out from doing
-いと(取)る: do again for yourself, take

Under investigation:

-いか(掛)ける: ??? hanging ??
-いあ(上)がる: do upwards
-いつ(付)ける: do against
-いた(立)てる: stand to do?
-いい(入)れる: enter to do
-いかえ(返)す: to do in return
-いたお(倒)す: to kill by doing

Resources for self-teaching Japanese

Hi! So I’ve been self-teaching Japanese for a while now and I think some of the resources I’ve built over the years may be of interest to people, so I’ll centralize them here. I will also add a couple of recommendations, but I’ll try to keep it light. I’ll highlight the stuff I produced in blue. Most of these are actively used and worked on every day so you’ll see some traces of my daily regimen, please be lenient 🙂

Before you start

  • Japanese is probably one of the hardest languages to learn in the world, especially if you come at it from a “western” language (it’s just so different). It is going to take a lot of time, therefore the most important thing is motivation and stamina. It’s a marathon, not a sprint. Make sure you enjoy it.
  • Don’t expect logic and consistency. This language is an amazing mess built by strata in the most chaotic way possible. You’re better off going into it assuming there’s no one to one mapping between writing, pronunciation and sense, and no reason why a particular character has this or that radical. You basically have to learn all the words by heart.
  • There are pretty few syllables in Japanese compared to most languages, meaning there’s gonna be a lot of homophones, ambiguity, etc… Incidentally that’s why they cannot really get rid of kanjis.
  • Worse, speaking tends to deform the language quite a bit (kinda like French), so a lot of the time you’ll hear contractions, accents, etc… that will make it impossible to find the corresponding word/grammar point in dictionaries. To make things worse, it’s especially true for the beginner materials: everything tailored towards children tends to use “baby talk” and therefore not the correct pronunciation of words. yay.
  • I have the opposite of “facility” towards this language, your experience will probably be smoother than mine xD

The beginnings

The beginnings are nice because there’s a lot of free content for it, so don’t pass up this chance! It’s the time when you can learn with games, on phone or computer. Sadly I missed out on most of it so I don’t have more precise recommendations xD

You probably want to start by learning the alphabets and then some basic grammar. I highly recommend Tae Kim’s guide to learning Japanese, which is one of the best things I’ve seen online:

http://www.guidetojapanese.org/learn/

Otherwise NHK also has nice resources, like news in easy Japanese.

On YouTube there’s a lot of stuff. Some I like are JapanesePod101 or Name Ohara.

I also like fluentu.

Immersion

What you want is also to listen to a lot of content in Japanese. Fortunately, this is the age of the internet, and even if it’s not as open as it used to be, we’ve never had so much media content.

Here’s a list of some anime I found easier to understand: myanimelist. This guide is amazing and a little more thorough.

Watching things in Japanese with Japanese subtitles is ideal of course, but it’s pretty hard to find. Netflix is one of the rare platforms that does it pretty consistently. A lot of people on YouTube like to embed some or all of what is said on the screen, so that’s something. There are a few people who aggregate subtitles.

The great thing about having subtitles is that it makes it super easy to note down what you don’t know for review later. For that, you definitely want to use Anki, the de facto standard in flashcards, which means there’s a lot of add-ons, support, etc… There’s a lot of premade decks, but I think it’s also nice to make your own vocabulary cards.

This allows for nice automated setups. Matt, a pioneer of the Mass Immersion Approach (do check it out, it’s so great), made a great tutorial about his setup. If you’re more into software than streaming, there are approaches like this which can dig into your software to find the text in it and extract it (probably a bit more advanced, but less Netflix-centric).

Matt makes his flashcards himself, even with his automated setup. I made an Anki addon to make cards for me. I only give it a list of words and it adds them to my Anki. Pretty convenient: 

https://github.com/yo252yo/anki_addon

Reading

Don’t worry if you don’t have access to Japanese literature, the internet is your playground for reading material. This Chrome extension fetches the reading and definition of kanjis you highlight, this one adds furigana to any existing page. Karaokes on YouTube or niconico are great, and there are game scripts that can be fun too.

My favorite dictionary is www.jisho.org.

Outside of Chrome, this little program does pretty decent kanji OCR: https://www.kanjitomo.net/

If you ever go to Japan, you can buy books for very cheap at Book-off.

Kanjis

So here’s the big one, how do you learn by heart 2000 symbols that have several meanings, pronunciations… and where visual similarity or construction doesn’t mean anything XD I struggle. I’d recommend to forget about kunyomi, onyomi, etc… and just learn all possible pronunciations because it’s just too messy. And that’s not even going into proper nouns…

About the rhythm: one kanji per day is probably ideal. I know it means the language will take you years, but it will take you years anyway, so you might as well really master the kanjis instead of plowing through.

Anyway my favorite kanji dictionary is

http://kanjidamage.com/

because it’s low key. It does a pretty decent job at explaining the kanji decomposition and coming up with a good order to learn them, but I was still unsatisfied, so I made my own learning order, based on frequency of use in newspapers, JLPT level, the grade it’s taught at in Japan, and frequency of appearance in K-ON. But most importantly I’ve been really thorough with the decomposition of each kanji into subcomponents, which is rarely well done. So please enjoy my work (and note that it grows every day as I’m still learning):

https://docs.google.com/spreadsheets/d/1xyXL5PGTH01B3c1IiMl-4MIkcRDFA8Xj-wnn7PLXB_g/edit?usp=sharing

More importantly, this also contains for each kanji all the other kanjis that are similar to it, visually or semantically. This is a great resource that doesn’t exist anywhere else and which you’ll appreciate if you’re like me and keep getting mixed up. It’s made mostly from personal experience, with the help of this kanji similarity graph project.

Finally, since I kept mixing up kanjis, I thought I’d try to leverage my spatial brain and make some kind of kanji maps using graphviz. I ended up making several versions of the maps, you can find the code at

https://github.com/yo252yo/kanjigraphs

and here’s an example of what it looks like (highlighting the stuff I need to pay attention to):

Image
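To give an idea of how such a map can be produced, here is a minimal sketch that writes Graphviz DOT text for pairs of look-alike kanji. It is not the code from the repository above: the function name, the styling and the three sample pairs are just assumptions for the example.

```python
def similarity_dot(pairs, highlight=()):
    """Build DOT source for an undirected graph linking similar kanji."""
    lines = ["graph kanji {", "  node [shape=circle, fontsize=24];"]
    for kanji in sorted(highlight):
        # Fill the kanji that need extra attention.
        lines.append(f'  "{kanji}" [style=filled, fillcolor=yellow];')
    for a, b in pairs:
        lines.append(f'  "{a}" -- "{b}";')
    lines.append("}")
    return "\n".join(lines)


# A few visually similar pairs, picked by hand for the example.
sample_pairs = [("人", "入"), ("土", "士"), ("末", "未")]
dot_source = similarity_dot(sample_pairs, highlight={"末"})
print(dot_source)
```

Saving the output as `kanji.dot` and running `dot -Tpng kanji.dot -o kanji.png` should render it, provided Graphviz is installed with a font that has the kanji.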

Advanced

Once you have a basic understanding of Japanese, you can start to go deeper. My expertise sort of ends here, but I want to point out a couple of things:

Advanced grammar is often presented as “grammar points”, which I think is super great (think “one point per day” for instance). I’m aggregating grammar points from japanesetest4you.com, japanese-teacher.tanosuke.com and nihongokyoshi-net.com in this spreadsheet that I use for learning.

At this point, you’re also probably realizing that you’re gonna have to learn proper nouns, and that means even more ways to read kanjis. It consists of pretty much memorizing all the common proper noun patterns. I gathered the most common first/last names, but also all the important geographic/historic/mythological/cultural names in the following spreadsheet that I’m still actively working on: https://docs.google.com/spreadsheets/d/1V6rQCtsDtI4uhU1TAcYIh-LQpeJ3ipOWJiM8Y73bYjY/edit#gid=420678685

I hope that it will contain in the end everything you need to understand references/private jokes in conversations, like the ads that everyone has seen, etc…

I also use this anime character database to try and see what nouns kanjis are frequently part of.

How to plan a trip to Japan

Every time I go to Japan I find myself struggling because there is so much to do and see. This will act as a collection of useful links/tips/etc that I’ll update when needed, for me to reuse and for you guys to use if you want.

The essentials:

To find where to stay, think ahead about what your nightlife will be like: there’s no night transportation! Consider sleeping close to the clubs you’re going to so you can walk back whenever.

Maps:

Check japanese movies (for instance)

Check possible concerts:

Vocaloid (list, magical mirai), Kalafina, Yuki Kajiura, Flow, Hitorie, Asian Kung Fu Generation, Funkist, Hana Kana, SID, Hyadain, IOSYS, Kyary Pamyu Pamyu, monaca, supercell, egoist, mucc, ryo, jin, zun, Bradio, fripSide, Garnidella, JAM project, Lin Tosite Sigure, man with a mission, trustrick, nagaredap, Wagakki band, Frederic, Kana boon, Unison square garden, YOUR SONG IS GOOD, Oresama, TECHNOBOYS PULCRAFT GREEN-FUND, STEREO DIVE FOUNDATION, Eve, Mafumafu, After the rain, Jagmo, go go densha, minichestra, musicals tickets, kamiboku, 犬も食わねぇよ, keytalk

Check clubbing: Anievez, twipla, Mogra (DJ list)

Check events:

Check temporary cafes: (need to book in advance ???)

COLLABO-CAFE

animate cafes (several locations), anime plaza, amnibus

the guest, royal host, nicocafe, utoftable cafe, shakeys pizza

noitamina cafe, shirokuma cafe

graffart cafe, 46 shokubo,

karaoke no tetsujin, animanga zingaro

oedo onsen, namjatown

hotel

Check temporary shops: parco ikebukuro, shibuya marui, shinjuku wald9

Check temporary exhibits: TokyoOtakuMode, TOKYO OTAKU CALENDAR, source

Food: important, don’t forget to plan. Make a list of what you absolutely want to do. Check in advance where the great lunch deals are, especially for quality beef.

Buying services:

Temple stay/shukubo?

Don’t ever do karaoke without sakenomihoudai

Don’t forget to place a pick-up order for bookoff 4 days before your trip 🙂

コンセプト

Why Japanese is amazingly suboptimal

More often than not I find myself wondering why I’m learning Japanese. Apart from karaokes, one of the reasons is I deeply enjoy how fundamentally weirdly conceived it is. Let me show you by first taking you through a historical journey… It will be more verbose than my usual posts, but also more informative about everything that’s cool in the Japanese language. Also I’m still learning so everything below is 1. simplified, 2. only based on a small part of the language, 3. error prone but I think I’m mostly right.

A brief history of written text

Long ago, Japan didn’t have a writing system. They saw that China was doing well with theirs and they thought “gotta get us some of these”. So they did the most natural thing of all and kindly borrowed the Chinese writing system. However, a problem arose: spoken Japanese and spoken Chinese had been very different all along. But more was needed to discourage the fierce theft… copy… inspiration attempt by the brave Japanese people who really wanted to improve to the level of their onii-san. So the order of words in a sentence is different? Who cares, we’ll just put small numbers next to the Chinese text so that we know in what order to read it in Japanese! Thus establishing a correspondence between Japanese spoken words and Chinese written words.

Apparently that’s what happened for a while and everything was kinda good-ish I guess, but someone somehow thought this system may need a little bit of improving. And seeing how the Japanese language has conjugation and stuff, it would be nice to have a small set of symbols that we could use for that. So they came up with the idea of selecting a symbol per sound (syllable). And since Chinese characters were a little bit complicated and they needed to write fast, they decided to simplify them. But hey, you have several ways to simplify stuff. The two main ways were to either remove a couple of pen strokes (origin of katakana), or go for a whole stylish redesign with simpler strokes (origin of hiragana). For instance, the symbol 久 chosen for the sound ku could be either ク or く.

Unable to choose between those two brilliant ideas apparently, they kinda kept both (in addition to the Chinese characters, the kanjis). That resulted in a gigantic “3-alphabet” mess: apparently katakana were originally used by male scholars and hiragana by women, but I guess it was too sexist? So it ended up being hiragana for conjugations and particles and katakana for words of foreign origin. Eventually they decided to write words in the right order and everything was as fine as fine could be.

From Chinese you say

But the thing about Chinese is it’s totally different from Japanese, and a lot of Chinese pronunciation nuances were lost on Japanese readers. Therefore, the Japanese language is full of words that sound the same but are spelled differently. Also every character can be pronounced at least two different ways, right? I mean the Chinese way (onyomi) and the Japanese way (kunyomi). And compound words create a lot of new pronunciations, so the Japanese language is also full of words that are written the same but sound different. And of course you can mix and match this with the meanings, so that one word can very well be written in different fashions and pronounced the same, or written the same and pronounced differently.

But since a short example is worth a thousand words let’s consider this guy: 日 hi, a day or the sun, according to the context. Turns out it can be pronounced nichi in compound words, like 日曜日 nichiyoubi (Sunday, where you can see it’s pronounced differently at the beginning and at the end). Apparently it can be pronounced jitsu too. Turns out you use that word too to count the days of the month, and in that case it can be pronounced either nichi or ka (if the number is 10 or less or ends with a 4, the first day of the month being an exception). And you use it to count the days of the week, in which case it’s bi in youbi (see above). And then just for the lulz if you put clear + day you get tomorrow, written 明日 and pronounced ashita (or asu apparently, but it’s written the same way and means the same thing so I guess who cares??), or you use 昨日 pronounced kinou to say yesterday. But at least there isn’t really any other way to write day, contrary to person, which you could write 人, pronounced mostly hito or jin, or 方, pronounced kata (but which can also mean direction, in which case it’s pronounced either kata or hou…), but I think I’ve made my point by now.

And then came the westerners

At this point I feel obligated to drop a word about the influence of the whole western culture on Japan. Let’s skip over the fact that half the Japanese vocabulary is actually English butchered because the Japanese language doesn’t have consonants =p I just wanted to highlight a couple of brilliant cultural heritages that were forced upon the Japanese people, like a brand new fourth alphabet (because why stop at 3, come on people) that is rarely used except to write the name of the culprit on the bathroom wall (dangan ronpa spoilers) or insert X into anagrams to create mysterious names. But our most remarkable contribution to the Japanese language, to me, is the fact that they write their numbers in packets of 3, but the language works with packets of 4. To clarify, what they write as 100 000 is 100 thousand for us, but for them it is 10 “man”. And that’s the reason why you will never be able to read a big number in Japanese.
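To make the packets-of-3 versus packets-of-4 mismatch concrete, here is a small sketch that regroups a number the way the spoken language does, using the units 万 (man, 10^4), 億 (oku, 10^8) and 兆 (chou, 10^12). The function name is mine and it keeps the digits as Arabic numerals instead of spelling them out:

```python
UNITS = ["", "万", "億", "兆"]  # 10^0, 10^4, 10^8, 10^12


def japanese_grouping(n):
    """Regroup a non-negative integer in packets of 4 digits with 万/億/兆."""
    if n < 0 or n >= 10_000 ** len(UNITS):
        raise ValueError("only 0 <= n < 10^16 is handled in this sketch")
    if n == 0:
        return "0"
    parts = []
    unit = 0
    while n > 0:
        n, group = divmod(n, 10_000)
        if group:  # skip empty packets, e.g. the low 4 digits of 100000
            parts.append(f"{group}{UNITS[unit]}")
        unit += 1
    return "".join(reversed(parts))


for value in (100_000, 123_456_789):
    print(f"{value:,}", "->", japanese_grouping(value))
```

So 100,000 has to be read as 10万, and 123,456,789 as 1億2345万6789, which is why the commas are no help at all when reading out loud.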

Can I call you Sasuke-chan ?

Oh yeah just dropping a word about why the Japanese always make a big fuss in anime about how they call each other, and why the transition from calling by last name to calling by first name is important. It’s because they kinda don’t have a definitive proper word to say “you”, so they end up using each other’s name quite a lot.

Other perks

I focused mainly on the historical oddities of the Japanese language, but there are fortunately billions of cool insane oddities inside the spoken language by itself. Most words correspond to a particular register of language, and sometimes the polite version of it is a totally different word (itadaku for instance is a more polite verb for receiving (morau) or eating (taberu)). Also, when you are polite, you have to use humble lowering vocabulary when speaking about yourself and polite praising vocabulary when speaking about the person you’re talking to. That makes another set of words: you can only say furansu-jin (inhabitant of France) about yourself, you have to use furansu no kata for someone you’re talking to. delightful. I also really love the fact that you have expressions, words and particles that sound more manly or feminine, and you even have particles designed specifically to sound more manly/assertive (put a “na” at the end of a sentence to sound moar manly. go ahead try it). So yeah that’s another variation.

Oh and also the numbers are kinda different according to what they quantify. Like… 3 cylindrical things is sanbon when 3 flat things is sanmai /o/

Apart from all that, Japanese is a very lovely language structured very differently from ours, which reminds me a little bit of state machines and painting. As in, a lot of stuff is implied by the context and the previous sentences, so you can just use a little dot of paint, a few words, to convey a lot of stuff.

In conclusion

Part of the fun of the Japanese language is its weirdness and oddities. I believe that if you were to build from scratch a whole new language trying to make it suboptimal, you couldn’t do a better job. And nothing beats the satisfaction of discovering that husband is written with the Chinese characters for master, or that rape is written with three times the character for woman, or that eakon is actually a transliteration of and short for “air conditioner”, or biru short for “building”… There is a deep satisfaction in jumping into a language in which, when you ask a native speaker what a sentence means, “I don’t know” is a perfectly valid answer. Go for it. Also, karaoke.

In retrospect

I will probably get back to this article at some point to expand on this section, because as you learn more and more vocabulary you realize that they have tons of words for the same concept, really precise words for weird concepts, and that most complex words use kanjis that are never used by themselves. Like 停車, which is a fairly common word for a train stop, is written like this, derived from the verb to stop, 止まる/とまる, for which one of the deprecated, never used anymore writings used to be 停まる, so you end up with a bunch of derived words using 停, another bunch using 止 and only your eyes to cry about it. So you end up with words whose substrings are outdated words that don’t exist anymore but correspond to other forms that do still exist. Anyway the party is far from over, I’ll rewrite that later. ^^