Japanese by Example
Learning through examples in manga

Tracking known vocabulary and kanji in manga

I use spreadsheets to track my progress in learning the vocabulary and kanji that appear in the manga I read.

This page details how others may use them as well.

Purpose

Sites such as jpdb and Koohi do a good job of providing frequency information and word lists for novels and anime, but I’m unaware of anything comparable for manga.

I developed these spreadsheets with two goals in mind for manga I’m reading or plan to read:

1) Finding the most frequent kanji/vocabulary words I don’t know yet.

2) Viewing my progress in learning kanji/vocabulary words used.

Features

By maintaining a list of known kanji and vocabulary, one can see what percentage of individual and overall kanji/vocabulary they know from a manga series.

Screenshots

Some things this view tells me:

1) Since それでも is 11 volumes long, I’m viewing stats only for kanji that appear at least 11 times across all volumes. I have learned 100% of these kanji. (Note: I’ve excluded character names and common shogi terms.)

2) I likewise have kanji data for the first 11 volumes of コナン. Of the unique kanji that appear at least 11 times within these volumes, I’ve learned 84.3%. Counting every appearance of each kanji, I should recognize 96.22% of total kanji appearances.

3) For ふらいんぐうぃっち, I have kanji data for the first 10 volumes. I have 40 more kanji to learn before I know every kanji that appears at least 10 times across these volumes.

4) In キョーコ, the next most frequent kanji I don’t know is 徒 (a kanji I really should know by now), which appears 28 times across the seven-volume series.
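
The two percentages in the コナン example measure different things: the share of distinct kanji I know, and the share of all kanji appearances those known kanji cover. Here is a minimal Python sketch of that arithmetic, assuming a series sheet boils down to a mapping of kanji to counts. The function name `coverage` and the sample counts are made up for illustration; the spreadsheets themselves do this with formulas, not with this code.

```python
def coverage(counts, known, min_count, excluded=frozenset()):
    """Return (unique_pct, weighted_pct) for kanji appearing at least min_count times.

    counts:   dict mapping kanji -> number of appearances in the series
    known:    set of kanji already learned
    excluded: kanji to leave out of the stats (e.g. character names)
    """
    # Keep only kanji that meet the threshold and are not excluded.
    eligible = {k: n for k, n in counts.items()
                if n >= min_count and k not in excluded}
    if not eligible:
        return 0.0, 0.0

    known_unique = sum(1 for k in eligible if k in known)
    known_total = sum(n for k, n in eligible.items() if k in known)

    unique_pct = 100 * known_unique / len(eligible)
    weighted_pct = 100 * known_total / sum(eligible.values())
    return unique_pct, weighted_pct


# Hypothetical counts, not real series data.
sample_counts = {"人": 60, "事": 25, "徒": 10, "件": 5, "昇": 2}
sample_known = {"人", "事", "件"}

unique_pct, weighted_pct = coverage(sample_counts, sample_known, min_count=5)
print(f"{unique_pct:.1f}% of unique kanji, {weighted_pct:.1f}% of appearances")
```

With these sample numbers, three of the four eligible kanji are known (75%), but because the known ones are the most frequent, they cover 90 of 100 appearances (90%). This is why the appearance-weighted figure tends to be higher than the unique figure.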

This sheet for one series I’m reading has filtered out known kanji, as well as kanji that appear in character names.

From here, I can see the next most frequent kanji I need to learn, how frequent they are in the series, and what their level in WaniKani is. (The latter is useful for seeing which kanji I should know, even if I’m sure I’ve never seen them before in my life.)

The next most frequent word for me to learn from ARIA is 昇格. By entering this into the Word field, I see in the Word Frequency column that it appears very infrequently in other series I’m reading.

Looking at the series I may wish to begin to read or try again at reading, the Overall column gives me a good idea of how many vocabulary look-ups I would be doing.

For example, in ぼっち I’d be looking up roughly 1 in 20 of the words that appear at least the Min number of times (as I should recognize almost 96%).

For ハヤテ, on the other hand, I’d likely be looking up closer to 1 in 10 of the words that appear at least the Min number of times (as I should recognize about 91%).
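
The "1 in N" estimates follow directly from the recognition percentage: if a fraction p of word appearances is recognized, about 1 in 1/(1 − p) needs a look-up. Here is a rough Python sketch of that conversion. The function name `lookup_interval` and the input percentages are illustrative assumptions, not values read from the spreadsheet.

```python
def lookup_interval(recognized_pct):
    """Return N such that roughly 1 in N words would need a look-up."""
    unknown_fraction = 1 - recognized_pct / 100
    if unknown_fraction <= 0:
        raise ValueError("no look-ups expected at 100% recognition")
    return 1 / unknown_fraction


# Illustrative percentages only.
for pct in (95.0, 91.0):
    print(f"{pct}% recognized -> about 1 in {lookup_interval(pct):.0f} words")
```

Small changes near the top of the scale matter a lot: moving from 91% to 95% roughly halves the number of look-ups.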

Limitations

There are a few technological limitations.

Spreadsheets

How to use

Step-by-step instructions on copying and filling out these sheets:

The two kanji sheets are used together, and the two vocabulary sheets are used together, but the kanji and vocabulary sheets are used independently of one another. The following instructions apply equally to the kanji pair and the vocabulary pair.

Copying the progress spreadsheet

1) Open the Progress spreadsheet (from the list above).

2) Save a copy to your Google Docs by selecting the “File” menu and then “Make a copy”.

Adding a series

1) Open the Series spreadsheet (from the list above).

2) From the Series List tab, locate a series you want to add to your Progress spreadsheet. Click on the link to be taken to the sheet for that series.

3) Right-click on the sheet’s tab and select “Copy to” then “Existing spreadsheet”.

4) Select your copy of the Progress spreadsheet.

5) In your Progress spreadsheet, rename the copied sheet to remove “Copy of” (or 「のコピー」) from the sheet name. You can optionally rename the sheet to anything you wish.

6) Add the series name to the Progress sheet. This must match the sheet name for the series exactly, including the volume number if there is one.

7) In the Min column, enter the minimum number of occurrences a kanji/vocabulary word needs in order to be included in the progress stats.

Using series data

1) Filter the Known column to show only “FALSE”. Optionally filter the Dictionary column to show only “TRUE”.

2) The kanji/vocabulary word at the top of the list is the most frequent item in the series. You’ll probably want to learn this one next.

3) Add known kanji/vocabulary words to the Known Kanji or Known Words sheet.

4) If you wish to exclude a kanji/vocabulary word from the Progress sheet, add a note to the Notes column of the item you wish to exclude. Reasons to do this include:

Note: The kanji or vocabulary words are sorted by frequency by default. If this sorting is ever lost, sort the series by the Count column, Z to A, so that the most frequent items are shown first again.
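
For anyone who prefers to see the logic spelled out, the filtering and sorting above can be sketched in Python. This is only an illustration under assumptions: the dictionary keys mirror the Count, Known, and Dictionary columns, while the `item` key, the function name `next_to_learn`, and the sample rows are hypothetical.

```python
# Hypothetical rows; the keys mirror the Count, Known, and Dictionary columns.
rows = [
    {"item": "昇格", "count": 12, "known": False, "dictionary": True},
    {"item": "徒", "count": 28, "known": False, "dictionary": True},
    {"item": "人", "count": 60, "known": True, "dictionary": True},
    {"item": "うぃっち", "count": 30, "known": False, "dictionary": False},
]


def next_to_learn(rows, dictionary_only=False):
    """Return unknown items, most frequent first."""
    # Same as filtering the Known column to show only FALSE.
    unknown = [r for r in rows if not r["known"]]
    if dictionary_only:
        # Same as filtering the Dictionary column to show only TRUE.
        unknown = [r for r in unknown if r["dictionary"]]
    # Same as sorting the Count column Z to A.
    return sorted(unknown, key=lambda r: r["count"], reverse=True)


for row in next_to_learn(rows, dictionary_only=True):
    print(row["item"], row["count"])
```

The first row returned corresponds to the item at the top of the filtered sheet, which is the one to learn next.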

Requesting a series

I can add a series by request provided you are able to complete these steps:

1) Purchase a digital copy of the volumes from the series.

2) Remove DRM and unzip contents.

3) Install Mokuro and run it on the volume folder that contains the manga page images.

4) Compress Mokuro’s output _ocr folder into a zip file.

5) Share the zip file with me on Google Drive, or else send me a message on Discord at ChristopherFritz#5813 with a link to the zip file.

Note: I don’t know Discord very well and I’m only on it a few times a month. I’m not certain if I can actually receive messages from random people.
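
If you would rather script step 4 than use a file manager, here is a small sketch using Python's standard `shutil.make_archive`. The function name `zip_ocr_folder` and the example paths are placeholders of mine; point the function at wherever Mokuro wrote its _ocr folder on your machine.

```python
import shutil
from pathlib import Path


def zip_ocr_folder(ocr_dir, archive_stem):
    """Zip the given _ocr folder and return the path of the created archive.

    ocr_dir:      path to the _ocr folder (placeholder; use your own location)
    archive_stem: output path without the ".zip" extension
    """
    ocr_dir = Path(ocr_dir)
    if not ocr_dir.is_dir():
        raise FileNotFoundError(f"not a folder: {ocr_dir}")
    # root_dir/base_dir keep the folder name itself as the top level of the zip.
    return shutil.make_archive(
        str(archive_stem), "zip",
        root_dir=ocr_dir.parent, base_dir=ocr_dir.name,
    )


# Example call with placeholder paths:
# zip_ocr_folder("manga/_ocr", "manga/my-series-ocr")
```

The resulting zip file is what gets shared in step 5.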