Japanese-Tools icon indicating copy to clipboard operation
Japanese-Tools copied to clipboard

IRC bot and small scripts for learners of the Japanese language

-- coding: utf-8; mode: org; --

#+OPTIONS: ^:{}

These are some scripts that help me learn Japanese.

Most scripts are supposed to be used as plugins for an IRC bot or run on a shell. I find the following aliases quite useful:

#+BEGIN_EXAMPLE alias ja="$JAPANESE_TOOLS/jmdict/jm.sh" alias wa="$JAPANESE_TOOLS/jmdict/wa.sh" alias rtk="$JAPANESE_TOOLS/rtk/rtk.sh" alias gd="$JAPANESE_TOOLS/google_dictionary/gd.sh" #+END_EXAMPLE

I do most of my dictionary lookups with these aliases.

All scripts have only been tested on Ubuntu 12.04 and later. There are a few dependencies not present on a default Ubuntu system. You can install them with #+BEGIN_EXAMPLE $ sudo apt install mecab-jumandic-utf8 mecab kakasi xmlstarlet xsltproc python-irclib sqlite3 bc liburi-perl tesseract-ocr imagemagick #+END_EXAMPLE

  • audio/ find_audio.sh finds an audio version of a given Japanese word on languagepod101. #+BEGIN_EXAMPLE $ ./find_audio.sh 夜空 Audio for 夜空 [よぞら]: http://tinyurl.com/p8aq8jo #+END_EXAMPLE

  • compare_encoding Compares the size of different encodings of the same Japanese Wikipedia article. In almost all cases UTF-8 is smaller than UTF-16. #+BEGIN_EXAMPLE $ ./compare_encoding.sh 夜空 UTF-8 vs. UTF-16: 91213 vs. 156876 bytes. UTF-8 wins by 41.8%. #+END_EXAMPLE

  • gettext/ Internationalization support. Currently supported languages:

    • English
    • German
    • Polish

    Be sure to run gettext/regenerate_mo_files.sh if you would like to use a translation.

  • google_count Counts the number of Google results. Uses google.co.jp for queries containing Japanese characters and google.com otherwise.

  • google_dictionary/ gd.sh looks up English words in the Google dictionary. #+BEGIN_EXAMPLE $ ./gd.sh diligent /ˈdiləjənt/ having or showing care and conscientiousness in one's work or duties #+END_EXAMPLE

    Currently broken because it appears like Google shut down their dictionary JSON API.

  • google_translate/ gt.sh translates words and sentences using Google Translate. The target language is determined by the environment variable LANG, but it can also be specified explicitly. #+BEGIN_EXAMPLE ./gt.sh My hovercraft is full of eels. 私のホバークラフトは鰻がいっぱいです。

./gt.sh it My hovercraft is full of eels. it: Il mio hovercraft è pieno di anguille.

./gt.sh Il mio hovercraft è pieno di anguille. My hovercraft is full of eels. #+END_EXAMPLE Currently broken because Google shut down the translate API.

  • jmdict/ jm.sh provides jmdict lookups and wa.sh wadoku lookups. Works best for Japanese->English (or Japanese->German), not so well for the reverse direction. This is because jmdict is a Japanese English dictionary and not an English Japanese dictionary.

    To start, you first need to run the scripts prepare_jmdict.sh and prepare_wadoku.sh. This will download and process the respective dictionary files.

#+BEGIN_EXAMPLE $ ./jm.sh 村長 村長 [そんちょう] (n), village headman 市長村長選挙 [しちょうそんちょうせんきょ] (n), mayoral election #+END_EXAMPLE

  • kana/ A simple hiragana and katakana trainer. ** Example IRC session #+BEGIN_EXAMPLE <Christoph> !hira help Start with "!hira [count]". Known levels are 0 to 10. To learn more about some level please use "!hira help ". To only see the differences between consecutive levels, please use "!hira helpdiff ". <Christoph> !hira 5 Please write in romaji: す と に ね へ <Christoph> !hira su to ni ne he Perfect! 5 of 5. Statistics for Christoph: 44.64% of 280 characters correct. Please write in romaji: は と ぬ ほ な #+END_EXAMPLE
  • kanjidic/ Implements a lookup in kanjidic: http://www.csse.monash.edu.au/~jwb/kanjidic.html #+BEGIN_EXAMPLE $ ./kanjidic.sh 日本語 日: 4 strokes. ニチ, ジツ, ひ, -び, -か. In names: あ, あき, いる, く, くさ, こう, す, たち, に, にっ, につ, へ {day, sun, Japan, counter for days} 本: 5 strokes. ホン, もと. In names: まと {book, present, main, origin, true, real, counter for long cylindrical things} 語: 14 strokes. ゴ, かた.る, かた.らう {word, speech, language} #+END_EXAMPLE
  • kumitate_quiz/ A quiz asking JLPT style 文の組み立て questions. Only works as an IRC plugin for now. ** Example IRC session #+BEGIN_EXAMPLE <Flamerokz> !kuiz skm2 Please choose [1-4]: 周囲の人たちの _ _ ★ _ と思う。 (1: 協力を 2: 優勝は 3: 無理だった 4: 抜きにしては). <Flamerokz> !kuiz 2 Flamerokz: Correct! (2: 優勝は) #+END_EXAMPLE ** Example question file A question file (a file ending in =.txt= in =kumitate_quiz/questions/=) should contains lines of the following form: #+BEGIN_EXAMPLE 周囲の人たちの _ _ ★ _ と思う。|協力を,優勝は,無理だった,抜きにしては|2 #+END_EXAMPLE
  • lhc This script has nothing to do with Japanese. It OCRs the image on http://op-webtools.web.cern.ch/op-webtools/vistar/vistars.php?usr=LHC1 to provide live statistics of the status of the LHC.
  • reading/ read.py converts kanji to kana using mecab. #+BEGIN_EXAMPLE $ ./read.py 鬱蒼たる樹海の中に舞う人の如き影が在った。 鬱蒼[うっそう]たる 樹海[じゅかい] の 中[なか] に 舞[ま]う 人[じん] の 如[ごと]き 影[かげ] が 在[あ]った 。 #+END_EXAMPLE
  • reading_quiz/ A quiz asking kanji -> kana questions. Only works as an IRC plugin for now. ** Example IRC session #+BEGIN_EXAMPLE <Christoph> !quiz jlpt2 Please read: 発見 <Christoph> !quiz はっけん Christoph: Correct! (はっけん: (n,vs) 1. discovery, 2. detection, 3. finding) #+END_EXAMPLE
  • romaji/ romaji.sh converts kanji and kana to romaji using mecab. #+BEGIN_EXAMPLE $ ./romaji.sh 鬱蒼たる樹海の中に舞う人の如き影が在った。 ussoutaru jukai no naka ni mau jin no gotoki kage ga atta 。 #+END_EXAMPLE
  • rtk/ rtk.sh looks up keywords, kanji and numbers. The keywords and numbers refer to Heisig's amazing book "Remembering the Kanji". #+BEGIN_EXAMPLE $ ./rtk.sh 城壁 #362: castle 城 | #1500: wall 壁

$ ./rtk.sh star #1556: star 星, #237: stare 眺, #1476: starve 餓, #2532: star-anise 樒, #2872: start 孟, #2376: mustard 芥

$ ./rtk.sh 1 2 3 #1: one 一 | #2: two 二 | #3: three 三 #+END_EXAMPLE

  • simple_bot/ As the name says, this is a simple IRC bot. You can start it with: #+BEGIN_EXAMPLE $ ./bot.py <server[:port]> [NickServ password] #+END_EXAMPLE It uses all the other scripts.