milleniumbug
An autoupdater exists as a separate application, but it needs more work.
Partially implemented: not all entries are there, and split queries still need to be handled.
Currently using permuterm indexes for queries, but they are *huge*: indexing the Tanaka Corpus grows it from 31 MB to 255 MB, and JESC would probably grow to around 2 GB.
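To illustrate where the size comes from, here is a minimal Python sketch of a permuterm index. It is not the project's implementation; the function names and the `$` terminator are assumptions made for the example. Every entry of length n is stored as n + 1 rotations, which is why the index ends up several times larger than the data it covers.

```python
from bisect import bisect_left

END = "$"  # assumed terminator marking the end of an indexed term


def rotations(term):
    """Yield every rotation of term + END, paired with the original term."""
    marked = term + END
    for i in range(len(marked)):
        yield marked[i:] + marked[:i], term


def build_permuterm(terms):
    """Return a sorted list of (rotation, term) pairs."""
    return sorted(pair for t in set(terms) for pair in rotations(t))


def wildcard_lookup(index, pattern):
    """Resolve a pattern containing exactly one '*' with a prefix scan."""
    head, tail = pattern.split("*")
    # Rotate the pattern so the wildcard falls at the end: head*tail -> tail$head*
    prefix = tail + END + head
    start = bisect_left(index, (prefix,))
    matches = set()
    for rotation, term in index[start:]:
        if not rotation.startswith(prefix):
            break
        matches.add(term)
    return sorted(matches)
```

For example, `wildcard_lookup(build_permuterm(["hello", "help", "halo"]), "h*o")` scans only the rotations starting with `o$h`. The three terms already produce 16 index rows, and the ratio gets worse as entries get longer.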
Half solved with DidacticalEnigma.Mem; for locally stored corpora we could index by words instead.
A custom project provides its own custom notes too.
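Indexing by words could look roughly like the inverted index below. This is a sketch in Python, not what DidacticalEnigma.Mem does; the function names are hypothetical, and the whitespace tokenizer is only a stand-in, since Japanese text would need a morphological analyzer to split sentences into words.

```python
from collections import defaultdict


def build_word_index(sentences, tokenize=str.split):
    """Map each word to the set of sentence ids that contain it."""
    index = defaultdict(set)
    for sentence_id, sentence in enumerate(sentences):
        for word in tokenize(sentence):
            index[word].add(sentence_id)
    return index


def search(index, words):
    """Return the ids of sentences containing every word in `words`."""
    postings = [index.get(word, set()) for word in words]
    if not postings:
        return []
    return sorted(set.intersection(*postings))
```

Each word occurrence adds one sentence id to the index, so its size grows linearly with the corpus. The trade-off is that matching inside a word is no longer possible without an extra structure.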
http://www.cl.cs.okayama-u.ac.jp/study/project/asa/ ?
It is worth introducing more data sources here:
- https://japanese.stackexchange.com/
- http://maggiesensei.com/
- https://www.japanesewithanime.com/
- http://www.sf.airnet.ne.jp/~ts/japanese/index.html
- http://www.sf.airnet.ne.jp/~ts/japanese/message/
This can be generalized into an idea of "data sources", and modularity.
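One way to picture the "data sources" idea is a small common interface that every source implements, so new ones can be plugged in without touching the caller. The Python sketch below is only an illustration; `DataSource`, `Answer`, `DictSource` and `query_all` are hypothetical names, not part of the project.

```python
from dataclasses import dataclass
from typing import Iterable, Protocol


@dataclass(frozen=True)
class Answer:
    source: str  # name of the data source that produced the answer
    text: str    # content to show to the user


class DataSource(Protocol):
    """Hypothetical interface that every pluggable source implements."""
    name: str

    def lookup(self, query: str) -> Iterable[Answer]: ...


class DictSource:
    """Toy source backed by an in-memory dictionary."""

    def __init__(self, name, entries):
        self.name = name
        self._entries = entries

    def lookup(self, query):
        if query in self._entries:
            yield Answer(self.name, self._entries[query])


def query_all(sources, query):
    """Collect answers from every registered source, in registration order."""
    return [answer for source in sources for answer in source.lookup(query)]
```

With this shape, a dictionary, a corpus, or a website scraper would each be one more class with a `lookup` method, and the caller only ever sees a list of `Answer` values.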
Use the Tatoeba project instead of the Tanaka Corpus.
Not much bloat involved, to be honest, especially since users of the library can use the appropriate options to avoid shipping libraries for platforms they don't support. The biggest issue that...