milleniumbug comments

Results: 38 comments of milleniumbug

Auto-update dictionaries

There is an autoupdater as a separate application, but it needs more work.

JGram integration

Partially implemented: not all entries are there yet, and split queries still need to be handled.

Alternative methods of indexing

Queries currently use permuterm indexes, but they are *huge*: the Tanaka Corpus blows up from 31 MB to 255 MB, and JESC would probably blow up to 2 GB.
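To illustrate where the size comes from, here is a minimal sketch of a permuterm index in Python. It is not the project's implementation: the names `rotations`, `build_permuterm`, and `wildcard_lookup` are made up for this example, the end marker `$` is assumed never to occur in a term, and only a single `*` wildcard is handled. Every term of length n contributes n + 1 rotations, which is why the index grows so much compared to the raw data.

```python
from bisect import bisect_left

END = "$"  # assumed end-of-term marker; must not appear inside any term


def rotations(term):
    """Return every rotation of term + END (len(term) + 1 strings)."""
    s = term + END
    return [s[i:] + s[:i] for i in range(len(s))]


def build_permuterm(terms):
    """Build a sorted list of (rotation, original term) pairs."""
    return sorted((rot, term) for term in terms for rot in rotations(term))


def wildcard_lookup(index, pattern):
    """Look up a pattern containing at most one '*' wildcard."""
    if "*" not in pattern:
        # Exact match: only the unrotated form term + END qualifies.
        key = pattern + END
        return sorted({t for rot, t in index if rot == key})
    head, tail = pattern.split("*", 1)
    # Rotate the query so the wildcard lands at the end: X*Y -> Y$X*
    key = tail + END + head
    i = bisect_left(index, (key, ""))
    found = set()
    # All rotations sharing the prefix are contiguous in the sorted list.
    while i < len(index) and index[i][0].startswith(key):
        found.add(index[i][1])
        i += 1
    return sorted(found)


index = build_permuterm(["hello", "help", "yellow"])
print(len(index))                     # one entry per rotation of every term
print(wildcard_lookup(index, "he*"))  # prefix query
print(wildcard_lookup(index, "*low")) # suffix query
print(wildcard_lookup(index, "he*o")) # infix wildcard
```

The linear scan for exact matches is only there to keep the sketch short; a real index would use the same binary search for both cases.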

Alternative methods of indexing

Half solved with DidacticalEnigma.Mem; we could index locally stored corpora by words.
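A rough sketch of what indexing by words could look like, written in Python for brevity. This is not how DidacticalEnigma.Mem works; `build_word_index` and `search` are hypothetical names, and the tokenizer is passed in as a parameter because real Japanese text would need a morphological analyzer. The demo below assumes pre-segmented sentences and simply splits on whitespace.

```python
from collections import defaultdict


def build_word_index(sentences, tokenize):
    """Map each token to the set of sentence ids it occurs in."""
    index = defaultdict(set)
    for sentence_id, sentence in enumerate(sentences):
        for token in tokenize(sentence):
            index[token].add(sentence_id)
    return index


def search(index, query_tokens):
    """Return ids of sentences that contain every query token."""
    if not query_tokens:
        return set()
    postings = [index.get(token, set()) for token in query_tokens]
    return set.intersection(*postings)


# Assumption: sentences are already segmented, so str.split is enough here.
sentences = ["猫 が 好き", "犬 が 好き", "猫 は 可愛い"]
word_index = build_word_index(sentences, str.split)

print(search(word_index, ["猫"]))
print(search(word_index, ["猫", "好き"]))
```

An index like this stores one posting per distinct word per sentence rather than one entry per character rotation, so it should stay much closer to the size of the corpus itself, at the cost of not supporting arbitrary wildcard queries.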

Project menu

A custom project provides its own custom notes too.

Allow usage of different morphological analyzers

http://www.cl.cs.okayama-u.ac.jp/study/project/asa/ ?

Look up sentences that selected text appears in

It is worth introducing more data sources here:

- https://japanese.stackexchange.com/
- http://maggiesensei.com/
- https://www.japanesewithanime.com/
- http://www.sf.airnet.ne.jp/~ts/japanese/index.html
- http://www.sf.airnet.ne.jp/~ts/japanese/message/

Look up sentences that selected text appears in

This can be generalized into the idea of modular "data sources".
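One possible shape for such a "data source" abstraction, sketched in Python. Everything here is hypothetical and only meant to illustrate the idea: the `DataSource` protocol, the `InMemorySentenceSource` class, and the `lookup_all` helper are invented for this example and do not correspond to any existing API in the project.

```python
from typing import Dict, Iterable, List, Protocol


class DataSource(Protocol):
    """Hypothetical interface: anything that can answer a text lookup."""

    name: str

    def lookup(self, text: str) -> Iterable[str]:
        ...


class InMemorySentenceSource:
    """Toy data source backed by a list of sentences held in memory."""

    def __init__(self, name: str, sentences: List[str]) -> None:
        self.name = name
        self.sentences = sentences

    def lookup(self, text: str) -> Iterable[str]:
        # Return every sentence that contains the selected text.
        return [s for s in self.sentences if text in s]


def lookup_all(sources: Iterable[DataSource], text: str) -> Dict[str, List[str]]:
    """Query every registered source and group the results by source name."""
    return {source.name: list(source.lookup(text)) for source in sources}


sources = [
    InMemorySentenceSource("corpus-a", ["猫が好きです", "犬が好きです"]),
    InMemorySentenceSource("corpus-b", ["猫は可愛い"]),
]
results = lookup_all(sources, "猫")
print(results)
```

With an interface like this, a sentence corpus, a Q&A site, or a grammar blog could each be plugged in as a separate source without the lookup code needing to know about any of them.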

Look up sentences that selected text appears in

Use the Tatoeba project instead of the Tanaka Corpus.

Add osx-arm64/native

Not much bloat involved, to be honest, especially since users of the library can use the appropriate options to avoid shipping libraries for platforms they don't support. The biggest issue that...