22 comments by tshatrov

Yeah, this sounds like a good idea. One problem, though: in the JMdict database, hiragana readings are not separated by kanji, so this wasn't possible to implement at the time I...

@tslater I think if you do `(setf ichiran:*default-romanization-method* ichiran:*hepburn-basic*)` _before_ building the executable, then `-f` will use basic romanization.

Yeah it doesn't parse proper nouns at all because they aren't in JMdict. There isn't a word フレッド but there is a word レッド. There could be all sorts of...

I decided not to do this because it would likely degrade segmenting a lot. Proper nouns can't be consistently romanized anyway. I'll be adding things that *can* be romanized such...

Well, the main problem is that the [postmodern library](https://github.com/marijnh/Postmodern) which is used to access the database only supports Postgres, and it also happens to be the best db library, and...

Hm, I don't know, depends on how "invasive" it is to the existing codebase. Also it might make adding new features more difficult as I'd have to test if each...

The gloss is available for root words only. It is a list of definitions; each definition is itself a dictionary which has a part of speech (`pos`) and the definition...
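
A rough sketch of walking such a gloss list in Python, assuming the output has already been parsed from JSON. Only the `pos` key is named in the comment above; the `gloss` key used here for the definition text, the sample entries, and the `format_gloss` helper are all hypothetical and only illustrate the list-of-dictionaries shape.

```python
# Hypothetical sample: a gloss is a list of definitions, each a dictionary.
# "pos" comes from the comment above; "gloss" is an ASSUMED key name.
sample_gloss = [
    {"pos": "[n]", "gloss": "first definition text"},
    {"pos": "[adj-na]", "gloss": "second definition text"},
]


def format_gloss(definitions, text_key="gloss"):
    """Turn a list of definition dictionaries into numbered display lines."""
    lines = []
    for number, definition in enumerate(definitions, start=1):
        pos = definition.get("pos", "")
        text = definition.get(text_key, "")
        # Skip empty parts so a missing key does not leave stray spaces.
        parts = [f"{number}.", pos, text]
        lines.append(" ".join(part for part in parts if part))
    return lines


if __name__ == "__main__":
    for line in format_gloss(sample_gloss):
        print(line)
```

Passing `text_key` as a parameter keeps the sketch usable if the real key name turns out to be different.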

There's a [blog post](https://readevalprint.tumblr.com/post/97467849358/who-needs-graph-theory-anyway) about the segmentation algorithm, but the secret sauce is really the scoring algorithm, which was built in an ad-hoc manner over the years to split sentences...
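
To make the role of scoring concrete, here is a generic dynamic-programming sketch of picking the highest-scoring split of a string into dictionary words. It is not ichiran's algorithm or its scoring: the `best_segmentation` function, the toy score table, and the rule that unknown characters become zero-score one-character gaps are all assumptions made for illustration. The example reuses the フレッド case mentioned earlier, where only レッド is in the dictionary.

```python
def best_segmentation(text, scores):
    """Return (total_score, pieces) for the highest-scoring split of text.

    scores maps known words to a numeric score (toy values here).
    Characters not covered by any known word become one-character
    pieces that contribute a score of 0.
    """
    n = len(text)
    # best[i] holds the best (score, pieces) for the prefix text[:i].
    best = [(0, [])] + [None] * n
    for end in range(1, n + 1):
        # Fallback: treat the last character as an unknown gap.
        prev_score, prev_pieces = best[end - 1]
        candidate = (prev_score, prev_pieces + [text[end - 1]])
        # Try every known word that ends at this position.
        for start in range(end):
            word = text[start:end]
            if word in scores:
                base_score, base_pieces = best[start]
                total = base_score + scores[word]
                if total > candidate[0]:
                    candidate = (total, base_pieces + [word])
        best[end] = candidate
    return best[n]


if __name__ == "__main__":
    toy_scores = {"レッド": 9}  # hypothetical score; フレッド is not a known word
    print(best_segmentation("フレッド", toy_scores))
```

With this toy table the input splits into フ and レッド, which is the kind of result a dictionary-only segmenter gives for a name it does not know.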

Yes, this is the suite, it's not particularly thorough, I was mostly including corner cases for the segmentations I wanted to fix. https://github.com/tshatrov/ichiran/blob/master/tests.lisp

The punctuation substitutions are listed here: https://github.com/tshatrov/ichiran/blob/master/characters.lisp#L75 Because Japanese texts don't generally use spaces, I just automatically add spaces after relevant punctuation. It's a bit lazy, I guess. Before each...
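
The substitute-then-add-a-space idea can be sketched in a few lines of Python. The mapping below is an illustrative guess, not the table from `characters.lisp`, and `normalize_punctuation` is a hypothetical helper name.

```python
# ASSUMED mapping for illustration; the real table may differ.
PUNCTUATION = {
    "。": ".",
    "、": ",",
    "！": "!",
    "？": "?",
}


def normalize_punctuation(text):
    """Replace Japanese punctuation and add a space after each replacement."""
    pieces = []
    for char in text:
        if char in PUNCTUATION:
            # Japanese text has no spaces, so one is added after the mark.
            pieces.append(PUNCTUATION[char] + " ")
        else:
            pieces.append(char)
    # Drop the trailing space left by sentence-final punctuation.
    return "".join(pieces).rstrip()


if __name__ == "__main__":
    print(normalize_punctuation("こんにちは、世界。"))
```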