japanese-ebook-analysis

Analyse sentences as well as words

Open • christofferaakre opened this issue 3 years ago • 0 comments

Currently, we only analyse individual words. If we also break the book up into sentences, we gain access to useful metrics such as average sentence length. Two options seem feasible to me:

  1. Reconstruct the sentences by stringing together individual words until we hit a sentence-ending character like 。
  2. Check whether mecab has a feature for splitting text into sentences rather than words
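A minimal sketch of option 1, assuming the tokenizer already gives us a flat list of surface strings. The set of sentence-ending characters, the closing-bracket handling, and the name `split_sentences` are all assumptions for illustration, not existing project code. Closing quotes such as 」 that directly follow a 。 are kept with the sentence they close rather than starting a new one.

```python
from typing import Iterable, List

# Assumed sentence-ending characters; extend as needed.
SENTENCE_ENDERS = {"。", "！", "？", "!", "?"}
# Assumed closing brackets/quotes that stay attached to the sentence they close.
CLOSING_BRACKETS = {"」", "』", "）", ")"}


def split_sentences(tokens: Iterable[str]) -> List[List[str]]:
    """Group word tokens into sentences (hypothetical helper for option 1).

    A sentence ends at a sentence-ending token; closing brackets that
    immediately follow it are appended to that same sentence.
    """
    sentences: List[List[str]] = []
    current: List[str] = []
    for token in tokens:
        if not current and sentences and token in CLOSING_BRACKETS:
            # Right after a sentence end, e.g. the 」 in 「行こう。」
            sentences[-1].append(token)
            continue
        current.append(token)
        if token in SENTENCE_ENDERS:
            sentences.append(current)
            current = []
    if current:
        # Keep trailing text that has no sentence-ending character.
        sentences.append(current)
    return sentences
```

For example, `["今日", "は", "晴れ", "。", "「", "行こう", "。", "」", "彼", "は", "笑っ", "た", "。"]` would be split into three sentences, with the 」 staying in the second one.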

I think option 1 should be sufficient, as I can't think of many edge cases.
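Once the book is split into sentences, the average sentence length metric mentioned above is straightforward. "Length" could mean tokens or characters, and token counts depend on how mecab segments the text, so this sketch computes both. The function names are hypothetical, and the input is assumed to be a list of sentences, each a list of token strings.

```python
from typing import Sequence


def average_sentence_length(sentences: Sequence[Sequence[str]]) -> float:
    """Average number of tokens per sentence; 0.0 if there are no sentences."""
    if not sentences:
        return 0.0
    return sum(len(s) for s in sentences) / len(sentences)


def average_sentence_length_chars(sentences: Sequence[Sequence[str]]) -> float:
    """Average number of characters per sentence (punctuation included)."""
    if not sentences:
        return 0.0
    return sum(sum(len(t) for t in s) for s in sentences) / len(sentences)
```

The character-based variant is independent of the tokenizer, which may make it easier to compare across books.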

christofferaakre, Nov 24 '21 14:11