japanese-ebook-analysis
Analyse sentences as well as words
Currently, we only analyse individual words. If we also break the book up into sentences, we get access to useful metrics such as average sentence length. Two options seem feasible to me:
- Reconstruct the sentences by stringing together individual words until we hit a sentence-ending character like 。
- Check whether MeCab has a way to break text up into sentences rather than words
I think the first option should be sufficient, as I can't think of many edge cases.
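A minimal sketch of the first option, assuming the tokenizer output is already available as a flat list of surface-form strings and that each punctuation mark is its own token. The names `split_sentences`, `average_sentence_length`, `SENTENCE_ENDERS` and `CLOSERS` are made up for illustration, and the character sets are a guess that would probably need extending:

```python
from statistics import mean

# Assumed sentence-ending characters; extend as needed.
SENTENCE_ENDERS = {"。", "！", "？", "!", "?"}
# Closing quotes/brackets that may directly follow a sentence ender.
CLOSERS = {"」", "』", "）", ")"}


def split_sentences(tokens):
    """Group a flat list of word tokens into sentences (lists of tokens)."""
    sentences = []
    current = []
    ended = False  # True once the current sentence has seen an ender
    for token in tokens:
        # Start a new sentence at the first ordinary token after an ender.
        if ended and token not in CLOSERS and token not in SENTENCE_ENDERS:
            sentences.append(current)
            current = []
            ended = False
        current.append(token)
        if token in SENTENCE_ENDERS:
            ended = True
    if current:
        # Keep trailing text that has no sentence ender.
        sentences.append(current)
    return sentences


def average_sentence_length(sentences):
    """Mean number of tokens per sentence, or 0.0 for an empty book."""
    if not sentences:
        return 0.0
    return mean(len(sentence) for sentence in sentences)


tokens = ["今日", "は", "晴れ", "です", "。", "「", "本当", "？", "」", "はい"]
sentences = split_sentences(tokens)
print(len(sentences), average_sentence_length(sentences))
```

One edge case this does not handle is a quotation embedded in a longer sentence (for example 「本当？」と言った), which would be split after the closing 」.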