chapterize issues

a minor error in code

In chapterize.py, line 173 reads `endLocation = len(self.lines)-1 # The end`. I think it would be better set to `len(self.lines)`, because if we can't detect the end location, the last line could possibly...
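A minimal sketch of the suspected off-by-one, assuming `endLocation` is used as the exclusive end of a slice over `self.lines` (the issue excerpt does not confirm this). The `lines` list and `start` index below are made-up stand-ins, not taken from chapterize.py.

```python
# Hypothetical stand-in for self.lines; the real attribute lives in chapterize.py.
lines = ["CHAPTER I", "First chapter text.", "CHAPTER II", "Last line of the book."]

start = 2  # hypothetical index of the last detected heading

# Python slices exclude the end index, so len(lines) - 1 drops the final line.
end_minus_one = lines[start:len(lines) - 1]
end_full = lines[start:len(lines)]

print(end_minus_one)  # ['CHAPTER II']
print(end_full)       # ['CHAPTER II', 'Last line of the book.']
```

If the end index is instead used inclusively somewhere in the code, `len(self.lines)-1` would already be correct, so the usage is worth checking before changing it.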

eveliao

Chapter title variations not working

Hi. I tried using chapterize with a text whose chapter titles are in this format: '4. The Black Bird' (number, then title), and chapterize returns the headings <...
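One possible way to recognize headings of this shape is a dedicated regular expression. The sketch below is an assumption about how such a pattern might look, not the pattern chapterize uses; `NUMBERED_HEADING`, `find_numbered_headings`, and every sample line except '4. The Black Bird' are hypothetical.

```python
import re

# Hypothetical pattern, not taken from chapterize.py: an Arabic number, a period,
# whitespace, then a short title that starts with an uppercase letter.
NUMBERED_HEADING = re.compile(r"^\s*(\d{1,3})\.\s+([A-Z][^\n]{0,60})$")


def find_numbered_headings(lines):
    """Return (line_index, chapter_number, title) for each matching line."""
    found = []
    for i, line in enumerate(lines):
        match = NUMBERED_HEADING.match(line.rstrip())
        if match:
            found.append((i, int(match.group(1)), match.group(2).strip()))
    return found


sample = [
    "3. The Falcon",
    "Some prose here.",
    "",
    "4. The Black Bird",
    "He walked 4. miles north.",
]
headings = find_numbered_headings(sample)
print(headings)
```

A pattern this loose will also match numbered list items in the body text, so it would probably need an extra check, such as requiring blank lines around the heading.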

green345

Write tests and set up CI

It'd be nice to have automatic tests and hook in Travis or some other CI. But it might be better just to deprecate this version of the tool, anyway, and...

JonathanReeve

Integrate HTML-based chapterizer

I wrote a quick-and-dirty HTML chapterizer that could be integrated with this one: https://github.com/JonathanReeve/chapter-experiments/blob/master/chapterize-html.ipynb
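For illustration only, here is one way an HTML chapterizer could split a page at heading elements using the standard library. This is not the approach from the linked notebook, which is not reproduced here; the `HeadingSplitter` class, the choice of `<h2>` as the chapter marker, and the sample page are all assumptions.

```python
from html.parser import HTMLParser


class HeadingSplitter(HTMLParser):
    """Split a document into [heading, body] pairs at each heading element."""

    def __init__(self, heading_tag="h2"):
        super().__init__()
        self.heading_tag = heading_tag
        self.in_heading = False
        self.chapters = []  # list of [heading_text, body_text]

    def handle_starttag(self, tag, attrs):
        if tag == self.heading_tag:
            self.in_heading = True
            self.chapters.append(["", ""])

    def handle_endtag(self, tag):
        if tag == self.heading_tag:
            self.in_heading = False

    def handle_data(self, data):
        if not self.chapters:
            return  # ignore front matter before the first heading
        slot = 0 if self.in_heading else 1
        self.chapters[-1][slot] += data


def split_html_chapters(html):
    parser = HeadingSplitter()
    parser.feed(html)
    parser.close()
    # Collapse runs of whitespace in the body text.
    return [(h.strip(), " ".join(b.split())) for h, b in parser.chapters]


page = (
    "<p>Preface</p><h2>Chapter 1</h2><p>First text.</p>"
    "<h2>Chapter 2</h2><p>Second <em>text</em>.</p>"
)
chapters = split_html_chapters(page)
print(chapters)
```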

JonathanReeve

Parse short stories

It would be nice if this could parse short stories, like this: http://www.gutenberg.org/cache/epub/25519/pg25519.txt Possibly detecting a 'contents' section and getting the titles from there would work, at least for that...
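A rough sketch of the 'contents' idea: read the titles listed under a CONTENTS heading, then look for the last line in the text that matches each title. The stopping rule (two blank lines in a row), the function names, and the sample text are assumptions, and this has not been tried against the linked Gutenberg file.

```python
def titles_from_contents(lines):
    """Collect the non-blank lines that follow a 'CONTENTS' heading.

    Collection stops at the first run of two blank lines after at least one
    title has been read. That stopping rule is an assumption.
    """
    titles = []
    blank_run = 0
    in_contents = False
    for line in lines:
        text = line.strip()
        if not in_contents:
            in_contents = text.upper().rstrip(".:") == "CONTENTS"
            continue
        if text:
            titles.append(text)
            blank_run = 0
        else:
            blank_run += 1
            if titles and blank_run >= 2:
                break
    return titles


def locate_titles(lines, titles):
    """Map each title to the index of its last whole-line occurrence."""
    wanted = {t.upper(): t for t in titles}
    positions = {}
    for i, line in enumerate(lines):
        key = line.strip().upper()
        if key in wanted:
            positions[wanted[key]] = i  # later matches overwrite earlier ones
    return positions


book = [
    "A BOOK OF STORIES", "", "CONTENTS", "",
    "The First Tale", "The Second Tale", "", "",
    "THE FIRST TALE", "", "Once upon a time.", "",
    "THE SECOND TALE", "", "Later on.",
]
story_titles = titles_from_contents(book)
story_starts = locate_titles(book, story_titles)
print(story_titles, story_starts)
```

Taking the last occurrence is what skips the entry inside the contents list itself. Contents entries that carry page numbers or differ in wording from the story headings would need fuzzier matching.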

nateGeorge

implement log mode that outputs chapter data instead of actually chapterizing

Just write to a log: the text name, the number of chapters, and the length of each chapter. This would enable studies of many texts at a time.
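A minimal sketch of what such a log mode might record, using the standard `logging` module. The function names, the logger name, the tab-separated format, and the choice to measure length in words are all assumptions, since the issue does not specify them.

```python
import logging

logger = logging.getLogger("chapterize.stats")  # hypothetical logger name


def chapter_stats(text_name, chapters):
    """Summarize chapters (a list of lists of lines) without writing any files."""
    # Length is counted in whitespace-separated words; this unit is an assumption.
    lengths = [sum(len(line.split()) for line in chapter) for chapter in chapters]
    return {
        "text": text_name,
        "num_chapters": len(chapters),
        "chapter_lengths": lengths,
    }


def log_chapter_stats(text_name, chapters):
    """Log one tab-separated line per text: name, chapter count, lengths."""
    stats = chapter_stats(text_name, chapters)
    logger.info(
        "%s\t%d\t%s",
        stats["text"],
        stats["num_chapters"],
        ",".join(str(n) for n in stats["chapter_lengths"]),
    )
    return stats


logging.basicConfig(level=logging.INFO, format="%(message)s")
demo = log_chapter_stats("example.txt", [["one two three", "four"], ["five six"]])
```

One line per text keeps the output easy to load into a spreadsheet or data frame when running over a large corpus.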

JonathanReeve

use a different word tokenizer that doesn't require external data to be downloaded

NLTK's `word_tokenize` function requires the Punkt data to be downloaded, which could effectively break the program for users who don't know what's going on.
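One dependency-free option is a small regex tokenizer built on the standard library. The sketch below is an assumption about what might be good enough for counting words, not a drop-in replacement for `word_tokenize`; `TOKEN` and `simple_word_tokenize` are hypothetical names.

```python
import re

# A word (optionally with internal apostrophes or hyphens) or one punctuation mark.
TOKEN = re.compile(r"\w+(?:['-]\w+)*|[^\w\s]")


def simple_word_tokenize(text):
    """Split text into word and punctuation tokens using only the stdlib."""
    return TOKEN.findall(text)


tokens = simple_word_tokenize("It's a well-known fact, isn't it?")
print(tokens)
# ["It's", 'a', 'well-known', 'fact', ',', "isn't", 'it', '?']
```

Unlike `word_tokenize`, this keeps contractions such as "isn't" as single tokens, so word counts may differ slightly between the two.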

JonathanReeve
