Anartigone

Results: 4 comments by Anartigone

> > I'm thinking about doing something like `set(list(output_wordboundaries))` and then filtering input on that set. This will allow me to figure out which index of the input I need...

> It's simple.
>
> ```python
> def generate_subs_based_on_punc(self, text) -> str:
>     PUNCTUATION = [',', '。', '!', '?', ';',
>                    ':', '\n', '“', '”', ',', '!', '\\. ']
>     ...
> ```
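The quoted snippet is cut off, so the rest of that function is not shown here. As a rough illustration of the general idea (splitting text into subtitle-sized chunks at punctuation), here is a minimal sketch. The function name `split_on_punctuation` and the exact mark list are hypothetical, chosen for illustration, and are not taken from the original code.

```python
import re
from typing import List

# Hypothetical list of split marks (an assumption, not the original list):
# ASCII and full-width punctuation, newline, curly quotes, and ". " so that
# decimals such as 3.14 are not split.
SPLIT_MARKS = [",", "!", "?", ";", ":", "\n", "“", "”",
               ",", "。", "!", "?", ";", ":", ". "]

# One alternation of escaped literals; "+" collapses runs of adjacent marks.
_SPLIT_RE = re.compile("(?:" + "|".join(re.escape(m) for m in SPLIT_MARKS) + ")+")


def split_on_punctuation(text: str) -> List[str]:
    """Return the non-empty, whitespace-trimmed chunks between split marks."""
    return [chunk.strip() for chunk in _SPLIT_RE.split(text) if chunk.strip()]


if __name__ == "__main__":
    print(split_on_punctuation("Hello, world! 你好,世界。"))
```

Escaping each mark and joining with `|` avoids having to hand-write regex escapes such as `'\\. '` inside the list itself.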

This is a helpful function to have. I have tested it, and it works with both Chinese and English text. I agree there is good reason to merge it.

I have encountered similar problems with some EPUBs as well. Indeed, there is no good solution for such arbitrary HTML problems. My workaround is to first convert the EPUB...