Anartigone
> > I'm thinking about doing something like `set(list(output_wordboundaries))` and then filtering input on that set. This will allow me to figure out which index of the input I need...
> It's simple.
>
> ```python
> def generate_subs_based_on_punc(self, text) -> str:
>     PUNCTUATION = [',', '。', '!', '?', ';',
>                    ':', '\n', '“', '”', ',', '!', '\\. ']
> ...
This is a helpful function to have. I have tested it, and it works with both Chinese and English text. I agree there is good reason to merge it.
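For anyone who wants to try the idea on mixed Chinese and English text, here is a minimal sketch of punctuation-based splitting. It is not the quoted implementation, whose body is cut off above: the function name `split_on_punctuation` and the exact punctuation set are my own assumptions, and I am only guessing that the `'\\. '` entry is meant to split on a period followed by a space.

```python
import re

# Assumed punctuation set for illustration only; the list in the quoted
# function is truncated, so this is not a copy of it.
SINGLE_CHARS = [",", "!", "?", ";", ":", "\n",
                ",", "。", "!", "?", ";", ":", "“", "”"]

# Character class for the single characters, plus "period followed by a
# space" so that decimals such as 3.14 are not split.
PATTERN = re.compile(
    "[" + "".join(re.escape(c) for c in SINGLE_CHARS) + "]" + r"|\. "
)


def split_on_punctuation(text: str) -> list[str]:
    """Split text at punctuation and drop empty or whitespace-only pieces."""
    pieces = PATTERN.split(text)
    return [p.strip() for p in pieces if p.strip()]


if __name__ == "__main__":
    print(split_on_punctuation("Hello, world! How are you? Fine. Thanks"))
    print(split_on_punctuation("你好,世界。今天天气很好!"))
```

Using a compiled pattern keeps the Chinese and English cases on the same code path, so one test per language is enough to check both.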
I have encountered similar problems with some EPUBs as well. Indeed, there is no good solution for such arbitrary HTML problems. My workaround is to first convert the EPUB...