epub_to_audiobook: Refactor split_text function to handle Chinese text more effectively

Added a regular expression that splits Chinese text into sentences at punctuation marks.
Each chunk's length is now kept as close as possible to max_chars without breaking a sentence in the middle.
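
For readers without the diff at hand, the approach described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the punctuation set, the regex, and the greedy packing loop are all assumptions, and only the names split_text and max_chars come from the description.

```python
import re

# Assumed set of Chinese sentence-ending punctuation (not taken from the PR).
_TERMINATORS = "。！？；"

# A sentence is a run of non-terminators followed by one or more terminators,
# or a trailing run of text that has no terminator at all.
_SENTENCE_RE = re.compile(
    rf"[^{_TERMINATORS}]*[{_TERMINATORS}]+|[^{_TERMINATORS}]+"
)


def split_text(text: str, max_chars: int) -> list[str]:
    """Greedily pack whole sentences into chunks of at most max_chars.

    A single sentence longer than max_chars is kept whole, so a chunk can
    exceed the limit in that case.
    """
    chunks: list[str] = []
    current = ""
    for sentence in _SENTENCE_RE.findall(text):
        # Start a new chunk when adding this sentence would exceed the limit.
        if current and len(current) + len(sentence) > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current += sentence
    if current:
        chunks.append(current)
    return chunks


print(split_text("你好。世界！再见？", 6))
```

Because sentences are never cut, joining the returned chunks reproduces the input text exactly.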

May 31 '24 08:05 Glowin

Thanks! I will look into this ASAP.

Jun 28 '24 08:06 p0n1

@p0n1 I will leave this one to you; I have no idea about Chinese punctuation.

Aug 24 '24 22:08 Bryksin

Got it. I will try it this week.

Aug 26 '24 03:08 p0n1

Good improvements to the Chinese text splitting logic. The regex-based sentence splitting is more appropriate for Chinese language processing. There might be rare edge cases (e.g., very long sentences without punctuation) that could produce unexpected results. I'll test this implementation locally for a while to check for such cases.
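
One possible guard for the no-punctuation edge case is a hard-wrap fallback applied to any piece that is still longer than the limit. The helper below is hypothetical (hard_wrap is not a function in this PR or the repository) and only illustrates the idea:

```python
def hard_wrap(piece: str, max_chars: int) -> list[str]:
    """Cut an over-long piece into slices of at most max_chars characters.

    Hypothetical fallback for text with no sentence punctuation; it ignores
    word and sentence boundaries entirely.
    """
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")
    return [piece[i:i + max_chars] for i in range(0, len(piece), max_chars)]


# A "sentence" with no punctuation at all, longer than the limit.
print(hard_wrap("一二三四五六七", 3))
```

Slicing at fixed offsets can cut a word or phrase in half, so this would only be a last resort after punctuation-based splitting has failed to bring a piece under the limit.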

Sep 05 '24 08:09 p0n1

I found that this PR doesn't segment mixed Chinese-English text well.

We now have better sentence segmentation for most languages thanks to https://github.com/p0n1/epub_to_audiobook/pull/131. I appreciate your contribution; closing this PR.

Apr 03 '25 14:04 p0n1