Get text divided into paragraphs?

Open sindresorhus opened this issue 1 year ago • 4 comments

This would be useful when transcribing to a text document because having the text divided into paragraphs makes it more readable. This may be outside the scope of this project. Just thought I would ask.

sindresorhus, Feb 26 '23 09:02

Yeah, this sounds out of scope. There is probably a third-party tool that you can apply to the output of whisper.cpp.

ggerganov, Feb 27 '23 19:02

@sindresorhus You can try the built-in JavaScript segmenter (Intl.Segmenter): https://www.stefanjudis.com/today-i-learned/how-to-split-javascript-strings-with-intl-segmenter/
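
For reference, a minimal sketch of what sentence splitting with Intl.Segmenter can look like. The helper name splitSentences and the default "en" locale are assumptions made for this illustration, not something taken from whisper.cpp or the linked post.

```javascript
// Hypothetical helper: split a transcript into sentences using the
// built-in Intl.Segmenter with sentence granularity.
function splitSentences(text, locale = "en") {
  const segmenter = new Intl.Segmenter(locale, { granularity: "sentence" });
  // Each item yielded by segment() has a `segment` string that includes
  // trailing whitespace, so trim it and drop empty results.
  return Array.from(segmenter.segment(text), (s) => s.segment.trim())
    .filter((sentence) => sentence.length > 0);
}

const sentences = splitSentences("Hello there. How are you? Fine.");
console.log(sentences.length);
```

This only finds sentence boundaries. The granularity option accepts "grapheme", "word", or "sentence", so there is no paragraph mode.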

abodacs, Mar 11 '23 21:03

@abodacs The linked segmenter only covers sentences, not paragraphs. I'm already doing sentence segmentation in my app.
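
To make the gap concrete, here is a rough sketch of one possible post-processing heuristic: start a new paragraph whenever the pause between two timestamped segments is long enough. Nobody in this thread proposed it. The { start, end, text } shape (times in seconds), the function name groupIntoParagraphs, and the 1.5 second threshold are all assumptions for illustration, and the timestamps would need to be mapped from whatever output format is in use.

```javascript
// Hypothetical heuristic: group timestamped segments into paragraphs,
// breaking wherever the silence between two segments reaches a threshold.
// The segment shape { start, end, text } is assumed, with times in seconds.
function groupIntoParagraphs(segments, minPauseSeconds = 1.5) {
  const paragraphs = [];
  let current = [];
  let previousEnd = null;

  for (const { start, end, text } of segments) {
    const pause = previousEnd === null ? 0 : start - previousEnd;
    // A long enough pause closes the paragraph collected so far.
    if (pause >= minPauseSeconds && current.length > 0) {
      paragraphs.push(current.join(" "));
      current = [];
    }
    const trimmed = text.trim();
    if (trimmed.length > 0) current.push(trimmed);
    previousEnd = end;
  }

  // Flush whatever is left as the final paragraph.
  if (current.length > 0) paragraphs.push(current.join(" "));
  return paragraphs;
}

const paragraphs = groupIntoParagraphs([
  { start: 0.0, end: 2.0, text: " Hello." },
  { start: 2.1, end: 4.0, text: " How are you?" },
  { start: 6.0, end: 8.0, text: " New topic." },
]);
console.log(paragraphs.join("\n\n"));
```

Pauses are only a proxy for topic changes, so the threshold would need tuning per recording.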

sindresorhus, Mar 11 '23 23:03

@sindresorhus Aha, you can check this discussion: https://github.com/openai/whisper/discussions/552

abodacs, Mar 12 '23 11:03