
Feature request: Append punctuation in word level timestamps

Open haikyuu opened this issue 2 years ago • 2 comments

When transcribing "Hello, I am happy", the transcription actually treats "," as a separate word. It would be nice to merge it with the previous word instead.

Whisper just added word-level timestamps to its main branch, including an option that would be useful here and that we could probably add to whisper.cpp: https://github.com/openai/whisper/pull/869/files#diff-f6accbbb4ebcd3dd6815bf012490d9ba37eb89a65f2124adc95c2a39bc6941b7R340

parser.add_argument("--append_punctuations", type=str, default="\"\'.。,,!!??::”)]}、", help="if word_timestamps is True, merge these punctuation symbols with the previous word")
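To make the request concrete, here is a minimal sketch of the merge step in Python. It is not the Whisper implementation and not an existing whisper.cpp API: the `Word` type and the `merge_appended_punctuation` function are hypothetical names made up for illustration, and the only thing taken from the source is the default character set of the option quoted above.

```python
from dataclasses import dataclass
from typing import List

# Default set copied from the --append_punctuations option quoted above.
APPEND_PUNCTUATIONS = "\"\'.。,,!!??::”)]}、"


@dataclass
class Word:
    """Hypothetical word-level timestamp entry (not a whisper.cpp type)."""
    text: str
    start: float  # seconds
    end: float    # seconds


def merge_appended_punctuation(
    words: List[Word], appended: str = APPEND_PUNCTUATIONS
) -> List[Word]:
    """Fold punctuation-only entries into the word that precedes them."""
    merged: List[Word] = []
    for word in words:
        token = word.text.strip()
        is_punctuation = bool(token) and all(ch in appended for ch in token)
        if merged and is_punctuation:
            prev = merged[-1]
            # Keep the previous start time and extend the end time.
            merged[-1] = Word(prev.text + token, prev.start, max(prev.end, word.end))
        else:
            merged.append(Word(word.text, word.start, word.end))
    return merged


if __name__ == "__main__":
    sample = [
        Word("Hello", 0.0, 0.4),
        Word(",", 0.4, 0.5),
        Word("I", 0.5, 0.6),
        Word("am", 0.6, 0.8),
        Word("happy", 0.8, 1.2),
    ]
    for w in merge_appended_punctuation(sample):
        print(w)
```

In this sketch a punctuation entry with no preceding word is left as-is, and the merged word keeps the start time of the word it was attached to. A whisper.cpp version would presumably do the same pass over its own token or segment structures.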

Originally posted by @haikyuu in https://github.com/ggerganov/whisper.cpp/discussions/580#discussioncomment-5237730

haikyuu, Mar 09 '23 09:03

The other thing is that "I'll do this" is transcribed as ["I", "'ll", "do", "this"], so it makes sense to also support appending words that start with punctuation, like 'll onto I.
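A rough Python sketch of that second case, under stated assumptions: tokens are plain `(text, start, end)` tuples, and `merge_leading_punctuation` and `LEADING_MARKS` are hypothetical names, not anything from Whisper or whisper.cpp. It only looks at a leading apostrophe, which is the case described here.

```python
from typing import List, Tuple

# Hypothetical token layout for illustration: (text, start_seconds, end_seconds).
Token = Tuple[str, float, float]

# Assumption: only a leading apostrophe (straight or curly) marks a
# continuation of the previous word, as in "'ll", "'s", "'re".
LEADING_MARKS = ("'", "’")


def merge_leading_punctuation(tokens: List[Token]) -> List[Token]:
    """Attach tokens that start with an apostrophe to the previous token."""
    merged: List[Token] = []
    for text, start, end in tokens:
        if merged and text.startswith(LEADING_MARKS):
            prev_text, prev_start, _ = merged[-1]
            # The combined word spans from the previous start to this end.
            merged[-1] = (prev_text + text, prev_start, end)
        else:
            merged.append((text, start, end))
    return merged


if __name__ == "__main__":
    tokens = [("I", 0.0, 0.1), ("'ll", 0.1, 0.3), ("do", 0.3, 0.5), ("this", 0.5, 0.8)]
    print(merge_leading_punctuation(tokens))
```

One caveat with this simple rule: a word that opens a quotation with an apostrophe would also be glued onto the previous word, so a real implementation would likely need a stricter check, for example a list of known contraction suffixes.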

haikyuu, Mar 09 '23 09:03