whisper.cpp
Feature request: Append punctuation in word-level timestamps
Hello, I noticed that transcription actually treats "," as a separate word. It would be nice to merge it with the previous word instead.
OpenAI Whisper just added word-level timestamps to its main branch, along with a useful option that we could probably add to whisper.cpp as well: https://github.com/openai/whisper/pull/869/files#diff-f6accbbb4ebcd3dd6815bf012490d9ba37eb89a65f2124adc95c2a39bc6941b7R340
parser.add_argument("--append_punctuations", type=str, default="\"\'.。,，!！?？:：”)]}、", help="if word_timestamps is True, merge these punctuation symbols with the previous word")
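A minimal Python sketch of what such an option could do, assuming each word comes with its text and start/end times in seconds. The `Word` class and `merge_append_punctuations` function are hypothetical names for illustration, not part of Whisper or whisper.cpp. A word whose text consists only of characters from the append set is folded into the previous word, and the previous word's end time is extended to cover it.

```python
from dataclasses import dataclass


@dataclass
class Word:
    # Hypothetical word record: text plus start/end times in seconds.
    text: str
    start: float
    end: float


# Same characters as the default of Whisper's --append_punctuations option.
APPEND_PUNCTUATIONS = "\"'.。,，!！?？:：”)]}、"


def merge_append_punctuations(words, append=APPEND_PUNCTUATIONS):
    """Fold punctuation-only words into the word that precedes them."""
    merged = []
    for w in words:
        stripped = w.text.strip()
        is_punct = stripped != "" and all(ch in append for ch in stripped)
        if merged and is_punct:
            prev = merged[-1]
            # Glue the punctuation onto the previous word and extend its end time.
            merged[-1] = Word(prev.text + stripped, prev.start, max(prev.end, w.end))
        else:
            merged.append(Word(w.text, w.start, w.end))
    return merged


# Illustrative input with made-up timings.
words = [
    Word(" Hello", 0.0, 0.4),
    Word(",", 0.4, 0.45),
    Word(" world", 0.5, 0.9),
    Word(".", 0.9, 0.95),
]
result = merge_append_punctuations(words)
print([w.text for w in result])
```

A punctuation word at the very start of the list has nothing to attach to, so this sketch keeps it as its own entry.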
Originally posted by @haikyuu in https://github.com/ggerganov/whisper.cpp/discussions/580#discussioncomment-5237730
Another case: "I'll do this" is transcribed as ["I", "'ll", "do", "this"], so it would also make sense to support appending words that start with punctuation, such as 'll, onto the previous word (here, I).
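A similar hypothetical sketch for the contraction case. A token made of an apostrophe followed only by letters (such as 'll, 's or 've) is treated as a clitic and appended to the previous word. The `Word` class, `is_clitic` and `merge_clitics` are illustrative names, and the apostrophe rule is an assumption. It would also merge an opening single quote written directly against a word, so a real implementation would likely need a better signal, for example whether the token starts with a space.

```python
from dataclasses import dataclass


@dataclass
class Word:
    # Hypothetical word record: text plus start/end times in seconds.
    text: str
    start: float
    end: float


# Assumed set of leading characters that mark a clitic such as 'll or 's.
CLITIC_PREFIXES = "'’"


def is_clitic(text):
    """True for tokens like "'ll", "'s", "'ve": an apostrophe followed by letters."""
    return len(text) > 1 and text[0] in CLITIC_PREFIXES and text[1:].isalpha()


def merge_clitics(words):
    """Append clitic tokens onto the preceding word, extending its end time."""
    merged = []
    for w in words:
        if merged and is_clitic(w.text):
            prev = merged[-1]
            merged[-1] = Word(prev.text + w.text, prev.start, max(prev.end, w.end))
        else:
            merged.append(Word(w.text, w.start, w.end))
    return merged


# The example from the issue, with made-up timings.
words = [
    Word("I", 0.0, 0.2),
    Word("'ll", 0.2, 0.4),
    Word("do", 0.5, 0.7),
    Word("this", 0.8, 1.1),
]
result = merge_clitics(words)
print([w.text for w in result])
```

Because a token with a leading space (" 'll") fails the `text[0]` check, this sketch only merges clitics that are written without a separating space.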