whisper.cpp
Create flag to disable normalization (i.e., allow filler words)
I'd like to make a feature request to disable normalization (i.e., allow filler words, such as: "oh", "um", and "eh") in the transcription of whisper.cpp.
This would be of interest to me because I use whisper.cpp to do most of the legwork for podcast transcription and then spend a lot of time smoothing out the raw output. In particular, I find myself adding the filler words back in manually.
(This is a more precise and scoped excerpt of what was first mentioned in issue #660.)
A discussion on Hugging Face indicates that disabling normalization is possible.
However, it seems this option can only be turned on or off from a full Python script, not through a flag on the Python Whisper CLI command, as indicated by a Stack Overflow question from March 2023.
Sorry to necro a year-old open issue, but I'm having the opposite problem: I get all the filler words and I want to remove them. If you've continued using this, have you found any controls for toggling filler words?