
Weird words not being capitalized - even at start of sentence.

Open janngobble opened this issue 2 years ago • 1 comments

I don't know whether this comes from the training data or from how whisper.cpp cuts the file up for processing (so that it doesn't realize it's at the beginning of a sentence), but certain words that should be capitalized are not, both at the start of a sentence and in the middle of one. I'm using the medium ggml model, NOT medium.en. The language is, of course, English.

Examples of affected words: Miss (beginning of sentence), Mrs., How (beginning of sentence), You (beginning of sentence), You're (beginning of sentence), etc.

If it helps, this is the Blu-ray of Miss Marple, "The Body in the Library."

(weird that it still capitalizes their last names but lowercases their honorifics)

[00:33:33.680 --> 00:33:36.680]   Oh, it's amused you no doubt, calling yourself Miss Lee
[00:33:36.680 --> 00:33:41.920]   miss Lee when you are in fact mrs. Blake and it has helped to keep all but the
[00:33:41.920 --> 00:33:48.320]   most curious at a distance but the time for such games is over.
[00:33:48.320 --> 00:33:50.320]   [singing]
[00:33:50.320 --> 00:33:55.100]   how did you know we were married?
[00:33:55.100 --> 00:33:57.100]   my dear the way you quarrel.
[00:33:57.100 --> 00:34:02.420]   you quarrel like people who are tied to each other by more than a mere love affair you see.
[00:34:02.420 --> 00:34:04.420]   you're astonishing.
[00:34:04.420 --> 00:34:07.420]   I thought you've been to Somerset House or something.
[00:34:07.420 --> 00:34:09.420]   oh
[00:34:09.420 --> 00:34:11.420]   Somerset House.
[00:34:11.420 --> 00:34:19.580]   and the rest of it?

From another episode, this time with the "large" v2 ggml model:

[00:01:50.000 --> 00:01:52.000]   Jerry Burton my sister Joanne.
[00:01:52.000 --> 00:01:56.000]   well I never. Maud Colesrop my husband's vicar here.
[00:01:56.000 --> 00:02:01.000]   he'd be expecting to see you in church on Sunday. everyone will. they're all dying to meet you.
[00:02:01.000 --> 00:02:03.000]   oh lord. I suppose we disappoint them.

Is this something that should be fixed in the training data (i.e. the model was mis-trained), or is it caused by the method whisper.cpp uses (inference? greedy decoding?)?

janngobble, Dec 23 '22 22:12

I think the behaviour is related to the model. See https://github.com/openai/whisper/discussions/194 and https://github.com/openai/whisper/discussions/290 for more information

ggerganov, Dec 29 '22 11:12
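
If the casing does come from the model, one possible stopgap is to post-process the transcript text. The following is only a minimal sketch, not part of whisper.cpp: the function name `fix_capitalization`, the honorific list, and the regular expressions are all assumptions made for illustration. It assumes the bracketed timestamp format shown in the output above, capitalizes a segment only when the previous segment ended a sentence, and capitalizes "miss", "mrs.", etc. only when a capitalized name follows.

```python
import re

# Matches the "[hh:mm:ss.mmm --> hh:mm:ss.mmm]" prefix seen in the output above.
TIMESTAMP = re.compile(
    r"^(\[\d{2}:\d{2}:\d{2}\.\d{3} --> \d{2}:\d{2}:\d{2}\.\d{3}\]\s*)(.*)$"
)
# Hypothetical honorific list; only matched when a capitalized name follows,
# so the verb "miss" is left alone.
HONORIFIC = re.compile(r"\b(miss|mrs|mr|ms|dr)(\.?)(?=\s+[A-Z])")
# A lowercase letter directly after sentence-ending punctuation.
AFTER_STOP = re.compile(r"([.!?]\s+)([a-z])")


def fix_capitalization(lines):
    """Heuristically restore capitals in timestamped transcript lines."""
    fixed = []
    at_sentence_start = True  # the first segment starts a sentence
    for line in lines:
        match = TIMESTAMP.match(line)
        if not match:
            fixed.append(line)  # not a transcript line, keep as-is
            continue
        prefix, text = match.groups()
        if text.startswith("["):
            fixed.append(line)  # non-speech marker such as [singing]
            continue
        text = HONORIFIC.sub(lambda h: h.group(1).capitalize() + h.group(2), text)
        text = AFTER_STOP.sub(lambda s: s.group(1) + s.group(2).upper(), text)
        if at_sentence_start and text[:1].islower():
            text = text[0].upper() + text[1:]
        stripped = text.rstrip()
        if stripped:
            # The next segment starts a sentence only if this one ended one.
            at_sentence_start = stripped[-1] in ".!?"
        fixed.append(prefix + text)
    return fixed
```

These are heuristics and will misfire in some cases: for example, a segment that ends with an abbreviation such as "Mrs." is treated as the end of a sentence, and proper nouns the model lowercased are not recovered.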