whisper.cpp
Added a bool fold_lowercase to whisper_context_params
If true, language-model tokens are folded to lowercase. The default is false. This is intended to make grammar matching more predictable, since the grammar no longer needs to account for case.
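To illustrate the idea only, here is a minimal sketch of case folding on a token string. It assumes tokens are plain NUL-terminated ASCII text; `fold_ascii_lowercase` and `fold_demo` are hypothetical names and are not taken from the whisper.cpp API or from this PR's actual implementation.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of whisper.cpp): lowercases a token in place.
 * In the default "C" locale tolower() only changes the ASCII letters A-Z,
 * so other bytes (digits, spaces, multi-byte UTF-8 sequences) are left as-is. */
static void fold_ascii_lowercase(char *text) {
    for (size_t i = 0; text[i] != '\0'; ++i) {
        /* cast to unsigned char first: passing a negative char to tolower() is undefined */
        text[i] = (char) tolower((unsigned char) text[i]);
    }
}

/* Small self-check: after folding, a grammar only needs the lowercase spelling. */
static void fold_demo(void) {
    char token[] = " Hello";
    fold_ascii_lowercase(token);
    assert(strcmp(token, " hello") == 0);
}
```

With folding applied like this, a grammar rule written as "hello" would match regardless of how the token was capitalized.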
I have no idea what's wrong with the Java bindings. I loaded them all into Visual Studio Code and fixed all the errors it reported (which didn't seem related to my changes), but still the Java-related tests fail. FYI, I haven't programmed in Java in over 10 years.
I'm also not good with Java, but I think we are probably observing an issue similar to this one: https://github.com/ggerganov/llama.cpp/pull/1902#issuecomment-1605524391
In short, even though the two structs whisper_context_params (C) and WhisperContextParams (Java) have the same members, the padding between the members likely differs, which causes undefined behavior (UB) in the following code:
https://github.com/ggerganov/whisper.cpp/blob/8f253ef3af1c62c04316ba4afa7145fc4d701a8c/bindings/java/src/main/java/io/github/ggerganov/whispercpp/WhisperCpp.java#L70-L81
The proper solution is to order the members by decreasing size (i.e. keep the bools at the end of the struct). Alternatively, avoid bool and simply replace those members with int, which seems like a much easier change.
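As a rough illustration of the padding point, here is a small C sketch comparing three layouts. The structs and their member names (`flag_a`, `device_id`) are hypothetical stand-ins, not the real whisper_context_params; only `fold_lowercase` comes from this PR. The sketch shows the C side only and does not model how the Java binding maps the struct.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical structs for illustration; NOT the real whisper_context_params. */

/* bools interleaved with a wider member: the compiler typically pads after flag_a */
struct params_mixed {
    bool flag_a;
    int  device_id;
    bool fold_lowercase;
};

/* widest member first, bools grouped at the end of the struct */
struct params_sorted {
    int  device_id;
    bool flag_a;
    bool fold_lowercase;
};

/* every member is an int, so all members share one size and alignment */
struct params_int_only {
    int flag_a;
    int device_id;
    int fold_lowercase;
};

/* Prints sizes and offsets so the C layout can be compared with what a binding assumes. */
static void print_layouts(void) {
    printf("mixed:    size=%zu device_id@%zu fold_lowercase@%zu\n",
           sizeof(struct params_mixed),
           offsetof(struct params_mixed, device_id),
           offsetof(struct params_mixed, fold_lowercase));
    printf("sorted:   size=%zu flag_a@%zu fold_lowercase@%zu\n",
           sizeof(struct params_sorted),
           offsetof(struct params_sorted, flag_a),
           offsetof(struct params_sorted, fold_lowercase));
    printf("int_only: size=%zu device_id@%zu fold_lowercase@%zu\n",
           sizeof(struct params_int_only),
           offsetof(struct params_int_only, device_id),
           offsetof(struct params_int_only, fold_lowercase));
}

/* With int-only members there is no room for padding surprises on mainstream ABIs
 * (the C standard does not strictly guarantee this, but common compilers behave so). */
static void check_int_only_layout(void) {
    assert(offsetof(struct params_int_only, device_id) == sizeof(int));
    assert(offsetof(struct params_int_only, fold_lowercase) == 2 * sizeof(int));
    assert(sizeof(struct params_int_only) == 3 * sizeof(int));
}
```

Calling `print_layouts()` from a test program shows the effect: on common 64-bit ABIs the mixed struct is typically 12 bytes with padding after each bool, while the sorted one is typically 8 bytes with padding only at the end. The int-only variant trades a few bytes for a layout that is easy to mirror from another language.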