nodejs-whisper
stdout and stderr are mixed up during captioning
Hi ya,
When transcribing a WAV file, some of the Whisper progress output is still streamed to stderr. I am not sure whether this has been raised before. It's polluting the logs ...
It should go to stdout. Why print so many parameters to the error log?
@ChetanXpro What do you think? Is this repo still maintained?
@binarykitchen Looking into this, and yes, it's still maintained. I am just a bit busy due to my job.
hey @binarykitchen this is actually expected behavior from whisper.cpp itself, it outputs transcription to stdout and progress/debug info to stderr (which is standard CLI practice).
Hmmm, really?
Since when is it okay to print verbose log entries, such as those about video encoding, to stderr? Do you have any direct sources or quotes supporting this theory?
Then:
hey @binarykitchen this is actually expected behavior from whisper.cpp itself, it outputs transcription to stdout and progress/debug info to stderr (which is standard CLI practice).
Expected behaviour? Where is that from? Can you share a link? Thanks mate
Here are some code references from the whisper.cpp CLI. They log diagnostic and error info to stderr. You can find many more in that file.
Whisper.cpp:
https://github.com/ggml-org/whisper.cpp/blob/13d92d08ae26031545921243256aaaf0ee057943/examples/cli/cli.cpp#L1126
https://github.com/ggml-org/whisper.cpp/blob/13d92d08ae26031545921243256aaaf0ee057943/examples/cli/cli.cpp#L340
Wikipedia:
https://en.wikipedia.org/wiki/Standard_streams
Standard error is another output stream typically used by programs to output error messages or diagnostics.
Stackoverflow:
https://stackoverflow.com/questions/26130795/when-i-need-to-use-stderr-important-errors-or-all-errors
POSIX specification (IEEE Std 1003.1):
"Standard error is used only for diagnostic messages" https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap03.html#tag_03_401
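To illustrate the split described above, here is a minimal sketch of how a Node.js caller can read the two streams separately. It does not call whisper.cpp; the child process is a stand-in script, and `runAndSeparate` is a hypothetical helper name, not part of nodejs-whisper.

```typescript
import { spawnSync } from "node:child_process";

interface SeparatedOutput {
  transcript: string;   // whatever the child wrote to stdout
  diagnostics: string;  // whatever the child wrote to stderr
  exitCode: number | null;
}

// Hypothetical helper: runs a small Node script as a child process and
// returns its stdout and stderr as two separate strings.
function runAndSeparate(script: string): SeparatedOutput {
  const result = spawnSync(process.execPath, ["-e", script], {
    encoding: "utf8",
  });
  return {
    transcript: result.stdout.trim(),
    diagnostics: result.stderr.trim(),
    exitCode: result.status,
  };
}

// Stand-in for a CLI that prints its result to stdout and its
// progress information to stderr (made-up text, not real whisper output).
const childScript = `
  console.log("transcribed text");
  console.error("progress: 50%");
`;

const demo = runAndSeparate(childScript);
console.log("stdout:", demo.transcript);
console.log("stderr:", demo.diagnostics);
```

Because the streams arrive separately, the caller can decide what to do with each one, for example keeping the stdout result and dropping or down-ranking the stderr diagnostics.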
Thanks for the useful links @ChetanXpro - I didn't know that stderr can carry diagnostics alone, without any errors being printed.
But this Wikipedia article is a bit confusing: https://en.wikipedia.org/wiki/Standard_streams#Standard_error_(stderr)
This solves the semi-predicate problem, allowing output and errors to be distinguished
It says that output and errors shall be distinguished, which isn't the case here. If the error log is full of debug and diagnostic lines, how can you spot any real errors in it? Isn't that counterproductive? It goes against the big idea of ensuring that errors are logged, seen and reported.
It's too noisy for my app.
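One possible workaround on the application side, sketched here under assumptions, is to classify captured stderr lines before logging them. The pattern below (lines mentioning "error" or "failed") is a guess, not a format that whisper.cpp guarantees, and the sample lines are made up for illustration rather than real whisper.cpp output.

```typescript
interface ClassifiedLog {
  errors: string[];       // lines that look like real problems
  diagnostics: string[];  // everything else (progress, parameters, ...)
}

// Assumption: real problems mention "error" or "failed". This heuristic
// is illustrative only and would need tuning against actual output.
const ERROR_PATTERN = /\b(error|failed)\b/i;

function classifyStderr(stderrText: string): ClassifiedLog {
  const errors: string[] = [];
  const diagnostics: string[] = [];
  for (const line of stderrText.split(/\r?\n/)) {
    if (line.trim() === "") continue; // skip blank lines
    if (ERROR_PATTERN.test(line)) {
      errors.push(line);
    } else {
      diagnostics.push(line);
    }
  }
  return { errors, diagnostics };
}

// Made-up sample text, not copied from a real whisper.cpp run.
const sampleStderr = [
  "loading model ...",
  "progress = 40%",
  "error: failed to open input file",
].join("\n");

const classified = classifyStderr(sampleStderr);
console.log("errors:", classified.errors);
console.log("diagnostics:", classified.diagnostics);
```

A text heuristic like this is fragile. Where available, the child's exit code is usually the more reliable signal that something actually went wrong.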
There is the option no_prints on the whisper side. Can we add support for it on nodejs-whisper?
https://github.com/ggml-org/whisper.cpp/blob/master/examples/cli/cli.cpp#L1123
If you like that idea, I'm happy to create a new ticket to implement no_prints
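As a rough idea of what that support could look like, here is a sketch that builds the CLI argument list. The `noPrints` option name, the `WhisperCliOptions` type and the `buildWhisperArgs` function are all hypothetical and do not exist in nodejs-whisper today. The sketch assumes the whisper.cpp CLI flags `-m`, `-f` and `--no-prints`, which should be checked against the whisper.cpp version in use.

```typescript
// Hypothetical options shape; not the current nodejs-whisper API.
interface WhisperCliOptions {
  modelPath: string;
  audioPath: string;
  noPrints?: boolean; // hypothetical: maps to the CLI's --no-prints flag
}

// Builds the argument array that would be passed to the whisper.cpp CLI.
function buildWhisperArgs(options: WhisperCliOptions): string[] {
  const args = ["-m", options.modelPath, "-f", options.audioPath];
  if (options.noPrints) {
    // Assumed flag: asks the CLI to print nothing other than the results.
    args.push("--no-prints");
  }
  return args;
}

const quietArgs = buildWhisperArgs({
  modelPath: "models/example-model.bin", // placeholder path
  audioPath: "audio/example.wav",        // placeholder path
  noPrints: true,
});
console.log(quietArgs.join(" "));
```

Keeping the flag opt-in would leave the current behaviour unchanged for existing users while letting noisy setups silence the diagnostics.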