nodejs-whisper

stdout and stderr are mixed up during captioning

Open binarykitchen opened this issue 6 months ago • 5 comments

Hi ya,

When transcribing a WAV file, some of the Whisper progress is still streamed to stderr. I am not sure if we have raised this before? It's polluting the logs ...

It should be stdout. Why print so many parameters for error logs?


@ChetanXpro What do you think? Is this repo still maintained?

binarykitchen avatar May 01 '25 10:05 binarykitchen

@binarykitchen Looking into this. And yes, it's still maintained; I'm just a bit busy with my job.

ChetanXpro avatar May 12 '25 15:05 ChetanXpro

Hey @binarykitchen, this is actually expected behavior from whisper.cpp itself: it writes the transcription to stdout and progress/debug info to stderr, which is standard CLI practice.
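
As a rough illustration of that split (not code from nodejs-whisper or whisper.cpp), the sketch below spawns a tiny inline Node script standing in for a CLI and captures the two streams separately. The child script and its messages are made up for the example.

```typescript
import { spawnSync } from "node:child_process";

// Stand-in for a CLI that prints results to stdout and progress to stderr.
// This is NOT whisper.cpp; it is an inline Node script used only for illustration.
const childScript = 'console.error("progress: 50%"); console.log("hello world");';

// spawnSync collects stdout and stderr into two separate strings.
const result = spawnSync(process.execPath, ["-e", childScript], { encoding: "utf8" });

const transcript = result.stdout;  // what the tool considers its actual output
const diagnostics = result.stderr; // progress and debug chatter, not necessarily errors

console.log("stdout:", JSON.stringify(transcript));
console.log("stderr:", JSON.stringify(diagnostics));
```

A caller that only wants the transcript can read `stdout` and decide separately what, if anything, to do with `stderr`.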

ChetanXpro avatar May 25 '25 07:05 ChetanXpro

Hmmm, really?

Since when is it okay to print verbose log entries, such as those about video encoding, to stderr? Do you have any direct sources or quotes supporting this theory?

Then, quoting your reply:

"hey @binarykitchen this is actually expected behavior from whisper.cpp itself, it outputs transcription to stdout and progress/debug info to stderr (which is standard CLI practice)."

Expected behaviour? Where is that from? Can you share a link? Thanks mate

binarykitchen avatar May 25 '25 10:05 binarykitchen

Here are some code references from the whisper.cpp CLI. They log diagnostic and error info to stderr; you can find many more examples in that file.

Whisper.cpp:

https://github.com/ggml-org/whisper.cpp/blob/13d92d08ae26031545921243256aaaf0ee057943/examples/cli/cli.cpp#L1126

https://github.com/ggml-org/whisper.cpp/blob/13d92d08ae26031545921243256aaaf0ee057943/examples/cli/cli.cpp#L340


Wikipedia:

https://en.wikipedia.org/wiki/Standard_streams

Standard error is another output stream typically used by programs to output error messages or diagnostics.

Stackoverflow:

https://stackoverflow.com/questions/26130795/when-i-need-to-use-stderr-important-errors-or-all-errors


POSIX specification (IEEE Std 1003.1):

"Standard error is used only for diagnostic messages" https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap03.html#tag_03_401

ChetanXpro avatar May 25 '25 16:05 ChetanXpro

Thanks for the useful links @ChetanXpro - I didn't know that stderr can be used purely for diagnostics, without any errors being printed.

But this Wikipedia article is a bit confusing: https://en.wikipedia.org/wiki/Standard_streams#Standard_error_(stderr)

This solves the semi-predicate problem, allowing output and errors to be distinguished

It says that output and errors should be distinguishable, which isn't the case here. If the error log is full of debug and diagnostic lines, how can you spot any real errors in it? Isn't that counterproductive? It goes against the big idea of making sure errors are logged, seen and reported.

It's too noisy for my app.
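
One possible way to cope with the noise on the application side, sketched under the assumption that the child process signals failure through a non-zero exit status: treat stderr as diagnostics and only surface it when the command actually fails. `runQuietly` is a hypothetical helper, not part of nodejs-whisper, and the child scripts are stand-ins.

```typescript
import { spawnSync } from "node:child_process";

interface RunOutcome {
  ok: boolean;         // true when the process exited with status 0
  output: string;      // captured stdout
  diagnostics: string; // captured stderr
}

// Hypothetical helper: success is decided by the exit status,
// not by whether anything was written to stderr.
function runQuietly(command: string, args: string[]): RunOutcome {
  const result = spawnSync(command, args, { encoding: "utf8" });
  return {
    ok: result.status === 0,
    output: result.stdout ?? "",
    diagnostics: result.stderr ?? "",
  };
}

// Two stand-in commands: one chatty but successful, one that really fails.
const succeeded = runQuietly(process.execPath, ["-e", 'console.error("loading model"); console.log("done");']);
const failed = runQuietly(process.execPath, ["-e", 'console.error("cannot open file"); process.exit(1);']);

for (const outcome of [succeeded, failed]) {
  if (!outcome.ok) {
    // Only a failed run gets its stderr promoted to the error log.
    console.error("command failed:", outcome.diagnostics.trim());
  }
}
```

With this approach the diagnostics of a successful run can be dropped or sent to a debug log, so the error log only contains output from runs that failed.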

There is the option no_prints on the whisper side. Can we add support for it on nodejs-whisper? https://github.com/ggml-org/whisper.cpp/blob/master/examples/cli/cli.cpp#L1123

If you like that idea, I'm happy to create a new ticket to implement no_prints
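
A minimal sketch of what that could look like, assuming the wrapper builds an argument list for the whisper.cpp CLI and that the CLI's `--no-prints` flag maps to `no_prints`. The `noPrints` option name and the `buildCliArgs` helper are hypothetical and do not reflect how nodejs-whisper currently builds its command.

```typescript
interface TranscribeOptions {
  modelPath: string;
  audioPath: string;
  noPrints?: boolean; // hypothetical option name, not an existing nodejs-whisper setting
}

// Hypothetical helper: turns options into CLI arguments for the whisper.cpp CLI.
function buildCliArgs(options: TranscribeOptions): string[] {
  const args = ["-m", options.modelPath, "-f", options.audioPath];
  if (options.noPrints) {
    // Ask the CLI to print nothing other than the results.
    args.push("--no-prints");
  }
  return args;
}

const quietArgs = buildCliArgs({ modelPath: "model.bin", audioPath: "audio.wav", noPrints: true });
const defaultArgs = buildCliArgs({ modelPath: "model.bin", audioPath: "audio.wav" });

console.log(quietArgs.join(" "));
console.log(defaultArgs.join(" "));
```

Keeping the option off by default would leave the current behaviour unchanged for existing users.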

binarykitchen avatar May 25 '25 22:05 binarykitchen