Says WAV file is valid, then later says it's invalid?

Open binarykitchen opened this issue 1 year ago • 2 comments

Running your latest version on ArchLinux.

nodejs-whisper says the WAV file is valid, but later the native whisper instance says it's not. Huh?

[dev:server] [Nodejs-whisper] File is a valid WAV file.

And later it says:

[dev:server] read_wav: WAV file '/home/michael-heuberger/code/binarykitchen/videomail.io/var/local/tmp/clients/videomail.io/1ef7ae52-7eab-6f50-8362-05f8c267a8f2/videomail_preview.wav' must be 16 kHz
[dev:server] error: failed to read WAV file '/home/michael-heuberger/code/binarykitchen/videomail.io/var/local/tmp/clients/videomail.io/1ef7ae52-7eab-6f50-8362-05f8c267a8f2/videomail_preview.wav'

Here are the details from the logs:

[dev:server] DEBUG: »»-----------------------------------------►
[dev:server] [Nodejs-whisper] Checking and downloading model if needed: base
[dev:server] autoDownloadModelName base
[dev:server] options {
[dev:server]   modelName: 'base',
[dev:server]   autoDownloadModelName: 'base',
[dev:server]   verbose: true,
[dev:server]   removeWavFileAfterTranscription: false,
[dev:server]   whisperOptions: { outputInVtt: true }
[dev:server] }
[dev:server] [Nodejs-whisper] Models already exist. Skipping download.
[dev:server] [Nodejs-whisper] Checking file existence: /home/michael-heuberger/code/binarykitchen/videomail.io/var/local/tmp/clients/videomail.io/1ef7ae52-7eab-6f50-8362-05f8c267a8f2/videomail_preview.wav
[dev:server] [Nodejs-whisper] Converting file to WAV format: /home/michael-heuberger/code/binarykitchen/videomail.io/var/local/tmp/clients/videomail.io/1ef7ae52-7eab-6f50-8362-05f8c267a8f2/videomail_preview.wav
[dev:server] [Nodejs-whisper] Checking if the file is a valid WAV: /home/michael-heuberger/code/binarykitchen/videomail.io/var/local/tmp/clients/videomail.io/1ef7ae52-7eab-6f50-8362-05f8c267a8f2/videomail_preview.wav
[dev:server] [Nodejs-whisper] File is a valid WAV file.
[dev:server] [Nodejs-whisper] Constructing command for file: /home/michael-heuberger/code/binarykitchen/videomail.io/var/local/tmp/clients/videomail.io/1ef7ae52-7eab-6f50-8362-05f8c267a8f2/videomail_preview.wav
[dev:server] [Nodejs-whisper] Executing command: ./main  -ovtt -l auto -m ./models/ggml-base.bin  -f /home/michael-heuberger/code/binarykitchen/videomail.io/var/local/tmp/clients/videomail.io/1ef7ae52-7eab-6f50-8362-05f8c267a8f2/videomail_preview.wav
[dev:server] code--- 0
[dev:server] stdout--- 
[dev:server] stderr--- whisper_init_from_file_with_params_no_state: loading model from './models/ggml-base.bin'
[dev:server] whisper_model_load: loading model
[dev:server] whisper_model_load: n_vocab       = 51865
[dev:server] whisper_model_load: n_audio_ctx   = 1500
[dev:server] whisper_model_load: n_audio_state = 512
[dev:server] whisper_model_load: n_audio_head  = 8
[dev:server] whisper_model_load: n_audio_layer = 6
[dev:server] whisper_model_load: n_text_ctx    = 448
[dev:server] whisper_model_load: n_text_state  = 512
[dev:server] whisper_model_load: n_text_head   = 8
[dev:server] whisper_model_load: n_text_layer  = 6
[dev:server] whisper_model_load: n_mels        = 80
[dev:server] whisper_model_load: ftype         = 1
[dev:server] whisper_model_load: qntvr         = 0
[dev:server] whisper_model_load: type          = 2 (base)
[dev:server] whisper_model_load: adding 1608 extra tokens
[dev:server] whisper_model_load: n_langs       = 99
[dev:server] whisper_model_load:      CPU total size =   147.37 MB
[dev:server] whisper_model_load: model size    =  147.37 MB
[dev:server] whisper_init_state: kv self size  =   16.52 MB
[dev:server] whisper_init_state: kv cross size =   18.43 MB
[dev:server] whisper_init_state: compute buffer (conv)   =   16.39 MB
[dev:server] whisper_init_state: compute buffer (encode) =  132.07 MB
[dev:server] whisper_init_state: compute buffer (cross)  =    4.78 MB
[dev:server] whisper_init_state: compute buffer (decode) =   96.48 MB
[dev:server] read_wav: WAV file '/home/michael-heuberger/code/binarykitchen/videomail.io/var/local/tmp/clients/videomail.io/1ef7ae52-7eab-6f50-8362-05f8c267a8f2/videomail_preview.wav' must be 16 kHz
[dev:server] error: failed to read WAV file '/home/michael-heuberger/code/binarykitchen/videomail.io/var/local/tmp/clients/videomail.io/1ef7ae52-7eab-6f50-8362-05f8c267a8f2/videomail_preview.wav'
[dev:server] 
[dev:server] whisper_print_timings:     load time =   306.03 ms
[dev:server] whisper_print_timings:     fallbacks =   0 p /   0 h
[dev:server] whisper_print_timings:      mel time =     0.00 ms
[dev:server] whisper_print_timings:   sample time =     0.00 ms /     1 runs (    0.00 ms per run)
[dev:server] whisper_print_timings:   encode time =     0.00 ms /     1 runs (    0.00 ms per run)
[dev:server] whisper_print_timings:   decode time =     0.00 ms /     1 runs (    0.00 ms per run)
[dev:server] whisper_print_timings:   batchd time =     0.00 ms /     1 runs (    0.00 ms per run)
[dev:server] whisper_print_timings:   prompt time =     0.00 ms /     1 runs (    0.00 ms per run)
[dev:server] whisper_print_timings:    total time =   312.29 ms
[dev:server] 
[dev:server] stdout--- 
[dev:server] [Nodejs-whisper] Transcribing Done!
[dev:server] [Nodejs-whisper] Error during processing: Transcription failed or produced no output.

Any ideas what this could be?

Thanks!

binarykitchen avatar Sep 25 '24 02:09 binarykitchen

I think it's because the input sample rate is 48 kHz, while whisper expects 16 kHz. That said, the WAV validity check should also verify the sample rate.
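
For illustration, here is a minimal sketch of what such a sample-rate check could look like. This is not nodejs-whisper's actual code: `readWavSampleRate`, `isWhisperReady` and `makeWavHeader` are hypothetical names, and the sketch assumes a plain RIFF/WAVE file whose `fmt ` chunk holds the sample rate as a 32-bit little-endian value, which is the standard layout.

```javascript
// Hypothetical helper, not part of nodejs-whisper: returns the sample rate
// stored in the "fmt " chunk of a RIFF/WAVE buffer.
function readWavSampleRate(buf) {
  if (
    buf.length < 12 ||
    buf.toString("ascii", 0, 4) !== "RIFF" ||
    buf.toString("ascii", 8, 12) !== "WAVE"
  ) {
    throw new Error("not a RIFF/WAVE file");
  }

  // Walk the chunks after the 12-byte RIFF header until "fmt " is found.
  let offset = 12;
  while (offset + 8 <= buf.length) {
    const id = buf.toString("ascii", offset, offset + 4);
    const size = buf.readUInt32LE(offset + 4);
    if (id === "fmt ") {
      if (offset + 8 + 16 > buf.length) {
        throw new Error("truncated fmt chunk");
      }
      // fmt body: format (2 bytes), channels (2 bytes), sample rate (4 bytes)
      return buf.readUInt32LE(offset + 12);
    }
    // Chunk bodies are padded to an even number of bytes.
    offset += 8 + size + (size % 2);
  }
  throw new Error("fmt chunk not found");
}

// Hypothetical check: the read_wav error in the logs says the file
// "must be 16 kHz", so anything else is rejected here.
function isWhisperReady(buf) {
  return readWavSampleRate(buf) === 16000;
}

// Builds a minimal 44-byte PCM header (mono, 16-bit, no audio data),
// only so the check can be tried without a real file.
function makeWavHeader(sampleRate) {
  const buf = Buffer.alloc(44);
  buf.write("RIFF", 0, "ascii");
  buf.writeUInt32LE(36, 4); // RIFF size = file size - 8
  buf.write("WAVE", 8, "ascii");
  buf.write("fmt ", 12, "ascii");
  buf.writeUInt32LE(16, 16); // fmt chunk size for PCM
  buf.writeUInt16LE(1, 20); // audio format: PCM
  buf.writeUInt16LE(1, 22); // channels: mono
  buf.writeUInt32LE(sampleRate, 24);
  buf.writeUInt32LE(sampleRate * 2, 28); // byte rate
  buf.writeUInt16LE(2, 32); // block align
  buf.writeUInt16LE(16, 34); // bits per sample
  buf.write("data", 36, "ascii");
  buf.writeUInt32LE(0, 40); // empty data chunk
  return buf;
}

console.log(isWhisperReady(makeWavHeader(48000)));
console.log(isWhisperReady(makeWavHeader(16000)));
```

With a real file, the buffer would come from something like `fs.readFileSync(path)`. If the rate turns out not to be 16000, resampling first should get past the error, for example with the ffmpeg command from the whisper.cpp README: `ffmpeg -i input.wav -ar 16000 -ac 1 -c:a pcm_s16le output.wav`.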

binarykitchen avatar Sep 26 '24 07:09 binarykitchen

Yeah, I think it's due to the sample rate. I will look into this issue.

ChetanXpro avatar Sep 26 '24 16:09 ChetanXpro

Push :D

PheysX avatar May 22 '25 13:05 PheysX