nodejs-whisper
Says WAV file is valid, then later says it's invalid?
I'm running the latest version of nodejs-whisper on Arch Linux.
nodejs-whisper says the WAV file is valid, but later the native whisper binary rejects the same file because it is not 16 kHz. First it logs:
[dev:server] [Nodejs-whisper] File is a valid WAV file.
And later it says:
[dev:server] read_wav: WAV file '/home/michael-heuberger/code/binarykitchen/videomail.io/var/local/tmp/clients/videomail.io/1ef7ae52-7eab-6f50-8362-05f8c267a8f2/videomail_preview.wav' must be 16 kHz
[dev:server] error: failed to read WAV file '/home/michael-heuberger/code/binarykitchen/videomail.io/var/local/tmp/clients/videomail.io/1ef7ae52-7eab-6f50-8362-05f8c267a8f2/videomail_preview.wav'
Here is the full log output:
[dev:server] DEBUG: »»-----------------------------------------►
[dev:server] [Nodejs-whisper] Checking and downloading model if needed: base
[dev:server] autoDownloadModelName base
[dev:server] options {
[dev:server] modelName: 'base',
[dev:server] autoDownloadModelName: 'base',
[dev:server] verbose: true,
[dev:server] removeWavFileAfterTranscription: false,
[dev:server] whisperOptions: { outputInVtt: true }
[dev:server] }
[dev:server] [Nodejs-whisper] Models already exist. Skipping download.
[dev:server] [Nodejs-whisper] Checking file existence: /home/michael-heuberger/code/binarykitchen/videomail.io/var/local/tmp/clients/videomail.io/1ef7ae52-7eab-6f50-8362-05f8c267a8f2/videomail_preview.wav
[dev:server] [Nodejs-whisper] Converting file to WAV format: /home/michael-heuberger/code/binarykitchen/videomail.io/var/local/tmp/clients/videomail.io/1ef7ae52-7eab-6f50-8362-05f8c267a8f2/videomail_preview.wav
[dev:server] [Nodejs-whisper] Checking if the file is a valid WAV: /home/michael-heuberger/code/binarykitchen/videomail.io/var/local/tmp/clients/videomail.io/1ef7ae52-7eab-6f50-8362-05f8c267a8f2/videomail_preview.wav
[dev:server] [Nodejs-whisper] File is a valid WAV file.
[dev:server] [Nodejs-whisper] Constructing command for file: /home/michael-heuberger/code/binarykitchen/videomail.io/var/local/tmp/clients/videomail.io/1ef7ae52-7eab-6f50-8362-05f8c267a8f2/videomail_preview.wav
[dev:server] [Nodejs-whisper] Executing command: ./main -ovtt -l auto -m ./models/ggml-base.bin -f /home/michael-heuberger/code/binarykitchen/videomail.io/var/local/tmp/clients/videomail.io/1ef7ae52-7eab-6f50-8362-05f8c267a8f2/videomail_preview.wav
[dev:server] code--- 0
[dev:server] stdout---
[dev:server] stderr--- whisper_init_from_file_with_params_no_state: loading model from './models/ggml-base.bin'
[dev:server] whisper_model_load: loading model
[dev:server] whisper_model_load: n_vocab = 51865
[dev:server] whisper_model_load: n_audio_ctx = 1500
[dev:server] whisper_model_load: n_audio_state = 512
[dev:server] whisper_model_load: n_audio_head = 8
[dev:server] whisper_model_load: n_audio_layer = 6
[dev:server] whisper_model_load: n_text_ctx = 448
[dev:server] whisper_model_load: n_text_state = 512
[dev:server] whisper_model_load: n_text_head = 8
[dev:server] whisper_model_load: n_text_layer = 6
[dev:server] whisper_model_load: n_mels = 80
[dev:server] whisper_model_load: ftype = 1
[dev:server] whisper_model_load: qntvr = 0
[dev:server] whisper_model_load: type = 2 (base)
[dev:server] whisper_model_load: adding 1608 extra tokens
[dev:server] whisper_model_load: n_langs = 99
[dev:server] whisper_model_load: CPU total size = 147.37 MB
[dev:server] whisper_model_load: model size = 147.37 MB
[dev:server] whisper_init_state: kv self size = 16.52 MB
[dev:server] whisper_init_state: kv cross size = 18.43 MB
[dev:server] whisper_init_state: compute buffer (conv) = 16.39 MB
[dev:server] whisper_init_state: compute buffer (encode) = 132.07 MB
[dev:server] whisper_init_state: compute buffer (cross) = 4.78 MB
[dev:server] whisper_init_state: compute buffer (decode) = 96.48 MB
[dev:server] read_wav: WAV file '/home/michael-heuberger/code/binarykitchen/videomail.io/var/local/tmp/clients/videomail.io/1ef7ae52-7eab-6f50-8362-05f8c267a8f2/videomail_preview.wav' must be 16 kHz
[dev:server] error: failed to read WAV file '/home/michael-heuberger/code/binarykitchen/videomail.io/var/local/tmp/clients/videomail.io/1ef7ae52-7eab-6f50-8362-05f8c267a8f2/videomail_preview.wav'
[dev:server]
[dev:server] whisper_print_timings: load time = 306.03 ms
[dev:server] whisper_print_timings: fallbacks = 0 p / 0 h
[dev:server] whisper_print_timings: mel time = 0.00 ms
[dev:server] whisper_print_timings: sample time = 0.00 ms / 1 runs ( 0.00 ms per run)
[dev:server] whisper_print_timings: encode time = 0.00 ms / 1 runs ( 0.00 ms per run)
[dev:server] whisper_print_timings: decode time = 0.00 ms / 1 runs ( 0.00 ms per run)
[dev:server] whisper_print_timings: batchd time = 0.00 ms / 1 runs ( 0.00 ms per run)
[dev:server] whisper_print_timings: prompt time = 0.00 ms / 1 runs ( 0.00 ms per run)
[dev:server] whisper_print_timings: total time = 312.29 ms
[dev:server]
[dev:server] stdout---
[dev:server] [Nodejs-whisper] Transcribing Done!
[dev:server] [Nodejs-whisper] Error during processing: Transcription failed or produced no output.
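To narrow this down, the sample rate can be read straight from the WAV header. It is quite possible for a file to be a well-formed WAV and still not be 16 kHz, which would explain both messages. Below is a minimal sketch in plain Node.js. The helper names `readWavFormat` and `readWavFormatFromFile` are made up for illustration and are not part of nodejs-whisper. It assumes the `fmt ` chunk sits within the first 4 KiB of the file, which is typical but not guaranteed.

```javascript
// Hypothetical helpers for inspecting a WAV header; not part of nodejs-whisper.
const fs = require('node:fs');

// Parse the RIFF/WAVE container in `buffer` and return the fields of its "fmt " chunk.
function readWavFormat(buffer) {
  if (
    buffer.length < 12 ||
    buffer.toString('ascii', 0, 4) !== 'RIFF' ||
    buffer.toString('ascii', 8, 12) !== 'WAVE'
  ) {
    throw new Error('Not a RIFF/WAVE file');
  }

  // Walk the chunk list that follows the 12-byte RIFF header.
  let offset = 12;
  while (offset + 8 <= buffer.length) {
    const id = buffer.toString('ascii', offset, offset + 4);
    const size = buffer.readUInt32LE(offset + 4);

    if (id === 'fmt ') {
      if (offset + 8 + 16 > buffer.length) {
        throw new Error('Truncated fmt chunk');
      }
      return {
        audioFormat: buffer.readUInt16LE(offset + 8), // 1 means PCM
        channels: buffer.readUInt16LE(offset + 10),
        sampleRate: buffer.readUInt32LE(offset + 12), // in Hz
        bitsPerSample: buffer.readUInt16LE(offset + 22),
      };
    }

    // Chunks are padded to an even number of bytes.
    offset += 8 + size + (size % 2);
  }

  throw new Error('No fmt chunk found');
}

// Read only the first 4 KiB of the file, which is assumed to contain the fmt chunk.
function readWavFormatFromFile(path) {
  const fd = fs.openSync(path, 'r');
  try {
    const head = Buffer.alloc(4096);
    const bytesRead = fs.readSync(fd, head, 0, head.length, 0);
    return readWavFormat(head.subarray(0, bytesRead));
  } finally {
    fs.closeSync(fd);
  }
}
```

Calling `readWavFormatFromFile()` on the `videomail_preview.wav` path from the log and printing `sampleRate` should show whether the file really is something other than 16000 Hz, which is what the `read_wav: ... must be 16 kHz` message suggests.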
Any idea why the file passes the WAV check but then fails the 16 kHz requirement?
Thanks!