whisper-asr-webservice
uploading certain audio files results in empty transcription
Hello,
I have noticed the following behavior, which seems unintended.
Using v1.1.0, on either CPU or GPU:
Uploading an .m4a file with two audio channels results in an empty transcript, with no error of any kind. This happens when encode is set to true.
I expected that the file would either be successfully decoded & transcribed or that an error would be returned. Re-encoding to mp3 solves the problem but seems like an unnecessary complication.
request url:
http://localhost:9000/asr?method=faster-whisper&task=transcribe&encode&output=json
response:
{"language": "en", "segments": [], "text": ""}
Tested with base model & large-v2.
Running docker logs does not show any errors.
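Since the service answers with a normal-looking JSON body even when nothing was transcribed, a client may want to treat that case as a failure itself. The following is only a sketch: the helper names asr_url and is_empty_transcript are made up for illustration, the query parameters are the ones from the request above, and it assumes encode=true is equivalent to the bare encode flag in that URL.

```python
import json
from urllib.parse import urlencode


def asr_url(base: str = "http://localhost:9000", encode: bool = True) -> str:
    """Build the /asr request URL with the query parameters used in this report."""
    query = urlencode({
        "method": "faster-whisper",
        "task": "transcribe",
        "encode": str(encode).lower(),  # assumption: same effect as the bare flag
        "output": "json",
    })
    return f"{base}/asr?{query}"


def is_empty_transcript(body: str) -> bool:
    """Return True when a JSON response carries no segments and no text."""
    result = json.loads(body)
    return not result.get("segments") and not result.get("text", "").strip()
```

Calling is_empty_transcript on the response shown above returns True, so a client could retry with a remuxed or re-encoded file instead of silently accepting an empty result.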
edit:
I remuxed the file with -movflags faststart; and now it works. It seems that the same problem as in #42 is happening.
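For anyone who wants to check a file before uploading it: -movflags faststart moves the moov box (the index) in front of the mdat box (the audio data). Whether a trailing moov box is really what trips up the service is my assumption and not confirmed here. The sketch below scans the top-level MP4 boxes of a file already read into memory and builds the remux command; all function names are made up for illustration, and remux expects an ffmpeg binary on the PATH.

```python
import struct
import subprocess


def top_level_boxes(data: bytes):
    """Yield the four-character type of each top-level MP4 box in data."""
    offset = 0
    while offset + 8 <= len(data):
        size, box_type = struct.unpack_from(">I4s", data, offset)
        if size == 1:
            # A size of 1 means a 64-bit size follows the type field.
            if offset + 16 > len(data):
                break
            (size,) = struct.unpack_from(">Q", data, offset + 8)
        elif size == 0:
            # A size of 0 means the box runs to the end of the file.
            size = len(data) - offset
        yield box_type.decode("latin-1")
        if size < 8:
            break  # malformed box; stop rather than loop forever
        offset += size


def moov_before_mdat(data: bytes) -> bool:
    """Return True when the moov box precedes the mdat box (faststart layout)."""
    types = list(top_level_boxes(data))
    return (
        "moov" in types
        and "mdat" in types
        and types.index("moov") < types.index("mdat")
    )


def build_remux_command(src: str, dst: str) -> list[str]:
    """Build an ffmpeg command that copies the streams and applies faststart."""
    return ["ffmpeg", "-i", src, "-c", "copy", "-movflags", "faststart", dst]


def remux(src: str, dst: str) -> None:
    """Run the remux; raises CalledProcessError if ffmpeg fails."""
    subprocess.run(build_remux_command(src, dst), check=True)
```

Reading the whole file into memory is fine for short recordings; for large files a version that seeks from box to box would be the better choice.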
I can confirm that this is still an issue with .m4a files. Short ones seem to work, but when they get longer than about 20-30 seconds the API responds with an empty string...
Anyways I tested several file types locally on an Nvidia 3060 and here are the results of my tests:
Transcribing 1 minute of speech.
Same here. MP4 and M4A files not working. Tried with both encode=true and encode=false in the request. MP3 files worked fine.
encode=true = Blank response
encode=false = 500 error
I modified the code and fixed the error. It seems that the run method of ffmpeg returns a tuple instead of a bytes-like object, which is why it does not work.
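The actual patch is not shown in this thread, so the following is only a sketch of the kind of fix described. It assumes the wrapper in question is the ffmpeg-python package, whose run() returns an (out, err) tuple, and that the decoded audio is raw 16-bit little-endian PCM. The helper names stdout_bytes and pcm_s16le_to_floats are made up for illustration.

```python
import sys
from array import array


def stdout_bytes(run_result):
    """Return the stdout bytes from a run() result.

    Accepts either an (out, err) tuple or a plain bytes object, so code
    that expected bytes keeps working when it is handed the tuple.
    """
    if isinstance(run_result, tuple):
        out, _err = run_result
        return out
    return run_result


def pcm_s16le_to_floats(pcm: bytes) -> list[float]:
    """Convert signed 16-bit little-endian PCM to floats in [-1.0, 1.0)."""
    samples = array("h")
    samples.frombytes(pcm)  # raises ValueError on an odd number of bytes
    if sys.byteorder == "big":
        samples.byteswap()  # the data is little-endian regardless of host
    return [s / 32768.0 for s in samples]
```

Passing the tuple straight to a function that expects a buffer fails, while unpacking it first gives the decoder the bytes it needs.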
same problem here, with wav files
the same here
Thanks for the update, works for me