Added a "Unicode not supported" message for environments that don't support Unicode output
To reproduce the error:
import subprocess

# Transcribe an Arabic recording with the whisper CLI and capture its output
command_list = ['whisper', 'a.ts', '--language', 'Arabic', '--model', 'small']
speech_process = subprocess.run(command_list, capture_output=True, text=True)
print(speech_process.stdout)
print(speech_process.stderr)
This produces the following output:
Skipping a.ts due to UnicodeEncodeError: 'charmap' codec can't encode characters in position 27-29: character maps to <undefined>
Traceback (most recent call last):
  File "C:\Users\abaghdad\AppData\Local\Programs\Python\Python311\Lib\site-packages\whisper\transcribe.py", line 478, in cli
    result = transcribe(model, audio_path, temperature=temperature, **args)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\abaghdad\AppData\Local\Programs\Python\Python311\Lib\site-packages\whisper\transcribe.py", line 349, in transcribe
    print(make_safe(line))
  File "C:\Users\abaghdad\AppData\Local\Programs\Python\Python311\Lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
UnicodeEncodeError: 'charmap' codec can't encode characters in position 27-29: character maps to <undefined>
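The failure can be seen without running whisper at all. The sketch below only illustrates the codec behaviour named in the traceback: the sample string is an assumption standing in for a transcribed Arabic segment, not output from the original run.

```python
# Hypothetical sample line: a timestamp prefix followed by Arabic letters,
# written as escapes so the snippet itself is pure ASCII.
line = "[00:00.000 --> 00:02.000] \u0645\u0631\u062d\u0628\u0627"

try:
    # cp1252 is the codec shown in the traceback (encodings\cp1252.py)
    line.encode("cp1252")
    encodable = True
    error_text = ""
except UnicodeEncodeError as exc:
    # cp1252 has no mapping for Arabic letters, so encoding fails here
    encodable = False
    error_text = str(exc)

print(encodable)
print(error_text)
```

Any stream that encodes with cp1252, such as a pipe opened by a Python process on a Windows machine with that code page, hits the same error when it is asked to write this text.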
The modified code wraps the output in a try/except so that, if a line cannot be encoded for the output stream, the model keeps running instead of returning to the main function.
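As a rough illustration of that approach (not the actual diff in this PR), the idea can be sketched as below. The helper name `print_with_fallback` and the wording of the notice are assumptions made for this example.

```python
import io
import sys


def print_with_fallback(line, stream=None):
    """Hypothetical helper: print `line`, or a plain ASCII notice if the
    stream's encoding cannot represent it. Returns True if `line` was printed."""
    stream = sys.stdout if stream is None else stream
    try:
        print(line, file=stream)
        return True
    except UnicodeEncodeError:
        # Assumed wording; the point is that the caller keeps running
        print("[Unicode output not supported by this console]", file=stream)
        return False


# Simulate a cp1252 console with an in-memory stream
raw = io.BytesIO()
fake_console = io.TextIOWrapper(raw, encoding="cp1252", newline="\n")
printed = print_with_fallback("\u0645\u0631\u062d\u0628\u0627", fake_console)
fake_console.flush()
```

With this shape, a segment that cannot be displayed costs only its console echo; the loop that produced it continues.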
Is there a plan for this fix to make it into the main version or is there another way to fix the issue?
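One possible workaround on the calling side, which does not require patching whisper, is to ask the child Python process to use UTF-8 for its standard streams and to decode the captured output as UTF-8. This is a sketch that has not been verified against whisper itself. `PYTHONIOENCODING` is a standard CPython environment variable, while the helper name `run_with_utf8` is made up for this example, and the demo command is a stand-in child process rather than the whisper CLI.

```python
import os
import subprocess
import sys


def run_with_utf8(command):
    """Hypothetical helper: run `command` with UTF-8 standard streams."""
    # Tell the child interpreter to encode stdout/stderr as UTF-8
    env = {**os.environ, "PYTHONIOENCODING": "utf-8"}
    # Decode the captured bytes as UTF-8 instead of the locale encoding
    return subprocess.run(
        command,
        capture_output=True,
        text=True,
        encoding="utf-8",
        env=env,
    )


# Stand-in child process that prints two Arabic letters
demo = run_with_utf8([sys.executable, "-c", "print('\\u0645\\u0631')"])
print(demo.returncode)
```

For the reproduction above, the same helper would be called with the original `command_list` in place of the stand-in command.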
whisper v20240930 was failing for me with the following error, and the change proposed in this PR fixed the issue:
due to UnicodeEncodeError: 'charmap' codec can't encode characters in position 27-29: character maps to <undefined>