
Added "Unicode not supported" message for environments that don't support it

Open · alaamroue opened this issue 2 years ago · 1 comment

To reproduce the error:

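import subprocess

# Run the whisper CLI with stdout/stderr captured through pipes rather than shown in the console
command_list = ['whisper', 'a.ts', '--language', 'Arabic', '--model', 'small']
speech_process = subprocess.run(command_list, capture_output=True, text=True)
print(speech_process.stdout)
print(speech_process.stderr)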

This prints the following:

Skipping a.ts due to UnicodeEncodeError: 'charmap' codec can't encode characters in position 27-29: character maps to <undefined>

Traceback (most recent call last):
  File "C:\Users\abaghdad\AppData\Local\Programs\Python\Python311\Lib\site-packages\whisper\transcribe.py", line 478, in cli
    result = transcribe(model, audio_path, temperature=temperature, **args)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\abaghdad\AppData\Local\Programs\Python\Python311\Lib\site-packages\whisper\transcribe.py", line 349, in transcribe
    print(make_safe(line))
  File "C:\Users\abaghdad\AppData\Local\Programs\Python\Python311\Lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
UnicodeEncodeError: 'charmap' codec can't encode characters in position 27-29: character maps to <undefined>
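The traceback points at the cp1252 codec: it is a Western European code page with no Arabic letters, so encoding the transcript text fails. Most likely this happens because, when stdout is a pipe (as with capture_output=True) rather than an interactive console, Python on Windows picks the ANSI code page, here cp1252. A minimal check of that root cause, assuming a hypothetical `can_encode` helper and a short Arabic sample string (neither is part of whisper):

```python
# Arabic for "hello"; cp1252 (Windows Western European) has no Arabic letters.
ARABIC_SAMPLE = "\u0645\u0631\u062d\u0628\u0627"


def can_encode(text: str, encoding: str) -> bool:
    """Hypothetical helper: True if `text` can be encoded with `encoding`."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


print(can_encode(ARABIC_SAMPLE, "cp1252"))  # False: same failure as the traceback
print(can_encode(ARABIC_SAMPLE, "utf-8"))   # True
```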

The modified code wraps the output in a try/except so that, if a line can't be encoded in the console's character set, transcription keeps running instead of the exception propagating back to cli(), which skips the whole file.
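A minimal sketch of that try/except idea, assuming a hypothetical `safe_print` helper (this is not the PR's actual code): it attempts a normal print and, only if the stream's codec rejects the text, re-encodes it with errors="replace" so unencodable characters become "?" and the caller keeps going.

```python
import sys
from typing import Optional, TextIO


def safe_print(text: str, stream: Optional[TextIO] = None) -> None:
    """Hypothetical helper: print `text`, degrading instead of raising.

    If the stream's encoding cannot represent some characters (e.g. Arabic
    on a cp1252 pipe), they are replaced with '?' so the caller keeps running.
    """
    out = stream if stream is not None else sys.stdout
    try:
        print(text, file=out)
    except UnicodeEncodeError:
        # Fall back to the stream's own codec with lossy replacement.
        encoding = getattr(out, "encoding", None) or "ascii"
        fallback = text.encode(encoding, errors="replace").decode(encoding)
        print(fallback, file=out)
```

On a cp1252 stream, safe_print("Arabic: " followed by five Arabic letters) writes "Arabic: ?????" instead of raising. The design choice is that the lossy path only runs after a real failure, so output on UTF-8 consoles is unchanged.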

alaamroue · Dec 29 '23 04:12

Is there a plan for this fix to make it into the main version, or is there another way to fix the issue?
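One possible workaround on the caller's side (an assumption, not something proposed in this thread) is to force UTF-8 on both ends of the pipe: set the standard PYTHONIOENCODING environment variable so a Python-based child process such as the whisper CLI writes UTF-8 to stdout and stderr, and have subprocess.run decode with the same encoding. The helper name `run_utf8` is hypothetical:

```python
import os
import subprocess
from typing import List


def run_utf8(command_list: List[str]) -> subprocess.CompletedProcess:
    """Hypothetical helper: run a Python-based CLI with UTF-8 stdio on both ends."""
    # PYTHONIOENCODING overrides the code page the child Python would
    # otherwise choose for a pipe (e.g. cp1252 on Windows).
    env = dict(os.environ, PYTHONIOENCODING="utf-8")
    # encoding="utf-8" makes the parent decode captured output as UTF-8
    # (passing an encoding also implies text mode).
    return subprocess.run(command_list, capture_output=True, encoding="utf-8", env=env)
```

In the reproduction above, run_utf8(command_list) would take the place of the subprocess.run(...) call. This only helps when the child is a Python program that reads PYTHONIOENCODING at startup.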

whisper v20240930 was failing for me with the following error, and the change proposed in this PR fixed the issue.

due to UnicodeEncodeError: 'charmap' codec can't encode characters in position 27-29: character maps to <undefined>

dmitrivi · Nov 02 '24 10:11