AutoSub
Force UTF-8 when opening README.md
Otherwise the following error occurs:
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "C:\Users\Aqaao\AppData\Local\Temp\pip-req-build-2dcr43hl\setup.py", line 8, in <module>
README = fh.read()
UnicodeDecodeError: 'gbk' codec can't decode byte 0x90 in position 757: illegal multibyte sequence
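A minimal sketch of the kind of change this describes, assuming setup.py reads the README with a plain open() call. The temporary file and the readme_path name are hypothetical stand-ins so the snippet runs on its own; only the explicit encoding argument is the point.

```python
import os
import tempfile

# Hypothetical stand-in for the project's README.md, containing non-ASCII text.
tmp_dir = tempfile.mkdtemp()
readme_path = os.path.join(tmp_dir, "README.md")
with open(readme_path, "w", encoding="utf-8") as fh:
    fh.write("AutoSub 字幕 café\n")

# Passing encoding explicitly avoids relying on the platform default codec
# (for example 'gbk' on a Chinese-locale Windows install).
with open(readme_path, "r", encoding="utf-8") as fh:
    README = fh.read()
```

Without the encoding argument, open() falls back to the locale's preferred encoding, which is why the same setup.py can work on one machine and fail on another.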
And here, a non-UTF-8 codec error:
Traceback (most recent call last):
File "autosub/main.py", line 170, in <module>
main()
File "autosub/main.py", line 161, in main
ds_process_audio(ds, audio_segment_path, output_file_handle_dict, split_duration=args.split_duration)
File "autosub/main.py", line 69, in ds_process_audio
write_to_file(output_file_handle_dict, split_inferred_text, line_count, split_limits, cues)
File "C:\env\python-venv\deepspeech\lib\site-packages\autosub\writeToFile.py", line 43, in write_to_file
file_handle.write(inferred_text + "\n\n")
UnicodeEncodeError: 'gbk' codec can't encode character '\udce9' in position 0: illegal multibyte sequence
Edit: the "utf-8" codec raises an error too, I don't know why.
Traceback (most recent call last):
File "autosub/main.py", line 170, in <module>
main()
File "autosub/main.py", line 161, in main
ds_process_audio(ds, audio_segment_path, output_file_handle_dict, split_duration=args.split_duration)
File "autosub/main.py", line 69, in ds_process_audio
write_to_file(output_file_handle_dict, split_inferred_text, line_count, split_limits, cues)
File "C:\env\python-venv\deepspeech\lib\site-packages\autosub\writeToFile.py", line 43, in write_to_file
file_handle.write(inferred_text + "\n\n")
UnicodeEncodeError: 'utf-8' codec can't encode characters in position 0-262: surrogates not allowed
Weird. Which language is your audio in?
Mandarin. I found that many people have the same problem in Python (https://github.com/mozilla/DeepSpeech/issues/3557), but I didn't find a solution.
Aah yes. You'll need to add .decode('utf-8', 'ignore') and .encode(...) while writing to file/saving.
Thanks, it worked.
https://github.com/abhirooptalasila/AutoSub/blob/5dc2314dea2f7ffc86e1454cb2ef29c6721d4e55/autosub/writeToFile.py#L43
file_handle.write(inferred_text.decode('utf-8', 'ignore').encode('utf-8') + "\n\n")
https://github.com/abhirooptalasila/AutoSub/blob/5dc2314dea2f7ffc86e1454cb2ef29c6721d4e55/autosub/main.py#L140
output_file_handle_dict[format] = open(output_filename, "w", encoding='utf-8', errors='surrogateescape')
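For anyone hitting the same "surrogates not allowed" error, here is a self-contained sketch of the two approaches discussed in this thread. It is an illustration under assumptions, not the project's actual code: the sample bytes, the in-memory buffer, and the names cleaned, handle and written are made up for the demo. Note that in Python 3 a str has no .decode method, so the sketch encodes first and then decodes back.

```python
import io

# A lone surrogate such as '\udce9' is what Python produces when the raw byte
# 0xE9 is decoded with errors="surrogateescape".
inferred_text = b"caf\xe9".decode("utf-8", errors="surrogateescape")

# Strict UTF-8 encoding rejects lone surrogates ("surrogates not allowed").
try:
    inferred_text.encode("utf-8")
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True

# Option 1: drop the unencodable characters before writing.
# A str is encoded with 'ignore', then decoded back to a clean str.
cleaned = inferred_text.encode("utf-8", "ignore").decode("utf-8")

# Option 2: let the file object map surrogates back to their original bytes.
# An in-memory buffer stands in for the real output file here.
buffer = io.BytesIO()
handle = io.TextIOWrapper(
    buffer, encoding="utf-8", errors="surrogateescape", newline="\n"
)
handle.write(inferred_text + "\n\n")
handle.flush()
written = buffer.getvalue()
```

Option 1 silently loses the offending characters, while option 2 keeps the original bytes, so the resulting file may contain sequences that are not valid UTF-8. Which trade-off is acceptable depends on how the subtitle file is consumed afterwards.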