transcribe When using "--language=zh", which is Chinese, it will fail due to the encoding

Open yzcj105 opened this issue 4 years ago • 1 comments

Traceback (most recent call last):
  File "/usr/local/bin/transcribe", line 8, in <module>
    sys.exit(console())
  File "/usr/local/lib/python2.7/site-packages/transcribe/main.py", line 69, in console
    exit(name)
  File "/usr/local/lib/python2.7/site-packages/captain/__init__.py", line 41, in exit
    ret_code = s.run(raw_args)
  File "/usr/local/lib/python2.7/site-packages/captain/__init__.py", line 176, in run
    ret_code = callback(*args, **kwargs)
  File "/usr/local/lib/python2.7/site-packages/transcribe/main.py", line 63, in main_speech
    for time, text in f:
  File "/usr/local/lib/python2.7/site-packages/transcribe/speech.py", line 160, in __iter__
    text = String(alternative.transcript).flow()
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-43: ordinal not in range(128)

Feb 09 '20 21:02 yzcj105

I'm not sure how to track this one down. I googled "chinese characters" for some sample text and tried it in the REPL to see if I could easily reproduce the problem, but it worked as expected:

Python 2.7.15 (default, Jul 23 2018, 21:27:06)
>>> from transcribe.utils import String
>>>
>>> s = String("读写汉字 - 学中文")
>>> s
u'\u8bfb\u5199\u6c49\u5b57 - \u5b66\u4e2d\u6587'
>>> s = String(b"读写汉字 - 学中文")
>>> s
u'\u8bfb\u5199\u6c49\u5b57 - \u5b66\u4e2d\u6587'
>>>

So I didn't run into any issues with the conversion.
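For reference, the error class in the traceback can be reproduced without transcribe at all, by forcing non-ASCII text through the ASCII codec. This is only a minimal sketch of the general failure mode, not the code path inside transcribe; try_encode is a hypothetical helper written for this illustration.

```python
# -*- coding: utf-8 -*-
# Standalone illustration; it does not import or call transcribe.
# try_encode is a hypothetical helper, not part of any library.

text = u"读写汉字 - 学中文"


def try_encode(value, codec):
    """Return the encoded bytes, or the UnicodeEncodeError that was raised."""
    try:
        return value.encode(codec)
    except UnicodeEncodeError as exc:
        return exc


# The ASCII codec cannot represent Chinese characters, so this fails
# with the same exception type shown in the traceback.
ascii_result = try_encode(text, "ascii")

# UTF-8 can represent them, so this succeeds and round-trips.
utf8_result = try_encode(text, "utf-8")

print(type(ascii_result).__name__)
print(utf8_result.decode("utf-8") == text)
```

If the failure in transcribe is of this kind, some step is encoding the transcript with the default ASCII codec instead of UTF-8, but that is an assumption until the failing string has been inspected.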

Which version of Python are you running?

You could try running it with:

$ transcribe --quiet=+d speech ...

And see if it gives you any useful debug information that could help us.

If you could send me a small audio clip that fails, along with the command you ran, I could probably debug it from that as well.

If you are so inclined, you can also dig into the code; the failing method is right here. You could print out the string and send that to me, or fix it and submit a pull request.
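Printing the string can itself trip over the same encoding problem, so here is one way to dump it as plain ASCII escapes that are safe to paste into an issue. This is a sketch only; ascii_safe is a hypothetical helper, not something transcribe provides, and it assumes any byte strings are UTF-8.

```python
# -*- coding: utf-8 -*-
# ascii_safe is a hypothetical helper for debugging, not part of transcribe.


def ascii_safe(value):
    """Return an ASCII-only, escaped representation of text or bytes."""
    if isinstance(value, bytes):
        # Assumption: byte strings are UTF-8; undecodable bytes are replaced.
        value = value.decode("utf-8", "replace")
    # unicode_escape turns non-ASCII characters into \uXXXX sequences.
    return value.encode("unicode_escape").decode("ascii")


sample = u"学中文"
print(ascii_safe(sample))                  # escaped text
print(ascii_safe(sample.encode("utf-8")))  # same result from UTF-8 bytes
```

The escaped form has the same shape as the REPL output above, so it can be pasted back into a Python session to recreate the exact string.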

Feb 11 '20 00:02 Jaymon
