UnicodeDecodeError when filename includes non-ASCII characters
I'm trying to read from a file whose name contains non-ASCII characters:
magic.from_file("説明.txt")
This gives me the following error:
Traceback (most recent call last):
  File "G:\BaiduNet\unarchive.py", line 64, in <module>
    magic.from_file("説明.txt")
  File "C:\Users\davuses\AppData\Local\Programs\Python\Python311\Lib\site-packages\magic\magic.py", line 135, in from_file
    return m.from_file(filename)
           ^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\davuses\AppData\Local\Programs\Python\Python311\Lib\site-packages\magic\magic.py", line 89, in from_file
    return maybe_decode(magic_file(self.cookie, filename))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\davuses\AppData\Local\Programs\Python\Python311\Lib\site-packages\magic\magic.py", line 214, in maybe_decode
    return s.decode('utf-8')
           ^^^^^^^^^^^^^^^^^
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe6 in position 16: invalid continuation byte
If I rename the file to an ASCII name, say file.txt, the problem disappears.
Also, if I use .from_buffer(), there's no issue:
magic.from_buffer(open("説明.txt", "rb").read(2048), mime=True)
Weird. I'm not sure whether this is related to issue #205.
The package was installed with pip install python-magic-bin on Windows 11, Python 3.11.
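For anyone who needs a stopgap, the from_buffer() workaround above could be wrapped in a small helper along these lines. This is only a sketch: the name sniff_file and its nbytes and detect parameters are made up for illustration and are not part of python-magic. The only library call it relies on is magic.from_buffer(buffer, mime=...), as used above.

```python
from pathlib import Path


def sniff_file(path, mime=True, nbytes=2048, detect=None):
    """Identify a file from its leading bytes instead of its path.

    sniff_file, nbytes and detect are hypothetical names for this sketch;
    they are not part of python-magic.
    """
    if detect is None:
        # Imported lazily so the helper can be tried out with a stand-in detector.
        import magic
        detect = magic.from_buffer
    # Python opens the file itself, so the file name is never handed to libmagic.
    with Path(path).open("rb") as fh:
        head = fh.read(nbytes)
    return detect(head, mime=mime)
```

Usage would be sniff_file("説明.txt"). Because only the first nbytes bytes are inspected, the result may differ from from_file() for formats that libmagic identifies from data further into the file.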
Hi, I have the same problem.
My code is:
magic.from_file(file_path, mime=True)
My error is:
  File "C:\Program Files\Python38\lib\site-packages\magic\magic.py", line 135, in from_file
    return m.from_file(filename)
  File "C:\Program Files\Python38\lib\site-packages\magic\magic.py", line 89, in from_file
    return maybe_decode(magic_file(self.cookie, filename))
  File "C:\Program Files\Python38\lib\site-packages\magic\magic.py", line 214, in maybe_decode
    return s.decode('utf-8')
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe2 in position 57: invalid continuation byte
I tried editing line 214 of "C:\Program Files\Python38\lib\site-packages\magic\magic.py", changing return s.decode('utf-8') to return s.decode('utf-8', errors='ignore') and to return s.decode('utf-8', errors='replace'), but I still encounter the problem.
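For reference, here is what the two error handlers do in plain Python, outside of python-magic. This is a standalone sketch: tolerant_decode is a hypothetical stand-in for the patched line, not the library's actual function, and the sample bytes are made up to trigger the same "invalid continuation byte" message.

```python
def tolerant_decode(raw, errors="replace"):
    """Decode bytes as UTF-8 without raising on invalid sequences.

    Hypothetical stand-in for the edited line in magic.py.
    """
    return raw.decode("utf-8", errors=errors)


# 0xe6 starts a three-byte UTF-8 sequence, but "a" is not a continuation byte.
sample = b"\xe6abc"

try:
    sample.decode("utf-8")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

replaced = tolerant_decode(sample, errors="replace")  # bad byte becomes U+FFFD
ignored = tolerant_decode(sample, errors="ignore")    # bad byte is dropped
```

With either handler the decode itself should no longer raise, so if the same UnicodeDecodeError still shows up after the edit, it may be worth checking that magic.__file__ points at the file that was edited.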