UnicodeDecodeError: 'gbk' codec can't decode byte

Open vicat47 opened this issue 3 years ago • 3 comments

I have a document encoded in 'utf-8', but the program tries to decode it as 'gbk':

Traceback (most recent call last):
  File "C:\Users\vicat\AppData\Local\Programs\Python\Python38\lib\runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\vicat\AppData\Local\Programs\Python\Python38\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "C:\Users\vicat\AppData\Local\Programs\Python\Python38\Scripts\md2cf.exe\__main__.py", line 7, in <module>
  File "C:\Users\vicat\AppData\Local\Programs\Python\Python38\lib\site-packages\md2cf\__main__.py", line 350, in main
    md2cf.document.get_page_data_from_file_path(file_name)
  File "C:\Users\vicat\AppData\Local\Programs\Python\Python38\lib\site-packages\md2cf\document.py", line 140, in get_page_data_from_file_path
    markdown_lines = file_handle.readlines()
UnicodeDecodeError: 'gbk' codec can't decode byte 0xaf in position 2: illegal multibyte sequence

What should I do?

vicat47 avatar Jun 29 '22 02:06 vicat47

Should I specify the file encoding on the command line?

vicat47 avatar Jun 29 '22 02:06 vicat47

Sorry this is happening! Is it a utf-8 file with special characters in it?

iamjackg avatar Aug 28 '22 14:08 iamjackg

It's a Chinese Markdown file written in Typora, and I solved the problem by specifying the encoding in document.py at line 142: open(file_path, encoding="utf-8"). When reading a text file, if open() is not told which encoding to use, Python 3 falls back to the default encoding of the operating system the code is running on. My operating system language is Chinese, so it tries to decode with gbk, but my file is utf-8. Should a command-line argument be provided for this case?
If you think the solution is feasible, I will open a pull request. Thanks.

vicat47 avatar Aug 28 '22 14:08 vicat47
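
For readers following along, the workaround described above can be sketched roughly as follows. This is a minimal illustration, not md2cf's actual code: the helper name `read_markdown_lines` is hypothetical, and only the `open(file_path, encoding="utf-8")` call comes from the comment above.

```python
import locale


def read_markdown_lines(file_path, encoding="utf-8"):
    """Read a text file with an explicit encoding (hypothetical helper).

    Passing `encoding` stops open() from falling back to the
    locale-dependent default of the machine the code runs on.
    """
    with open(file_path, encoding=encoding) as file_handle:
        return file_handle.readlines()


# The encoding open() uses when none is given. It depends on the OS and
# locale settings, so no expected output is shown here.
print(locale.getpreferredencoding(False))
```

Hard-coding UTF-8 is the simplest option for Markdown files; a command-line argument, as asked above, would only be needed for files saved in some other encoding.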

Hey, I just pushed an alpha version of v2 -- it has some breaking changes in the output format (https://github.com/iamjackg/md2cf/tree/develop#terminal-output-format), but it includes a fix for this: it tries to autodetect the encoding if the default fails. Can you give it a whirl and let me know if it's fine?

It's at https://pypi.org/project/md2cf/2.0.0a0/ and you can install it with

pip install md2cf --pre

iamjackg avatar Feb 02 '23 14:02 iamjackg
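
The "try the default, then fall back" idea mentioned above could look something like the sketch below. It is an assumption-laden illustration using only the standard library, not the code shipped in md2cf v2: the function name, its parameters, and the list of fallback encodings are all hypothetical, and the real fix may detect encodings differently.

```python
def read_text_with_fallback(file_path, default_encoding=None,
                            fallback_encodings=("utf-8-sig",)):
    """Read text with the default encoding, retrying on decode failure.

    Hypothetical helper. default_encoding=None lets open() use the
    locale default; "utf-8-sig" decodes UTF-8 with or without a BOM.
    """
    try:
        with open(file_path, encoding=default_encoding) as file_handle:
            return file_handle.read()
    except UnicodeDecodeError:
        pass  # the default codec rejected the bytes, so try the fallbacks

    for encoding in fallback_encodings:
        try:
            with open(file_path, encoding=encoding) as file_handle:
                return file_handle.read()
        except UnicodeDecodeError:
            continue

    raise ValueError(f"could not decode {file_path!r} with any known encoding")
```

One limitation of this approach is that the fallback only runs when the default codec raises an error. If the default codec happens to accept the bytes, the text is returned as decoded, even if the result is garbled.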