yamllint

Cannot parse UTF-8 strings

Open • mattn opened this issue 5 years ago • 9 comments

Traceback (most recent call last):
  File "C:\msys64\mingw64\lib\python3.8\runpy.py", line 192, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\msys64\mingw64\lib\python3.8\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "c:\msys64\mingw64\bin\yamllint.exe\__main__.py", line 7, in <module>
  File "C:\msys64\mingw64\lib\python3.8\site-packages\yamllint\cli.py", line 189, in run
    prob_level = show_problems(problems, 'stdin', args_format=args.format)
  File "C:\msys64\mingw64\lib\python3.8\site-packages\yamllint\cli.py", line 91, in show_problems
    for problem in problems:
  File "C:\msys64\mingw64\lib\python3.8\site-packages\yamllint\linter.py", line 198, in _run
    syntax_error = get_syntax_error(buffer)
  File "C:\msys64\mingw64\lib\python3.8\site-packages\yamllint\linter.py", line 179, in get_syntax_error
    list(yaml.parse(buffer, Loader=yaml.BaseLoader))
  File "C:\msys64\mingw64\lib\python3.8\site-packages\yaml\__init__.py", line 73, in parse
    loader = Loader(stream)
  File "C:\msys64\mingw64\lib\python3.8\site-packages\yaml\loader.py", line 14, in __init__
    Reader.__init__(self, stream)
  File "C:\msys64\mingw64\lib\python3.8\site-packages\yaml\reader.py", line 74, in __init__
    self.check_printable(stream)
  File "C:\msys64\mingw64\lib\python3.8\site-packages\yaml\reader.py", line 143, in check_printable
    raise ReaderError(self.name, position, ord(character),
yaml.reader.ReaderError: unacceptable character #xdc82: special characters are not allowed
  in "<unicode string>", position 279

I know about https://github.com/adrienverge/yamllint/issues/20 and https://github.com/adrienverge/yamllint/issues/2, but those are about non-Windows systems. On Windows, LANG and LC_CTYPE are generally not set. I think yamllint should provide a way to read UTF-8 input even when LANG/LC_CTYPE is not set.

mattn, Dec 16 '19
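When no `encoding` is passed to a text-mode `open()`, Python 3.8 falls back to `locale.getpreferredencoding(False)`, which on a Japanese Windows system is typically `cp932` rather than UTF-8. The minimal sketch below prints that fallback and whether Python's UTF-8 mode (PEP 540, available since Python 3.7 via `PYTHONUTF8=1` or `-X utf8`) is active; in UTF-8 mode the fallback becomes UTF-8. The helper name `default_text_encoding` is illustrative only, not part of yamllint or PyYAML.

```python
import locale
import sys


def default_text_encoding() -> str:
    """Return the encoding a text-mode open() uses when none is given.

    On Python 3.8 this is locale.getpreferredencoding(False); when UTF-8
    mode is enabled (PYTHONUTF8=1 or -X utf8), that call reports UTF-8.
    """
    return locale.getpreferredencoding(False)


if __name__ == "__main__":
    print("preferred encoding:", default_text_encoding())
    print("UTF-8 mode:", bool(sys.flags.utf8_mode))
```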

Can you provide a way to reproduce your problem, especially an input file that triggers the error and your yamllint version?

adrienverge, Dec 17 '19

test.yaml:

---
テスト: 'コード'

C:\temp>yamllint -v
yamllint 1.19.0

C:\temp>yamllint test.yaml
Traceback (most recent call last):
  File "C:\msys64\mingw64\lib\python3.8\runpy.py", line 192, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\msys64\mingw64\lib\python3.8\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "c:\msys64\mingw64\bin\yamllint.exe\__main__.py", line 7, in <module>
  File "C:\msys64\mingw64\lib\python3.8\site-packages\yamllint\cli.py", line 175, in run
    problems = linter.run(f, conf, filepath)
  File "C:\msys64\mingw64\lib\python3.8\site-packages\yamllint\linter.py", line 237, in run
    content = input.read()
UnicodeDecodeError: 'cp932' codec can't decode byte 0x86 in position 6: illegal multibyte sequence

mattn, Dec 18 '19
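The `UnicodeDecodeError` above can be reproduced without yamllint: the sketch below encodes the `test.yaml` content as UTF-8 and decodes it with the `cp932` codec named in the traceback. It assumes the file was saved as UTF-8 with LF line endings, which is consistent with the reported byte 0x86 at position 6 (the third byte of テ). `decode_error_position` is a hypothetical helper for illustration.

```python
# Bytes of test.yaml as saved in UTF-8 (assumes LF line endings).
DATA = "---\nテスト: 'コード'\n".encode("utf-8")


def decode_error_position(data: bytes, encoding: str):
    """Return the offset of the first undecodable byte, or None if it decodes."""
    try:
        data.decode(encoding)
    except UnicodeDecodeError as exc:
        return exc.start
    return None


if __name__ == "__main__":
    print("cp932:", decode_error_position(DATA, "cp932"))  # 6, as in the traceback
    print("utf-8:", decode_error_position(DATA, "utf-8"))  # None: decodes cleanly
```

The same bytes are valid UTF-8, so the failure comes from the codec choice, not from the file.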

Setting PYTHONIOENCODING=UTF-8 fixes this for stdin:

C:\temp>yamllint - < test.yaml
Traceback (most recent call last):
  File "C:\msys64\mingw64\lib\python3.8\runpy.py", line 192, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\msys64\mingw64\lib\python3.8\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "c:\msys64\mingw64\bin\yamllint.exe\__main__.py", line 7, in <module>
  File "C:\msys64\mingw64\lib\python3.8\site-packages\yamllint\cli.py", line 189, in run
    prob_level = show_problems(problems, 'stdin', args_format=args.format)
  File "C:\msys64\mingw64\lib\python3.8\site-packages\yamllint\cli.py", line 91, in show_problems
    for problem in problems:
  File "C:\msys64\mingw64\lib\python3.8\site-packages\yamllint\linter.py", line 198, in _run
    syntax_error = get_syntax_error(buffer)
  File "C:\msys64\mingw64\lib\python3.8\site-packages\yamllint\linter.py", line 179, in get_syntax_error
    list(yaml.parse(buffer, Loader=yaml.BaseLoader))
  File "C:\msys64\mingw64\lib\python3.8\site-packages\yaml\__init__.py", line 73, in parse
    loader = Loader(stream)
  File "C:\msys64\mingw64\lib\python3.8\site-packages\yaml\loader.py", line 14, in __init__
    Reader.__init__(self, stream)
  File "C:\msys64\mingw64\lib\python3.8\site-packages\yaml\reader.py", line 74, in __init__
    self.check_printable(stream)
  File "C:\msys64\mingw64\lib\python3.8\site-packages\yaml\reader.py", line 143, in check_printable
    raise ReaderError(self.name, position, ord(character),
yaml.reader.ReaderError: unacceptable character #xdc86: special characters are not allowed
  in "<unicode string>", position 5

C:\temp>set PYTHONIOENCODING=UTF-8

C:\temp>yamllint - < test.yaml

But reading from a file still fails:

C:\temp>yamllint test.yaml
Traceback (most recent call last):
  File "C:\msys64\mingw64\lib\python3.8\runpy.py", line 192, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\msys64\mingw64\lib\python3.8\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "c:\msys64\mingw64\bin\yamllint.exe\__main__.py", line 7, in <module>
  File "C:\msys64\mingw64\lib\python3.8\site-packages\yamllint\cli.py", line 175, in run
    problems = linter.run(f, conf, filepath)
  File "C:\msys64\mingw64\lib\python3.8\site-packages\yamllint\linter.py", line 237, in run
    content = input.read()
UnicodeDecodeError: 'cp932' codec can't decode byte 0x86 in position 6: illegal multibyte sequence

mattn, Dec 18 '19
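The stdin traceback reports `unacceptable character #xdc86` at position 5 instead of a decode error. Code points U+DC80 to U+DCFF are what Python's `surrogateescape` error handler uses to carry undecodable bytes inside a `str`, so one plausible explanation (an assumption; the thread does not confirm how stdin was decoded) is that stdin was read as `cp932` with `surrogateescape`. Byte 0x86 then survives as the lone surrogate U+DC86, which PyYAML's reader rejects. The sketch below reproduces that position with the standard library only.

```python
DATA = "---\nテスト: 'コード'\n".encode("utf-8")

# Assumption: stdin was decoded with cp932 and the surrogateescape handler.
text = DATA.decode("cp932", errors="surrogateescape")

# Locate the first lone surrogate, the kind of character PyYAML rejected.
position = next(i for i, ch in enumerate(text) if 0xDC80 <= ord(ch) <= 0xDCFF)
print(position, hex(ord(text[position])))  # 5 0xdc86

# Decoding the same bytes as UTF-8 leaves no surrogates behind.
assert not any(0xDC80 <= ord(ch) <= 0xDCFF for ch in DATA.decode("utf-8"))
```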

On Linux, your example file works perfectly. It looks like the default encoding on your Windows setup is not UTF-8 (the error mentions cp932).

yamllint uses PyYAML to parse YAML. Could you try the following command to see whether PyYAML is able to load the file?

python -c 'import yaml; yaml.safe_load(open("test.yaml").read());'

adrienverge, Dec 18 '19

C:\temp>python -c "import yaml; yaml.safe_load(open('test.yaml').read());"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
UnicodeDecodeError: 'cp932' codec can't decode byte 0x86 in position 6: illegal multibyte sequence

mattn, Dec 18 '19

This might be related to https://github.com/yaml/pyyaml/issues/123#issuecomment-395431735. The following would probably work:

python -c 'import yaml; yaml.safe_load(open("test.yaml", encoding="utf8").read());'

rhysd, Dec 19 '19
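Generalizing the explicit-encoding approach to both file and stdin input, the sketch below reads YAML text as UTF-8 without consulting the locale: files are opened with an explicit `encoding`, and stdin is read as raw bytes from `sys.stdin.buffer` and decoded directly, so neither LANG/LC_CTYPE nor PYTHONIOENCODING affects the result. The function name `read_yaml_source` and the `"-"` convention for stdin are hypothetical and do not reflect yamllint's actual code.

```python
import os
import sys
import tempfile


def read_yaml_source(source: str, encoding: str = "utf-8") -> str:
    """Read YAML text from a file path, or from stdin when source is "-".

    Hypothetical helper: the encoding is always explicit, so the result
    does not depend on the locale or the Windows code page.
    """
    if source == "-":
        # Skip sys.stdin's locale-dependent text layer; decode raw bytes.
        return sys.stdin.buffer.read().decode(encoding)
    with open(source, encoding=encoding) as f:
        return f.read()


if __name__ == "__main__":
    # Demo: write the test.yaml content from this thread as UTF-8 bytes,
    # then read it back without relying on the locale.
    with tempfile.NamedTemporaryFile("wb", suffix=".yaml", delete=False) as tmp:
        tmp.write("---\nテスト: 'コード'\n".encode("utf-8"))
    try:
        # ascii() keeps the output printable on any console encoding.
        print(ascii(read_yaml_source(tmp.name)))
    finally:
        os.remove(tmp.name)
```

A caller could expose `encoding` as an option for files that are genuinely stored in another encoding, while keeping UTF-8 as the default.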

I confirmed that @rhysd's code works.

mattn, Dec 19 '19

I'm doing some issue gardening 🌱🌿 🌷 and came upon this issue. Since it's quite old, I just wanted to ask whether it is still relevant. If it isn't, maybe we can close it?

By closing some old issues we reduce the list of open issues to a more manageable set.

sandstrom, Jan 11 '21

I think it's related to https://github.com/adrienverge/yamllint/pull/238, https://github.com/adrienverge/yamllint/pull/239 and https://github.com/adrienverge/yamllint/pull/240, so it should not be closed (or it should be closed as a duplicate, if that is confirmed).

adrienverge, Jan 11 '21