spdx-spec

Potential encoding issue from opening a json file in validator.py under spdx_validator

Open ty1279 opened this issue 2 years ago • 2 comments

Hi

This issue is not directly about this repository. It concerns using the 'spdx-validator' package to validate a JSON file against this repo's schema (schemas/spdx-schema.json).

When the validator opens the JSON schema file, it raises an error at line 41 in validator.py because the file is opened without an explicit encoding. The following change fixed it for me:

Before: with open(schema_file, 'r') as f:
After:  with open(schema_file, 'r', encoding='utf-8') as f:
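For anyone who wants to reproduce this without a Windows machine, here is a minimal sketch. It assumes the failure comes from Python falling back to the locale encoding (cp1252 in the traceback below) when `open()` is called without `encoding`. The sample document and the temporary file are hypothetical stand-ins, not the real schema, and cp1252 is forced explicitly to imitate the Windows default.

```python
import json
import os
import tempfile

# Hypothetical stand-in for the schema: a JSON document containing curly
# quotes. U+201D (right double quotation mark) is E2 80 9D in UTF-8, and
# byte 0x9D has no mapping in Python's cp1252 codec.
sample = {"description": "a \u201cquoted\u201d word"}

with tempfile.TemporaryDirectory() as tmp:
    schema_file = os.path.join(tmp, "schema.json")

    # Write the file as UTF-8 without escaping non-ASCII characters.
    with open(schema_file, "w", encoding="utf-8") as f:
        json.dump(sample, f, ensure_ascii=False)

    # Imitate the Windows default by decoding with cp1252.
    cp1252_failed = False
    failing_byte = None
    try:
        with open(schema_file, "r", encoding="cp1252") as f:
            json.load(f)
    except UnicodeDecodeError as exc:
        cp1252_failed = True
        failing_byte = exc.object[exc.start]  # the byte that could not be decoded

    # The proposed fix: state the encoding explicitly.
    with open(schema_file, "r", encoding="utf-8") as f:
        loaded = json.load(f)

print("cp1252 failed:", cp1252_failed)
print("loaded with utf-8:", loaded == sample)
```

Passing `encoding='utf-8'` makes the result independent of the machine's locale, which is why the one-line change above is enough.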

This may help others who run into the same error. The error message was:

Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "C:\Users\Hyejin.Lee\AppData\Local\Programs\Python\Python311\Scripts\spdx-validator.exe\__main__.py", line 7, in <module>
  File "C:\Users\Hyejin.Lee\AppData\Local\Programs\Python\Python311\Lib\site-packages\spdx_validator\__main__.py", line 135, in main
    validator = SPDXValidator(spdx_version = args.spdx_version,
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Hyejin.Lee\AppData\Local\Programs\Python\Python311\Lib\site-packages\spdx_validator\validator.py", line 42, in __init__
    self.schema = json.load(f)
                  ^^^^^^^^^^^^
  File "C:\Users\Hyejin.Lee\AppData\Local\Programs\Python\Python311\Lib\json\__init__.py", line 293, in load
    return loads(fp.read(),
                 ^^^^^^^^^
  File "C:\Users\Hyejin.Lee\AppData\Local\Programs\Python\Python311\Lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 10681: character maps to <undefined>

ty1279, Sep 13 '23 19:09

Just to clarify - this is tripping on the encoding for the spdx-schema.json file?

If so, I wonder which character it is tripping on. I would expect only ASCII in the schema file, even though it should be encoded in UTF-8 (the SPDX standard for character encoding).
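One way to answer that question is to scan the file for non-ASCII characters. The sketch below is only an illustration: `find_non_ascii` is a hypothetical helper, not part of spdx-validator, and it assumes the input is valid UTF-8. Byte 0x9d can be a continuation byte of several characters, for example U+201D (right double quotation mark, E2 80 9D), but this thread does not establish which character is actually in the schema.

```python
import unicodedata


def find_non_ascii(data: bytes):
    """Return (byte_offset, character, Unicode name) for each non-ASCII character.

    Hypothetical helper for illustration; assumes `data` is valid UTF-8.
    """
    hits = []
    byte_offset = 0
    for ch in data.decode("utf-8"):
        encoded = ch.encode("utf-8")
        if len(encoded) > 1:  # ASCII characters encode to exactly one byte
            hits.append((byte_offset, ch, unicodedata.name(ch, "UNKNOWN")))
        byte_offset += len(encoded)
    return hits


# Small in-memory demonstration with one curly quote.
demo = "a\u201db".encode("utf-8")
for offset, ch, name in find_non_ascii(demo):
    print(offset, repr(ch), name)

# To check the real schema, read it in binary mode, for example:
#   with open("schemas/spdx-schema.json", "rb") as f:
#       print(find_non_ascii(f.read()))
```

An empty result would mean the file is pure ASCII, in which case the encoding argument would not matter for it.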

goneall, Sep 16 '23 18:09

Ping @ty1279 - can you clarify per the comment above?

goneall, Apr 04 '24 22:04

Haven't heard a response since April - closing

goneall, Aug 11 '24 22:08