Potential encoding issue when opening a JSON file in validator.py (spdx_validator)
Hi
This issue is not directly related to this repository. It concerns using the 'spdx-validator' package to validate a JSON file against this repo's schema (schemas/spdx-schema.json).
When the spdx validator opened the JSON schema file, I got an error from the following code (line 41 in validator.py), because the file is opened without an explicit encoding. Changing it as shown below solved the error.
Before:
with open(schema_file, 'r') as f:
After:
with open(schema_file, 'r', encoding='utf-8') as f:
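Here is the same fix as a minimal standalone sketch. The function name load_schema and the demo file are hypothetical and are not part of spdx-validator. The schema is loaded with an explicit UTF-8 encoding, so the result no longer depends on the platform's default:

```python
import json
import tempfile
from pathlib import Path


def load_schema(schema_file):
    """Load a JSON schema file, decoding it explicitly as UTF-8."""
    # Without encoding=..., open() falls back to the locale's preferred
    # encoding, which is often cp1252 on Windows.
    with open(schema_file, 'r', encoding='utf-8') as f:
        return json.load(f)


if __name__ == '__main__':
    # Hypothetical demo file: a description containing U+201D, which is
    # stored on disk as the UTF-8 bytes E2 80 9D.
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / 'demo-schema.json'
        path.write_text('{"description": "a \u201dquote"}', encoding='utf-8')
        schema = load_schema(path)
        print(ascii(schema['description']))
```

Another option is to pass the raw bytes to json.loads, which accepts bytes and detects UTF-8, UTF-16 or UTF-32 on its own. Adding encoding='utf-8' is the smaller change to the existing code.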
This may help others who run into the same error. The error message was:
Traceback (most recent call last):
File "<frozen runpy>", line 198, in _run_module_as_main
File "<frozen runpy>", line 88, in _run_code
File "C:\Users\Hyejin.Lee\AppData\Local\Programs\Python\Python311\Scripts\spdx-validator.exe\__main__.py", line 7, in <module>
File "C:\Users\Hyejin.Lee\AppData\Local\Programs\Python\Python311\Lib\site-packages\spdx_validator\__main__.py", line 135, in main
validator = SPDXValidator(spdx_version = args.spdx_version,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Hyejin.Lee\AppData\Local\Programs\Python\Python311\Lib\site-packages\spdx_validator\validator.py", line 42, in __init__
self.schema = json.load(f)
^^^^^^^^^^^^
File "C:\Users\Hyejin.Lee\AppData\Local\Programs\Python\Python311\Lib\json\__init__.py", line 293, in load
return loads(fp.read(),
^^^^^^^^^
File "C:\Users\Hyejin.Lee\AppData\Local\Programs\Python\Python311\Lib\encodings\cp1252.py", line 23, in decode
return codecs.charmap_decode(input,self.errors,decoding_table)[0]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 10681: character maps to <undefined>
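The traceback shows open() falling back to cp1252, which is the locale encoding on this Windows setup. The sketch below reproduces the same kind of error without the validator. For illustration only, it assumes the offending character is U+201D RIGHT DOUBLE QUOTATION MARK. That is just one of several characters whose UTF-8 encoding (here E2 80 9D) contains the byte 0x9d, which cp1252 leaves undefined. The function name is hypothetical:

```python
def reproduce_cp1252_error():
    """Decode UTF-8 bytes with cp1252, as open() may do by default on Windows."""
    # Assumption for illustration: the character is U+201D, whose UTF-8
    # form is b'\xe2\x80\x9d'. cp1252 maps 0xe2 and 0x80 but not 0x9d.
    raw = '\u201d'.encode('utf-8')
    try:
        raw.decode('cp1252')
    except UnicodeDecodeError as exc:
        return exc
    return None


if __name__ == '__main__':
    err = reproduce_cp1252_error()
    # Reports the undefined byte 0x9d and its position within `raw`.
    print(err)
```

Decoding the same bytes with 'utf-8' succeeds, which is consistent with adding encoding='utf-8' resolving the error.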
Just to clarify - this is tripping on the encoding for the spdx-schema.json file?
If so, I wonder which character it is tripping on. I would expect only ASCII in the schema file, even though it should be encoded in UTF-8 (the SPDX standard for character encoding).
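One way to check this is to list every non-ASCII character in the schema along with its byte offset. The sketch below does that. The function name find_non_ascii and the script usage are hypothetical:

```python
import sys
from pathlib import Path


def find_non_ascii(raw):
    """List (byte_offset, codepoint) for each non-ASCII character in UTF-8 bytes."""
    hits = []
    offset = 0
    for ch in raw.decode('utf-8'):
        if ord(ch) > 0x7F:
            hits.append((offset, f'U+{ord(ch):04X}'))
        # Advance by this character's UTF-8 length (1 to 4 bytes).
        offset += len(ch.encode('utf-8'))
    return hits


if __name__ == '__main__':
    # Usage (script name hypothetical):
    #   python find_non_ascii.py schemas/spdx-schema.json
    for offset, codepoint in find_non_ascii(Path(sys.argv[1]).read_bytes()):
        print(offset, codepoint)
```

You can compare these offsets with position 10681 from the traceback, assuming that position counts from the start of the file. If the character is a three-byte UTF-8 sequence ending in 0x9d, it would start two bytes before the reported position.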
Ping @ty1279 - can you clarify per the comment above?
Haven't heard a response since April - closing