evals
oaieval fails with error UnicodeDecodeError: 'charmap' codec can't decode byte
Describe the bug
Running oaieval fails with a UnicodeDecodeError raised from the file opened at line 207 of registry.py. Passing an explicit encoding to open() fixes the problem. See Code snippets.
To Reproduce
- Run the following at the command line: oaieval gpt-3.5-turbo identity
The error only occurs when running a Model Graded eval. A FuzzyMatch eval works fine.
Code snippets
The error happens in registry.py on line 207:
with open(path, "r") as f:
Specifying the encoding solves the problem:
with open(path, "r", encoding='utf-8') as f:
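The difference can be reproduced without the evals package. The sketch below is an illustration only: the helper name `read_text_explicit`, the sample text, and the temporary file are assumptions, and cp1252 is passed explicitly to stand in for the Windows default that `open()` falls back to when no encoding is given.

```python
import tempfile
from pathlib import Path


def read_text_explicit(path, encoding="utf-8"):
    """Read a text file with an explicit encoding instead of the locale default."""
    with open(path, "r", encoding=encoding) as f:
        return f.read()


# Hypothetical sample content; hiragana encode to UTF-8 bytes that include 0x81.
sample = "prompt: こんにちは\n"

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "sample.yaml"
    path.write_bytes(sample.encode("utf-8"))

    # Simulate the Windows default by asking for cp1252 explicitly.
    try:
        read_text_explicit(path, encoding="cp1252")
        cp1252_failed = False
    except UnicodeDecodeError:
        cp1252_failed = True

    # Reading the same file as UTF-8 succeeds.
    utf8_text = read_text_explicit(path)
```

Passing the encoding explicitly makes the read independent of the machine's locale, which is why the one-line change above is enough.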
OS
Windows
Python version
3.11.0
Library version
0.27.2
You can work around this without modifying the source by setting the environment variable PYTHONUTF8=1, which enables Python's UTF-8 mode.
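To check that the variable takes effect, something like the following sketch can be used. It starts a child interpreter with PYTHONUTF8=1 added to a copy of the current environment and reads back `sys.flags.utf8_mode`. The variable names here are illustrative, and it assumes Python 3.7 or later, where UTF-8 mode is available.

```python
import os
import subprocess
import sys

# Copy the current environment and enable UTF-8 mode for the child process only.
env = {**os.environ, "PYTHONUTF8": "1"}

result = subprocess.run(
    [sys.executable, "-c", "import sys; print(sys.flags.utf8_mode)"],
    env=env,
    capture_output=True,
    text=True,
    check=True,
)

# "1" means open() in the child defaults to UTF-8 rather than the locale encoding.
utf8_mode = result.stdout.strip()
```

In a Windows command prompt, running `set PYTHONUTF8=1` before `oaieval` should have the same effect for that session.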
It happens because evals\registry\modelgraded\humor.yaml contains Japanese characters.
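To find which files are affected, a scan along these lines may help. It is a rough sketch, not part of evals: `find_undecodable` is a hypothetical helper, and the two files it checks are created in a temporary directory purely for demonstration. It reports the first byte in each file that cp1252 cannot decode.

```python
import tempfile
from pathlib import Path


def find_undecodable(root, encoding="cp1252", pattern="*.yaml"):
    """Return (path, byte offset, byte value) for each file that fails to decode."""
    problems = []
    for path in sorted(Path(root).rglob(pattern)):
        data = path.read_bytes()
        try:
            data.decode(encoding)
        except UnicodeDecodeError as exc:
            # exc.start is the offset of the first undecodable byte.
            problems.append((path, exc.start, data[exc.start]))
    return problems


with tempfile.TemporaryDirectory() as tmp:
    # Demonstration files only; point the function at a real registry directory instead.
    (Path(tmp) / "ascii.yaml").write_bytes(b"prompt: hello\n")
    (Path(tmp) / "japanese.yaml").write_bytes("prompt: こんにちは\n".encode("utf-8"))

    hits = [(p.name, pos, hex(b)) for p, pos, b in find_undecodable(tmp)]
```

The byte it reports for the Japanese sample is 0x81, the same byte named in the error message, because hiragana are encoded in UTF-8 as sequences beginning E3 81.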
Some error output to aid searchers:
Traceback (most recent call last):
  File "C:\Program Files\Python310\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Program Files\Python310\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "C:\dev\PixelPenguinInc\venv\Scripts\oaieval.exe\__main__.py", line 7, in <module>
  File "C:\dev\PixelPenguinInc\venv\lib\site-packages\evals\cli\oaieval.py", line 164, in main
    run(args)
  File "C:\dev\PixelPenguinInc\venv\lib\site-packages\evals\cli\oaieval.py", line 134, in run
    eval = eval_class(
  File "C:\dev\PixelPenguinInc\venv\lib\site-packages\evals\elsuite\modelgraded\classify.py", line 76, in __init__
    self.mg: ModelGradedSpec = self.registry.get_modelgraded_spec(
  File "C:\dev\PixelPenguinInc\venv\lib\site-packages\evals\registry.py", line 154, in get_modelgraded_spec
    assert name in self._modelgraded_specs, (
  File "C:\Program Files\Python310\lib\functools.py", line 981, in __get__
    val = self.func(instance)
  File "C:\dev\PixelPenguinInc\venv\lib\site-packages\evals\registry.py", line 272, in _modelgraded_specs
    return self._load_registry([p / "modelgraded" for p in self._registry_paths])
  File "C:\dev\PixelPenguinInc\venv\lib\site-packages\evals\registry.py", line 253, in _load_registry
    self._process_directory(registry, path)
  File "C:\dev\PixelPenguinInc\venv\lib\site-packages\evals\registry.py", line 241, in _process_directory
    self._process_file(registry, file)
  File "C:\dev\PixelPenguinInc\venv\lib\site-packages\evals\registry.py", line 208, in _process_file
    d = yaml.safe_load(f)
  File "C:\dev\PixelPenguinInc\venv\lib\site-packages\yaml\__init__.py", line 125, in safe_load
    return load(stream, SafeLoader)
  File "C:\dev\PixelPenguinInc\venv\lib\site-packages\yaml\__init__.py", line 79, in load
    loader = Loader(stream)
  File "C:\dev\PixelPenguinInc\venv\lib\site-packages\yaml\loader.py", line 34, in __init__
    Reader.__init__(self, stream)
  File "C:\dev\PixelPenguinInc\venv\lib\site-packages\yaml\reader.py", line 85, in __init__
    self.determine_encoding()
  File "C:\dev\PixelPenguinInc\venv\lib\site-packages\yaml\reader.py", line 124, in determine_encoding
    self.update_raw()
  File "C:\dev\PixelPenguinInc\venv\lib\site-packages\yaml\reader.py", line 178, in update_raw
    data = self.stream.read(size)
  File "C:\Program Files\Python310\lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 783: character maps to <undefined>