oaieval fails with error UnicodeDecodeError: 'charmap' codec can't decode byte

Open rslinford opened this issue 1 year ago • 1 comment

Describe the bug

Running oaieval fails with a UnicodeDecodeError at line 207 of registry.py. Adding the encoding solves the problem. See Code snippets.

To Reproduce

  1. Run the following at the command line: oaieval gpt-3.5-turbo identity

The error only occurs when running a model-graded eval. A FuzzyMatch eval works fine.

Code snippets

The error happens in registry.py on line 207:
    with open(path, "r") as f:

Specifying the encoding explicitly solves the problem:
    with open(path, "r", encoding='utf-8') as f:
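The difference between the two calls can be reproduced outside of evals. The following is a minimal sketch, not the library's code: the helper `read_text_with`, the file name `sample.yaml`, and the sample text are all made up for illustration, and `cp1252` is passed explicitly to simulate the default that Windows typically picks when no encoding is given.

```python
import tempfile
from pathlib import Path


def read_text_with(path, encoding):
    """Read a text file using an explicitly chosen encoding (hypothetical helper)."""
    with open(path, "r", encoding=encoding) as f:
        return f.read()


# Hypothetical sample content; the real registry file is not reproduced here.
sample = "prompt: これは冗談ですか\n"

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "sample.yaml"
    # Store the text as UTF-8 bytes, the way the registry YAML files are stored.
    path.write_bytes(sample.encode("utf-8"))

    # Simulate open(path, "r") on a system whose default encoding is cp1252.
    try:
        read_text_with(path, "cp1252")
        cp1252_failed = False
    except UnicodeDecodeError:
        cp1252_failed = True

    # The proposed fix: always decode as UTF-8, regardless of the locale.
    utf8_text = read_text_with(path, "utf-8")

print("cp1252 read failed:", cp1252_failed)
print("utf-8 read matches:", utf8_text == sample)
```

Passing `encoding` explicitly makes the result independent of the machine's locale, which is why the one-argument change is enough.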

OS

Windows

Python version

3.11.0

Library version

0.27.2

rslinford avatar May 17 '23 00:05 rslinford

You can work around this without modifying the source by setting the environment variable PYTHONUTF8=1.

The error is caused by Japanese characters in evals\registry\modelgraded\humor.yaml
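To confirm that the workaround has taken effect, the interpreter can be asked which mode and default encoding it is using. This is a small sketch using only the standard library; the function name `describe_default_encoding` is made up for illustration.

```python
import locale
import sys


def describe_default_encoding():
    """Report whether UTF-8 mode is on and which encoding open() defaults to."""
    return {
        # True when Python was started with PYTHONUTF8=1 or -X utf8.
        "utf8_mode": bool(sys.flags.utf8_mode),
        # The encoding open() uses when no encoding argument is passed.
        "preferred_encoding": locale.getpreferredencoding(False),
    }


info = describe_default_encoding()
print(info)
```

The variable has to be set before the interpreter starts, for example with `set PYTHONUTF8=1` in the same Command Prompt session before running oaieval. With UTF-8 mode on, the reported preferred encoding should be UTF-8 rather than cp1252.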

Some error output to aid searchers:

Traceback (most recent call last):
  File "C:\Program Files\Python310\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Program Files\Python310\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "C:\dev\PixelPenguinInc\venv\Scripts\oaieval.exe\__main__.py", line 7, in <module>
  File "C:\dev\PixelPenguinInc\venv\lib\site-packages\evals\cli\oaieval.py", line 164, in main
    run(args)
  File "C:\dev\PixelPenguinInc\venv\lib\site-packages\evals\cli\oaieval.py", line 134, in run
    eval = eval_class(
  File "C:\dev\PixelPenguinInc\venv\lib\site-packages\evals\elsuite\modelgraded\classify.py", line 76, in __init__
    self.mg: ModelGradedSpec = self.registry.get_modelgraded_spec(
  File "C:\dev\PixelPenguinInc\venv\lib\site-packages\evals\registry.py", line 154, in get_modelgraded_spec
    assert name in self._modelgraded_specs, (
  File "C:\Program Files\Python310\lib\functools.py", line 981, in __get__
    val = self.func(instance)
  File "C:\dev\PixelPenguinInc\venv\lib\site-packages\evals\registry.py", line 272, in _modelgraded_specs
    return self._load_registry([p / "modelgraded" for p in self._registry_paths])
  File "C:\dev\PixelPenguinInc\venv\lib\site-packages\evals\registry.py", line 253, in _load_registry
    self._process_directory(registry, path)
  File "C:\dev\PixelPenguinInc\venv\lib\site-packages\evals\registry.py", line 241, in _process_directory
    self._process_file(registry, file)
  File "C:\dev\PixelPenguinInc\venv\lib\site-packages\evals\registry.py", line 208, in _process_file
    d = yaml.safe_load(f)
  File "C:\dev\PixelPenguinInc\venv\lib\site-packages\yaml\__init__.py", line 125, in safe_load
    return load(stream, SafeLoader)
  File "C:\dev\PixelPenguinInc\venv\lib\site-packages\yaml\__init__.py", line 79, in load
    loader = Loader(stream)
  File "C:\dev\PixelPenguinInc\venv\lib\site-packages\yaml\loader.py", line 34, in __init__
    Reader.__init__(self, stream)
  File "C:\dev\PixelPenguinInc\venv\lib\site-packages\yaml\reader.py", line 85, in __init__
    self.determine_encoding()
  File "C:\dev\PixelPenguinInc\venv\lib\site-packages\yaml\reader.py", line 124, in determine_encoding
    self.update_raw()
  File "C:\dev\PixelPenguinInc\venv\lib\site-packages\yaml\reader.py", line 178, in update_raw
    data = self.stream.read(size)
  File "C:\Program Files\Python310\lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 783: character maps to <undefined>
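
The byte 0x81 in the final line is typical of UTF-8 encoded Japanese text being decoded as cp1252, where 0x81 has no assigned character. A short sketch of that mechanism follows; the character "あ" is an arbitrary example and is not claimed to be the one at position 783 of humor.yaml.

```python
# An arbitrary hiragana character, chosen only for illustration.
char = "あ"

# Its UTF-8 encoding is three bytes, the second of which is 0x81.
utf8_bytes = char.encode("utf-8")
print(utf8_bytes.hex(" "))

# cp1252 has no character mapped to 0x81, so decoding fails on that byte.
try:
    utf8_bytes.decode("cp1252")
    error = None
except UnicodeDecodeError as exc:
    error = exc

if error is not None:
    print(error)
    print("failing byte:", hex(error.object[error.start]))
```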

robatwilliams avatar Jun 16 '23 10:06 robatwilliams