
Windows path and unicode decoding

Open ulasdilek opened this issue 1 year ago • 2 comments

Hi, I am trying to contribute and get access to GPT-4 by creating my own evals, but I thought I should first be able to run existing evals. So I was trying to run an eval by following one of your examples, "lafand-mt.ipynb", when I ran into two problems that caused errors for me.

  1. I am using Windows, and this problem is caused by my OS using "\" instead of "/" as the directory delimiter. I believe there should be an OS-independent way to handle both. In code block 3, line 13, the code langs = input_path.split('/')[-1] finds no '/' in a Windows path, so langs holds the whole path, including the '-' in "...\lafand-mt". As a result, langs.split('-') returns three elements, for instance ["...\data\lafand", "mt\en", "amh"]. This breaks the next line, input_lang, output_lang = langs.split('-'), which expects exactly two. I was able to work around it by changing '/' to '\\', but this should not be the community-standard solution. Furthermore, I would not want Windows users who do not know about this to get lost while following your example.
  2. When running the 6th code block, I got a UnicodeDecodeError. I do not know whether this happens to other users, but I suggest adding encoding='utf-8' as a parameter to the open() call on line 6 in the main branch, as it seems to get rid of the error. Keep up the good work!
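
A minimal sketch of both fixes, assuming the language pair is encoded as the last path component (for example en-amh) and that the notebook reads plain text files. The helper names lang_pair_from_path and read_lines, the example path, and the sample text are hypothetical and not taken from the notebook. The path helper is parameterised on the path module so the Windows behaviour can be demonstrated with ntpath on any OS.

```python
import ntpath
import os
import tempfile


def lang_pair_from_path(input_path, pathmod=os.path):
    """Split a directory name such as 'en-amh' into ('en', 'amh')."""
    # Take the last path component with an OS-aware helper instead of
    # splitting on a hard-coded '/'. normpath drops any trailing separator,
    # so basename does not return an empty string.
    langs = pathmod.basename(pathmod.normpath(input_path))
    input_lang, output_lang = langs.split("-")
    return input_lang, output_lang


def read_lines(path):
    """Read a text file as UTF-8 instead of the platform's default encoding."""
    # Without encoding=, open() falls back to the locale's preferred encoding
    # (commonly cp1252 on Windows), which can fail on UTF-8 input.
    with open(path, encoding="utf-8") as f:
        return [line.rstrip("\n") for line in f]


# The original one-liner on a Windows-style path: no '/' is found, so the
# whole path is split on '-', giving three pieces instead of two.
win_path = r"C:\evals\data\lafand-mt\en-amh"  # hypothetical example path
print(win_path.split("/")[-1].split("-"))
# ['C:\\evals\\data\\lafand', 'mt\\en', 'amh']

# ntpath applies Windows path rules on any OS, so this demo runs anywhere.
print(lang_pair_from_path(win_path, pathmod=ntpath))
# ('en', 'amh')

# Round-trip a small file containing non-ASCII (Amharic) text.
with tempfile.TemporaryDirectory() as tmp:
    sample = os.path.join(tmp, "sample.txt")
    with open(sample, "w", encoding="utf-8") as f:
        f.write("hello\n\u1230\u120b\u121d\n")
    demo_lines = read_lines(sample)
```

pathlib.Path(input_path).name is an equivalent alternative for the path helper.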

ulasdilek, Mar 21 '23 11:03

This is partly related to:

  • #209

Ein-Tim, Mar 21 '23 19:03

@ulasdilek

I'm trying to make a PR for this.

The PR will address the separator issue by using os.path.sep instead of the hard-coded '/'.
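
A quick sketch of one caveat with os.path.sep, assuming the PR splits on it the same way the notebook currently splits on '/'. Windows also accepts '/' as a separator, so a path written with forward slashes would not be split by os.path.sep, whereas os.path.basename handles both. ntpath is the module that os.path refers to on Windows; it is used here so the comparison runs on any OS. The paths are hypothetical.

```python
import ntpath

# Hypothetical paths: Windows accepts both separators, and a notebook may
# build input_path with '/' even when it runs on Windows.
native = r"C:\evals\data\lafand-mt\en-amh"
mixed = "C:/evals/data/lafand-mt/en-amh"

# Splitting on the platform separator (ntpath.sep is '\\').
sep_results = [p.split(ntpath.sep)[-1] for p in (native, mixed)]

# basename recognises both '\\' and '/' under Windows path rules.
basename_results = [ntpath.basename(p) for p in (native, mixed)]

print(sep_results)       # ['en-amh', 'C:/evals/data/lafand-mt/en-amh']
print(basename_results)  # ['en-amh', 'en-amh']
```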

jonathanagustin, Mar 21 '23 21:03