
Results: 14 simple-evals issues

Fix for #2. Simply renames `types.py` to `eval_types.py`.

Following up on #1: loading directly through urllib instead of Blobfile. Azure still tries to invoke credentials, even for a public file, and doesn't seem to work (for me...
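A minimal sketch of the urllib-based loading described above, assuming the dataset is a public CSV file; the helper name and signature here are illustrative, not the PR's actual code:

```python
import csv
import io
import urllib.request

def load_csv_rows(url: str) -> list[dict]:
    # Plain HTTP(S) GET via urllib: no cloud SDK involved, so no
    # Azure credential lookup is triggered for a public file.
    with urllib.request.urlopen(url) as resp:
        text = resp.read().decode("utf-8")
    # Parse the body as CSV, one dict per row keyed by the header.
    return list(csv.DictReader(io.StringIO(text)))
```

The design point is that `blobfile` routes even anonymous reads through its cloud-credential machinery, while `urllib` treats the URL as an ordinary web resource.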

The `types.py` file shadows the stdlib module `types`.

Please add them as well.

```[tasklist]
### Tasks
- [ ] Run benchmarks also for GPT-3.5 versions and Claude Sonnet and Haiku #7
```

There is a small typo in `humaneval_eval.py` where a non-existent method named `_pack_mesage` is called. This PR uses the correct function name.

I'm a student working on a final project and wanted to use the granular data here (e.g. not "GPT-4o hits 88.7% on MMLU" but rather "what did it answer for...

This PR fixes a typo in `humaneval_eval.py`.

Before the fix:

```
sampler._pack_mesage(role="user", content=instruction + sample["prompt"])
```

After the fix:

```
sampler._pack_message(role="user", content=instruction + sample["prompt"])
```

Zero-shot scores for those models are not easily found online, so this would be very useful for tracking the improvement trend over time!

It would be useful to have access to tables with scores for individual evaluation items, as argued here: https://www.science.org/doi/pdf/10.1126/science.adf6369