evals
evals copied to clipboard
Evals is a framework for evaluating LLMs and LLM systems, and an open-source registry of benchmarks.
Are there plans to evaluate the vision modality of GPT-4? I am interested to know how GPT-4 could perform on classification tasks with 0- and few-shot-learning and how it compares...
# Thank you for contributing an eval! ♥️ 🚨 Please make sure your PR follows these guidelines, __failure to follow the guidelines below will result in the PR being closed...
# Thank you for contributing an eval! ♥️ 🚨 Please make sure your PR follows these guidelines, __failure to follow the guidelines below will result in the PR being closed...
# Thank you for contributing an eval! ♥️ 🚨 Please make sure your PR follows these guidelines, __failure to follow the guidelines below will result in the PR being closed...
Pattern recognition for numbers in a sequence with a fixed pattern -- Accuracy: 0.02 with 100 tests
# Thank you for contributing an eval! ♥️ 🚨 Please make sure your PR follows these guidelines, __failure to follow the guidelines below will result in the PR being closed...
# Thank you for contributing an eval! ♥️ 🚨 Please make sure your PR follows these guidelines, __failure to follow the guidelines below will result in the PR being closed...
# Thank you for contributing an eval! ♥️ 🚨 Please make sure your PR follows these guidelines, __failure to follow the guidelines below will result in the PR being closed...
# Thank you for contributing an eval! ♥️ 🚨 Please make sure your PR follows these guidelines, __failure to follow the guidelines below will result in the PR being closed...
Adding a unit test to get the ball rolling, starting with metrics since they are fundamental to evaluating performance. :) It would be great to add some more tests when...
# Thank you for contributing an eval! ♥️ 🚨 Please make sure your PR follows these guidelines, __failure to follow the guidelines below will result in the PR being closed...