evals issues

Results 428 evals issues

Evaluation on computer vision benchmarks

Are there plans to evaluate the vision modality of GPT-4? I am interested to know how GPT-4 could perform on classification tasks with 0- and few-shot-learning and how it compares...

finitearth

Idea for Eval

Logical reasoning eval | Accuracy 0%

# Thank you for contributing an eval! ♥️ 🚨 Please make sure your PR follows these guidelines, __failure to follow the guidelines below will result in the PR being closed...

shubasishdas

Q-Learning Eval

# Thank you for contributing an eval! ♥️ 🚨 Please make sure your PR follows these guidelines, __failure to follow the guidelines below will result in the PR being closed...

mmtmn

Added eval to predict next number in the series (Accuracy: 0.97, Samples: 100)

# Thank you for contributing an eval! ♥️ 🚨 Please make sure your PR follows these guidelines, __failure to follow the guidelines below will result in the PR being closed...

hmzakhalid

Pattern recognition for numbers in a sequence with a fixed pattern -- Accuracy: 0.02 with 100 tests

# Thank you for contributing an eval! ♥️ 🚨 Please make sure your PR follows these guidelines, __failure to follow the guidelines below will result in the PR being closed...

Lionary

Identifying publicly available trials from clinicaltrials.gov

# Thank you for contributing an eval! ♥️ 🚨 Please make sure your PR follows these guidelines, __failure to follow the guidelines below will result in the PR being closed...

wraasch

Do math problems related to calculating dates using the Chinese Sexagenary Cycle method. 🧮

# Thank you for contributing an eval! ♥️ 🚨 Please make sure your PR follows these guidelines, __failure to follow the guidelines below will result in the PR being closed...

DunedainStrider

Word Vector Over-reliance Eval (Accuracy 0%)

# Thank you for contributing an eval! ♥️ 🚨 Please make sure your PR follows these guidelines, __failure to follow the guidelines below will result in the PR being closed...

neolizhe

[unit test] Adding unit test for metrics.get_accuracy

Adding a unit test to get the ball rolling, starting with metrics since they are fundamental to evaluating performance. :) It would be great to add some more tests when...

kjbilton

tic-tac-toe eval (100% failure rate on gpt-3.5-turbo)

# Thank you for contributing an eval! ♥️ 🚨 Please make sure your PR follows these guidelines, __failure to follow the guidelines below will result in the PR being closed...

huyouare
