evals

Evals is a framework for evaluating LLMs and LLM systems, and an open-source registry of benchmarks.

Results 428 evals issues

### Describe the feature or improvement you're requesting This is a feature to detect errors in grammar and paragraphs to report to users ### Additional context import enchant import nltk...

I am not proficient in coding or using GitHub, but I would love to help make evals. Sadly, there is no coherent guide on how to do so. Can someone please make...

`docs/custom-eval.md` discusses how to make an eval with custom code; however, neither `docs/custom-eval.md` nor `docs/build-eval.md` mentions that these evals are not currently being accepted. It took...

# Thank you for contributing an eval! ♥️ 🚨 Please make sure your PR follows these guidelines, __failure to follow the guidelines below will result in the PR being closed...

I know folks are working on this. Is there any consensus on which ones to look at?

# Thank you for contributing an eval! ♥️ 🚨 Please make sure your PR follows these guidelines, __failure to follow the guidelines below will result in the PR being closed...

Homophones are two or more words that share the same pronunciation but have different meanings, for example, 'rose' (the flower) and 'rose' (the past tense of 'rise') in English. Currently, I'm learning Mandarin using ChatGPT, and I...

… Diseases # Thank you for contributing an eval! ♥️ 🚨 Please make sure your PR follows these guidelines, __failure to follow the guidelines below will result in the PR...

Understanding emotions and sentiments is an essential aspect of human communication. The system's ability to recognize these emotions enables more appropriate, context-aware, and empathetic responses. Emotions and sentiments can be...

Idea for Eval