langtest
Deliver safe & effective language models
**Abstract:** Large language models (LLMs) are powerful tools for natural language processing (NLP) tasks. However, their performance often suffers in low-data scenarios, where little training data is available. This project investigates...
The newly introduced benchmark dataset GPQA is a multiple-choice Q&A dataset of very hard questions written and validated by experts in biology, physics, and chemistry. When attempting questions out of their...
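Running a multiple-choice benchmark such as GPQA comes down to parsing the model's chosen option and comparing it with the gold answer. The sketch below is one minimal way to do that. `MCQuestion`, `extract_choice`, and `accuracy` are hypothetical names. The record layout is an assumption, not GPQA's actual file format or a LangTest API.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass
class MCQuestion:
    # Hypothetical record layout for illustration; not GPQA's actual file format.
    question: str
    choices: list[str]
    answer_index: int  # position of the correct option in `choices`


def extract_choice(model_output: str, num_choices: int) -> int | None:
    """Return the index of the first standalone option letter (A, B, ...) in the output."""
    last_letter = chr(ord("A") + num_choices - 1)
    match = re.search(rf"\b([A-{last_letter}])\b", model_output)
    if match is None:
        return None  # an unparsable answer is scored as incorrect
    return ord(match.group(1)) - ord("A")


def accuracy(questions: list[MCQuestion], outputs: list[str]) -> float:
    """Fraction of questions whose parsed answer matches the gold option."""
    if not questions:
        return 0.0
    correct = sum(
        extract_choice(out, len(q.choices)) == q.answer_index
        for q, out in zip(questions, outputs)
    )
    return correct / len(questions)
```

Only uppercase standalone letters are matched. This prevents words like "a" from being read as option A. Real evaluations often prompt the model to reply in a fixed format so that parsing is reliable.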
**Summary:** This issue proposes implementing a leaderboard within LangTest to compare model performance across quantization settings (e.g., GGUF 4-bit and GGUF 6-bit). This leaderboard would...
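A quantization leaderboard could aggregate each variant's per-category test scores and rank the variants by their average. The sketch below is a minimal version built on that assumption. `LeaderboardEntry`, `build_leaderboard`, and `render` are hypothetical names and are not part of LangTest. The choice of plain averaging as the ranking metric is also an assumption.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class LeaderboardEntry:
    # Hypothetical record: one model evaluated under one quantization setting.
    model: str
    quantization: str  # e.g. "GGUF 4-bit", "GGUF 6-bit"
    scores: dict[str, float]  # test category -> pass rate in [0, 1]

    @property
    def average(self) -> float:
        return sum(self.scores.values()) / len(self.scores) if self.scores else 0.0


def build_leaderboard(entries: list[LeaderboardEntry]) -> list[tuple[int, LeaderboardEntry]]:
    """Rank entries by average pass rate, highest first; ties keep input order."""
    ranked = sorted(entries, key=lambda e: e.average, reverse=True)
    return list(enumerate(ranked, start=1))


def render(rows: list[tuple[int, LeaderboardEntry]]) -> str:
    """Format ranked rows as a fixed-width text table."""
    lines = [f"{'Rank':<5}{'Model':<20}{'Quantization':<15}{'Avg':>6}"]
    for rank, e in rows:
        lines.append(f"{rank:<5}{e.model:<20}{e.quantization:<15}{e.average:>6.2f}")
    return "\n".join(lines)
```

Python's `sorted` stays stable with `reverse=True`, so variants with equal averages keep their submission order. A weighted average could replace the plain mean if some test categories matter more.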
As LangTest prioritizes model quality assessment, it is essential to acknowledge the strong impact of data quality on model performance. Hence, integrating comprehensive data quality testing measures becomes crucial for...
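Data quality tests of this kind typically look for empty inputs, missing labels, duplicate examples, and identical texts that carry different labels. The sketch below runs those four checks over a list of `{"text": ..., "label": ...}` records. The function name `data_quality_report`, the record keys, and the choice of checks are all assumptions for illustration, not an existing LangTest feature.

```python
from __future__ import annotations

from collections import Counter


def data_quality_report(records: list[dict], text_key: str = "text",
                        label_key: str = "label") -> dict[str, object]:
    """Run basic data-quality checks on (text, label) records (hypothetical helper)."""
    texts = [str(r.get(text_key) or "").strip() for r in records]

    def has_label(r: dict) -> bool:
        return r.get(label_key) not in (None, "")

    # Indices of records with blank text or no label.
    empty_text = [i for i, t in enumerate(texts) if not t]
    missing_label = [i for i, r in enumerate(records) if not has_label(r)]

    # Case-insensitive duplicate texts.
    counts = Counter(t.lower() for t in texts if t)
    duplicates = sorted(t for t, c in counts.items() if c > 1)

    # The same normalized text with more than one label suggests a labeling error.
    labels_by_text: dict[str, set] = {}
    for t, r in zip(texts, records):
        if t and has_label(r):
            labels_by_text.setdefault(t.lower(), set()).add(r[label_key])
    conflicting = sorted(t for t, ls in labels_by_text.items() if len(ls) > 1)

    return {
        "empty_text": empty_text,
        "missing_label": missing_label,
        "duplicate_texts": duplicates,
        "conflicting_labels": conflicting,
        "label_distribution": dict(Counter(r[label_key] for r in records if has_label(r))),
    }
```

The label distribution is included because a heavily skewed class balance is another common data-quality issue. Flagging that issue would need a threshold the source does not specify.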
https://github.com/BerriAI/litellm
Explore the new tool released by Microsoft for evaluating LLMs. Brief description:
> It consists of a wide range of LLMs and evaluation datasets, covering diverse tasks, evaluation protocols,...
Reference: https://textgeneration.substack.com/p/cognitive-biases-in-llms-as-evaluators?r=2abzqn&utm_campaign=post&utm_medium=web
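Position (order) bias is one well-known cognitive bias of LLMs used as evaluators. In pairwise comparisons, the judge tends to favor whichever answer is shown first or second, regardless of content. The sketch below checks for it by swapping the order of each answer pair and measuring how often the judge still picks the same answer. The `Judge` callable is a hypothetical stand-in for an LLM-judge call that returns `"first"` or `"second"`. This is not a LangTest API, and it is not necessarily the method described in the linked article.

```python
from __future__ import annotations

from typing import Callable

# Hypothetical judge: (question, shown_first, shown_second) -> "first" or "second".
Judge = Callable[[str, str, str], str]


def position_consistency(judge: Judge, items: list[tuple[str, str, str]]) -> float:
    """Fraction of (question, answer_a, answer_b) items judged the same in both orders."""
    if not items:
        return 1.0
    consistent = 0
    for question, a, b in items:
        original = judge(question, a, b)  # answer_a shown first
        swapped = judge(question, b, a)   # answer_b shown first
        # Map each verdict back to the underlying answer (0 = answer_a, 1 = answer_b).
        picked_original = 0 if original == "first" else 1
        picked_swapped = 1 if swapped == "first" else 0
        if picked_original == picked_swapped:
            consistent += 1
    return consistent / len(items)
```

A judge that always picks the first slot scores 0.0 on this metric, and a judge driven purely by content scores 1.0. Values in between indicate some degree of order sensitivity.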