
Deliver safe & effective language models

Results: 78 langtest issues (sorted by recently updated)

**Abstract:** Large language models (LLMs) are powerful tools for natural language processing (NLP) tasks. However, their performance often suffers in low-data scenarios due to limited training data. This project investigates...

GPQA, a newly introduced benchmark dataset, is a multiple-choice Q&A dataset of very hard questions written and validated by experts in biology, physics, and chemistry. When attempting questions out of their...

⭐ Feature

**Summary:** This issue proposes the implementation of a leaderboard to compare the performance of different quantization settings (e.g., GGUF 4 bits, GGUF 6 bits, etc.) within LangTest. This leaderboard would...

Because LangTest prioritizes model quality assessment, it must also account for the strong impact of data quality on model performance. Integrating comprehensive data quality testing therefore becomes crucial for...

⭐ Feature

https://github.com/BerriAI/litellm

Explore the new tool released by Microsoft for evaluating LLMs. Brief description:

> It consists of a wide range of LLMs and evaluation datasets, covering diverse tasks, evaluation protocols,...

⏭️ Next Release

Reference: https://textgeneration.substack.com/p/cognitive-biases-in-llms-as-evaluators?r=2abzqn&utm_campaign=post&utm_medium=web

⏭️ Next Release