litgpt Evaluation with OpenCompass

Hi, thanks for the great work.

We are the OpenCompass team (https://github.com/internLM/OpenCompass/), and we focus on LLM evaluation.

OpenCompass is a one-stop platform for large model evaluation that aims to provide a fair, open, and reproducible benchmark. Its main features include:

Comprehensive support for models and datasets: Built-in support for 20+ HuggingFace and API models, plus an evaluation scheme covering 50+ datasets with about 300,000 questions, assessing model capabilities across five dimensions.
Efficient distributed evaluation: A single command handles task division and distributed evaluation, completing a full evaluation of billion-scale models in just a few hours.
Diversified evaluation paradigms: Support for zero-shot, few-shot, and chain-of-thought evaluation, combined with standard or dialogue-style prompt templates, to elicit the best performance from each model.
Modular design with high extensibility: Want to add new models or datasets, customize an advanced task division strategy, or even support a new cluster management system? Everything in OpenCompass can be easily extended!
Experiment management and reporting mechanism: Config files fully record each experiment, and results can be reported in real time.
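
To make the evaluation paradigms above concrete, here is a minimal sketch of how zero-shot, few-shot, and chain-of-thought prompts differ for the same question. It is plain Python for illustration only and is not OpenCompass's API; `Example` and `build_prompt` are hypothetical names, and the prompt wording is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Example:
    """A worked question/answer pair used as a few-shot demonstration (hypothetical type)."""
    question: str
    answer: str


def build_prompt(question, shots=(), chain_of_thought=False):
    """Assemble a plain-text prompt (hypothetical helper, not an OpenCompass function).

    shots: demonstrations prepended for few-shot evaluation; empty means zero-shot.
    chain_of_thought: if True, append a cue asking the model to reason step by step.
    """
    # Each demonstration is rendered in the same format as the final question.
    parts = [f"Question: {ex.question}\nAnswer: {ex.answer}" for ex in shots]
    cue = "Let's think step by step." if chain_of_thought else ""
    # rstrip() drops the trailing space left after "Answer:" when there is no cue.
    parts.append(f"Question: {question}\nAnswer: {cue}".rstrip())
    return "\n\n".join(parts)


shots = [Example("2 + 2 = ?", "4"), Example("3 + 5 = ?", "8")]
zero_shot = build_prompt("7 + 6 = ?")
few_shot = build_prompt("7 + 6 = ?", shots=shots)
cot = build_prompt("7 + 6 = ?", chain_of_thought=True)
```

The only difference between the three paradigms in this sketch is what surrounds the target question: nothing (zero-shot), worked demonstrations (few-shot), or a reasoning cue (chain-of-thought).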

We would like to support the evaluation of lit-gpt with OpenCompass. If you have any ideas or suggestions, feel free to raise an issue or contact us at [email protected]

Aug 16 '23 05:08 tonysy

I vote for it.

Any progress on it?

Sep 14 '23 07:09 taomanwai

Any progress on it?

Oct 19 '23 15:10 SinclairCoder

Any progress on it?

I'm looking forward to using NeedleBench to evaluate long-context capabilities better.

Aug 08 '25 03:08 xiazhuo