opencompass

OpenCompass is an LLM evaluation platform that supports a wide range of models (Llama3, Mistral, InternLM2, GPT-4, LLaMa2, Qwen, GLM, Claude, etc.) across 100+ datasets.

261 opencompass issues

### Describe the feature

- Batch size
- Inference time
- Evaluation time

### Will you implement it?

- [ ] I would like to implement this feature and create...
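The three metrics requested in this issue could be collected along the following lines. This is a minimal sketch, not OpenCompass's API: `run_with_timing`, `infer`, and `evaluate` are hypothetical names, with `infer` standing in for the model's batched generation and `evaluate` for the dataset's scorer.

```python
import time
from typing import Callable, List, Sequence, Tuple


def run_with_timing(
    inputs: Sequence[str],
    batch_size: int,
    infer: Callable[[List[str]], List[str]],  # hypothetical batched model call
    evaluate: Callable[[List[str]], float],   # hypothetical scorer over predictions
) -> Tuple[float, dict]:
    """Run inference in batches, then evaluation, timing each stage separately."""
    predictions: List[str] = []

    # Stage 1: inference, one call per batch of `batch_size` inputs
    start = time.perf_counter()
    for i in range(0, len(inputs), batch_size):
        predictions.extend(infer(list(inputs[i:i + batch_size])))
    inference_time = time.perf_counter() - start

    # Stage 2: evaluation over all collected predictions
    start = time.perf_counter()
    score = evaluate(predictions)
    evaluation_time = time.perf_counter() - start

    stats = {
        "batch_size": batch_size,
        "num_batches": -(-len(inputs) // batch_size),  # ceiling division
        "inference_time_s": inference_time,
        "evaluation_time_s": evaluation_time,
    }
    return score, stats
```

For example, `run_with_timing(["a", "b", "c"], 2, lambda b: [x.upper() for x in b], lambda p: sum(x in {"A", "B"} for x in p) / len(p))` runs two batches and returns a score of 2/3 together with the timing stats. Keeping the two timers separate lets a report distinguish slow generation from slow scoring.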

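### Describe the feature

Embedding models play a crucial role in knowledge retrieval, so a dedicated evaluation of embedding models would be very valuable.

### Will you implement it?

- [ ] I would like to implement this feature myself and contribute code to OpenCompass!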
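One common way to measure how well an embedding model supports retrieval is recall@k: embed the queries and documents, rank documents by cosine similarity, and count how often the relevant document lands in the top k. The sketch below uses toy vectors; `cosine` and `recall_at_k` are hypothetical helper names, not part of OpenCompass.

```python
import math
from typing import Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def recall_at_k(
    query_embs: Sequence[Sequence[float]],
    doc_embs: Sequence[Sequence[float]],
    relevant: Sequence[int],  # index of the relevant document for each query
    k: int,
) -> float:
    """Fraction of queries whose relevant document is among the top-k by cosine similarity."""
    hits = 0
    for q, gold in zip(query_embs, relevant):
        # Rank all documents for this query, most similar first
        ranked = sorted(range(len(doc_embs)), key=lambda d: cosine(q, doc_embs[d]), reverse=True)
        if gold in ranked[:k]:
            hits += 1
    return hits / len(query_embs)
```

With documents `[[1, 0], [0, 1], [0.7, 0.7]]`, queries `[[1, 0.1], [0.1, 1]]`, and relevant indices `[0, 2]`, recall@1 is 0.5 (the second query ranks document 1 first) and recall@2 is 1.0.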

Thanks for your contribution; we appreciate it a lot. The following instructions will make your pull request healthier and help it get feedback more easily. If you do not understand...

### Describe the feature

Hi guys,

TensorRT-LLM was released last week. It is maintained by NVIDIA and offers high inference performance.

Link: https://github.com/NVIDIA/TensorRT-LLM

Will implement it via API calling...

planned feature

### Describe the feature

How can I use OpenCompass to evaluate a local Alpaca model on MMLU and other datasets?

### Will you implement it?

- [ ] I would like...

Thank you very much for your contributions to the community. The open-compass/opencompass project **is truly outstanding**, and I plan to pursue further research building on the OpenCompass foundation. In this Pull...

### Describe the feature

Something like:

```python
from transformers.generation import GenerationConfig

# Load the generation settings shipped with the checkpoint at `path`
self.model.generation_config = GenerationConfig.from_pretrained(path, trust_remote_code=True)
# Disable sampling so generation is greedy and deterministic
self.model.generation_config.do_sample = False
```

### Will you implement it?

- [X] I would like to...

### Describe the feature

lm-evaluation-harness supports `acc_norm` evaluation, which is used in the [Hugging Face leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard):

```
ARC: 25-shot, arc-challenge (acc_norm)
HellaSwag: 10-shot, hellaswag (acc_norm)
```

`acc_norm` is calculated from the result...
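For reference, lm-evaluation-harness scores multiple-choice tasks from the model's log-likelihood of each answer choice. `acc` picks the choice with the highest total log-likelihood, while `acc_norm` first divides each log-likelihood by the length of the choice text, which reduces the bias toward short answers. The sketch below illustrates that difference only; the function names are hypothetical and this is neither the harness's nor OpenCompass's code.

```python
from typing import Sequence


def acc(loglikelihoods: Sequence[float], gold: int) -> float:
    """Unnormalized accuracy: pick the choice with the highest total log-likelihood."""
    pred = max(range(len(loglikelihoods)), key=lambda i: loglikelihoods[i])
    return 1.0 if pred == gold else 0.0


def acc_norm(loglikelihoods: Sequence[float], choices: Sequence[str], gold: int) -> float:
    """Length-normalized accuracy: divide each log-likelihood by its choice's character length."""
    scores = [ll / len(text) for ll, text in zip(loglikelihoods, choices)]
    pred = max(range(len(scores)), key=lambda i: scores[i])
    return 1.0 if pred == gold else 0.0


choices = ["yes", "a much longer answer"]  # 3 and 20 characters
lls = [-3.0, -10.0]                         # total log-likelihood per choice
print(acc(lls, gold=1))                     # 0.0: the short choice wins on raw log-likelihood
print(acc_norm(lls, choices, gold=1))       # 1.0: -10/20 = -0.5 beats -3/3 = -1.0
```

The example shows why the two metrics can disagree: the longer answer accumulates more negative log-likelihood simply because it has more tokens, and normalizing by length removes that penalty.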