opencompass issues

[Feature] No config file for Qwen1.5-32b-chat found among the model files

2

### Describe the feature Only 7b, 14b, and 72b are available. What should I do about this? ### Will you implement it? - [ ] I would like to implement this feature and create a PR!

Egber1t

[Feature] Support torch_npu

1

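### Describe the feature The code currently has a hard dependency on torch.cuda. I hope the interface can be changed to support NPU cards, i.e. to be compatible with torch_npu. ### Will you implement it? - [ ] I would like to implement this feature and create a PR!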

li126com
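
To illustrate the torch_npu request above, here is a minimal sketch of device selection that does not assume CUDA. It is not OpenCompass code: `pick_device` is a hypothetical helper, and the sketch assumes that importing `torch_npu` registers a `torch.npu` namespace exposing `is_available()`.

```python
import importlib.util


def pick_device() -> str:
    """Return "cuda", "npu", or "cpu" based on what is installed and usable.

    Hypothetical helper for illustration; not part of OpenCompass.
    """
    if importlib.util.find_spec("torch") is None:
        # torch itself is not installed, so there is no accelerator to select.
        return "cpu"
    import torch

    if torch.cuda.is_available():
        return "cuda"
    if importlib.util.find_spec("torch_npu") is not None:
        # Assumption: importing torch_npu registers the torch.npu namespace.
        import torch_npu  # noqa: F401

        if hasattr(torch, "npu") and torch.npu.is_available():
            return "npu"
    return "cpu"


if __name__ == "__main__":
    print(pick_device())
```

Code that calls `torch.cuda` directly could route through a helper of this kind, so the backend is chosen in one place.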

[Bug] Long text evaluation parameters are not clear

3

### Prerequisite - [X] I have searched [Issues](https://github.com/open-compass/opencompass/issues/) and [Discussions](https://github.com/open-compass/opencompass/discussions) but cannot get the expected help. - [X] The bug has not been fixed in the [latest version](https://github.com/open-compass/opencompass). ### Type...

bullw

[Feature] How can the evaluation metric be changed when using the custom dataset command?

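### Describe the feature python run.py \ --models hf_llama2_7b \ --custom-dataset-path xxx/test_qa.jsonl \ --custom-dataset-data-type qa \ --custom-dataset-infer-method gen The score produced by this command defaults to accuracy. Does this mean an answer must match exactly to count as correct? How can I switch to a different evaluation metric? Adding a new config file has a rather steep learning curve... ### Will you implement it? - [ ] I would...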

liushiton
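
The issue above asks whether accuracy means an exact match and how a different metric might behave. The listing does not confirm how OpenCompass computes this score, so the sketch below only contrasts two generic metrics in plain Python. `exact_match` and `token_f1` are hypothetical helpers, not OpenCompass APIs, and whitespace tokenization is an assumption that suits space-delimited text only.

```python
from collections import Counter


def exact_match(prediction: str, reference: str) -> float:
    """1.0 only if the strings are identical after trimming whitespace."""
    return float(prediction.strip() == reference.strip())


def token_f1(prediction: str, reference: str) -> float:
    """Token-overlap F1 over whitespace tokens; gives partial credit."""
    pred_tokens = prediction.split()
    ref_tokens = reference.split()
    if not pred_tokens or not ref_tokens:
        # Both empty counts as a match; exactly one empty counts as a miss.
        return float(pred_tokens == ref_tokens)
    # Multiset intersection counts each shared token at most min(count) times.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    pred, ref = "The capital is Paris", "Paris"
    print(exact_match(pred, ref), token_f1(pred, ref))
```

Under an exact-match metric, a verbose but correct answer scores zero, while an overlap-based metric gives it partial credit. That difference is one reason a QA-style custom dataset might call for a different metric.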

[Feature] Support PromptCBLUE

1

### Describe the feature Is there any plan to support PromptCBLUE, a Chinese medical LLM evaluation benchmark? https://github.com/michael-wzhu/PromptCBLUE ### Will you implement it? - [ ] I would like to...

TaoSunVoyage

API-DEMO

Thanks for your contribution and we appreciate it a lot. The following instructions would make your pull request more healthy and more easily get feedback. If you do not understand...

TwT-JD

[Feature] Add documentation and example for NumWorkersPartitioner

### Describe the feature Add documentation and example for NumWorkersPartitioner ### Will you implement it? - [ ] I would like to implement this feature and create a PR!

tonysy

[Feature] Improve evaluation scripts for mbpp datasets

4

### Describe the feature When I evaluated the vicuna-7b-v1.5 model using the mbpp_gen script, the score was 0 and most answers showed failed. Perhaps the evaluate script did not properly...

yuhui1038

Update and rename subjective.py to alignmentbench.py

SubjectiveSummarizer is not defined; changed to AlignmentBenchSummarizer.

Ox0400

[Feature] Support LiveCodeBench

2

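### LiveCodeBench [Github](https://github.com/LiveCodeBench/LiveCodeBench) [HomePage](https://livecodebench.github.io/) Advantages of the dataset: 1. The humaneval and mbpp problems are too basic; this dataset is harder 2. It is drawn from recent coding competitions, so data contamination is much less of a problem 3. Besides the **code writing** task, it also includes **result prediction**, **code repair**, and **code execution**, measuring a model's coding ability more comprehensively ### Would you like to implement this feature yourself? - [ ] I would like to implement this feature myself and contribute the code to OpenCompass!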

Ezra-Yu
