ByteMLPerf 【Issue Help】 chatglm2-6b: some cases mismatch the golden values

Open DeepTecher opened this issue 8 months ago • 2 comments

https://github.com/bytedance/ByteMLPerf/blob/main/byte_infer_perf/llm_perf/workloads/chatglm2-torch-fp16-6b.json

We ran this workload on an A100-40G to obtain the output logits, using the configuration below:

{
    "model": "chatglm2-torch-fp16-6b",
    "test_accuracy": true,
    "test_perf": true,
    "min_new_tokens": 128,
    "max_new_tokens": 256,
    "tp_sizes": [1, 2],
    "batch_sizes": [1, 2, 4, 8],
    "input_tokens": [1024, 2048],
    "dataset": "llm_perf/datasets/merged_52_test.csv",
    "perf_time": 180
}

Some dimensions of the output do not appear to match the golden values. Here is one of the 52 test cases:

id,question,A,B,C,D
0,"For the following structure definition, the ++ in ++p->str is applied to ____
struct{
int len;
char*str;
}*P;",pointer p,pointer str,the content str points to,syntax error
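
To narrow down which cases differ, and whether the difference is in shape or in value, a check along the following lines might help. This is only a minimal sketch: `compare_logits`, `shape_of`, `flatten`, and the `rtol`/`atol` tolerances are hypothetical names and values chosen for illustration, not part of ByteMLPerf, and it assumes the logits have already been loaded into nested Python lists.

```python
def shape_of(x):
    """Return the shape of a rectangular nested list as a tuple."""
    dims = []
    while isinstance(x, (list, tuple)):
        dims.append(len(x))
        if not x:
            break
        x = x[0]
    return tuple(dims)


def flatten(x):
    """Yield every scalar in a nested list as a float, in row-major order."""
    if isinstance(x, (list, tuple)):
        for item in x:
            yield from flatten(item)
    else:
        yield float(x)


def compare_logits(output, golden, rtol=1e-3, atol=1e-3):
    """Compare output logits with golden logits.

    rtol and atol are assumed tolerances, not values taken from ByteMLPerf.
    Returns a dict whose "status" is "shape_mismatch", "value_mismatch",
    or "match".
    """
    out_shape, gold_shape = shape_of(output), shape_of(golden)
    if out_shape != gold_shape:
        # The dimensions themselves differ, so no element-wise check is possible.
        return {
            "status": "shape_mismatch",
            "output_shape": out_shape,
            "golden_shape": gold_shape,
        }

    gold_flat = list(flatten(golden))
    diffs = [abs(o - g) for o, g in zip(flatten(output), gold_flat)]
    # An element passes when |output - golden| <= atol + rtol * |golden|.
    ok = all(d <= atol + rtol * abs(g) for d, g in zip(diffs, gold_flat))
    return {
        "status": "match" if ok else "value_mismatch",
        "max_abs_diff": max(diffs, default=0.0),
        "shape": out_shape,
    }
```

Running this per case would separate a true dimension mismatch (for example, a different number of generated tokens) from small numeric drift, which FP16 inference can produce across hardware.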

Jun 03 '24 09:06 DeepTecher
