Baichuan2: Problem reproducing the baichuan2 MMLU results


Open zhanghan1992 opened this issue 1 year ago • 1 comments

Evaluation code used: https://github.com/baichuan-inc/Baichuan-7B/blob/main/evaluation/evaluate_mmlu.py

I tested llama2-13-hf and baichuan2-13b-base at bf16 precision:
llama2-13-hf: 0.550
baichuan2-13b-base: 0.564

I then changed one line of code to test with fp32:
#model = AutoModelForCausalLM.from_pretrained(args.model, torch_dtype=torch.bfloat16, device_map="auto",trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(args.model, device_map="auto",trust_remote_code=True)
llama2-13-hf: 0.554
baichuan2-13b-base: 0.590
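Rather than commenting a line in and out, the precision could be exposed as a command-line option. The following is only a sketch: the `--dtype` flag and the `build_parser` / `load_model` helpers are hypothetical and are not part of evaluate_mmlu.py. It also assumes that leaving out `torch_dtype` loads the weights in float32, which is what the fp32 run above relies on.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical CLI; the real evaluate_mmlu.py may define different arguments.
    parser = argparse.ArgumentParser(description="Precision-switchable model loading (sketch)")
    parser.add_argument("--model", required=True, help="model name or local path")
    parser.add_argument(
        "--dtype",
        choices=["bfloat16", "float16", "float32"],
        default="bfloat16",
        help="precision used when loading the weights",
    )
    return parser


def load_model(args: argparse.Namespace):
    # Heavy imports are kept local so the parser can be used without them.
    import torch
    from transformers import AutoModelForCausalLM

    return AutoModelForCausalLM.from_pretrained(
        args.model,
        torch_dtype=getattr(torch, args.dtype),  # e.g. torch.bfloat16 or torch.float32
        device_map="auto",
        trust_remote_code=True,
    )


if __name__ == "__main__":
    cli_args = build_parser().parse_args()
    model = load_model(cli_args)
```

With this, the two runs differ only in the flag (`--dtype bfloat16` versus `--dtype float32`), which makes it harder to compare results from accidentally different settings.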

Could anyone explain why baichuan2's results differ so much between bf16 and fp32?

Nov 06 '23 06:11 zhanghan1992

I also ran into a bf16 problem, though I'm not sure whether it is related to yours.

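I was using baichuan2-13b-chat with a very long prompt prepended (several thousand tokens). With the model loaded in bf16 the conversation quality was very poor, whereas with fp32 there was no problem.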

I later found that the cause was the following:

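1. bf16 itself has low precision and can represent only a few significant digits. For example, 2502 becomes 2496 when converted from fp32 to bf16.
2. The alibi mask implementation applies an offset (I don't know for what purpose), so the original [-(n-1), ..., 0] becomes [0, 1, 2, ..., (n-1)]. When n is large the values need more significant digits than bf16 can represent, so errors appear. Because of that offset, the closer a context token is to the current token, the larger its error (this is counterintuitive, since closer tokens should be the more useful ones). And because this mask is used in every attention computation, the errors may also accumulate during the forward pass, which shows up as poor output quality by the last layer.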
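The two points above can be checked with a small pure-Python sketch. It is a simplification, not the Baichuan2 code: `to_bf16` is a hypothetical helper that emulates bf16 rounding (round to nearest, ties to even) by truncating the fp32 bit pattern, and the per-head alibi slopes are ignored so that only the position values themselves are compared.

```python
import struct


def to_bf16(x: float) -> float:
    """Round x to the nearest bfloat16 value (ties to even); finite inputs only."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Adding 0x7FFF plus the lowest kept bit implements round-to-nearest-even.
    bits += 0x7FFF + ((bits >> 16) & 1)
    bits &= 0xFFFF0000  # keep sign, exponent and the top 7 mantissa bits
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def abs_errors(values):
    """Absolute rounding error of each value when stored as bf16."""
    return [abs(to_bf16(v) - v) for v in values]


n = 4096  # assumed context length, "several thousand tokens"

# Positions with the offset applied: [0, 1, ..., n-1]; the current token is last.
offset_form = [float(i) for i in range(n)]
# Positions without the offset: [-(n-1), ..., 0]; the current token is last.
original_form = [float(i - (n - 1)) for i in range(n)]

err_offset = abs_errors(offset_form)
err_original = abs_errors(original_form)

print(to_bf16(2502.0))                        # 2496.0
print(err_offset[-8:])                        # errors for the 8 nearest tokens, offset form
print(err_original[-8:])                      # errors for the 8 nearest tokens, original form
print({to_bf16(v) for v in offset_form[-8:]}) # distinct bf16 values among the 8 nearest
```

Under these assumptions the eight positions nearest the current token (4088 to 4095) all round to the same bf16 value in the offset form, so they are indistinguishable to the model, while in the original form the same tokens (-7 to 0) are represented exactly and the rounding error falls on the most distant tokens instead.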

Feb 07 '24 09:02 yantingxu
