zhanghan

Results 3 issues of zhanghan

Is this character-level or word-level? If the evaluation is word-level, which word segmentation tool is used?

Evaluation code: https://github.com/baichuan-inc/Baichuan-7B/blob/main/evaluation/evaluate_mmlu.py I tested llama2-13-hf and baichuan2-13b-base in bf16 precision. Results: llama2-13-hf: 0.550, baichuan2-13b-base: 0.564. I then changed one line of code to test in fp32: `#model = AutoModelForCausalLM.from_pretrained(args.model, torch_dtype=torch.bfloat16, device_map="auto",trust_remote_code=True) ` `model = AutoModelForCausalLM.from_pretrained(args.model, device_map="auto",trust_remote_code=True)` Results: llama2-13-hf: 0.554, baichuan2-13b-base: 0.590. Why do the baichuan2 results differ so much between bf16 and fp32 (0.564 vs. 0.590), while llama2 barely changes?
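One possible contributing factor, offered only as an illustration and not as a diagnosis of Baichuan2: bfloat16 keeps 8 significant bits, so two nearly equal scores can round to the same value. The sketch below assumes the evaluation picks an answer by comparing one score per choice (A/B/C/D); the logit values and the helper names `to_bf16` and `argmax` are hypothetical and are not taken from `evaluate_mmlu.py`. In a real model the rounding happens in every layer, not just at the final comparison.

```python
import struct


def to_bf16(x: float) -> float:
    """Round x to the nearest bfloat16 value (round-to-nearest-even).

    Assumes x is finite; NaN and infinity are not handled specially.
    """
    # Reinterpret the float32 encoding of x as a 32-bit unsigned integer.
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # bfloat16 is the upper 16 bits of float32. Adding this bias before
    # truncating implements round-to-nearest, ties to even.
    bits += 0x7FFF + ((bits >> 16) & 1)
    bits &= 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def argmax(values):
    """Index of the largest value; the first index wins on ties."""
    return max(range(len(values)), key=values.__getitem__)


# Hypothetical scores for choices A, B, C, D (made up for illustration).
logits_fp32 = [12.30, 12.33, 9.1, 8.7]
logits_bf16 = [to_bf16(v) for v in logits_fp32]

choices = "ABCD"
print("fp32 pick:", choices[argmax(logits_fp32)])
print("bf16 values:", logits_bf16)
print("bf16 pick:", choices[argmax(logits_bf16)])
```

Between 8 and 16, adjacent bfloat16 values are 0.0625 apart, so 12.30 and 12.33 both round to 12.3125 and the comparison becomes a tie. Whether this kind of near-tie explains the gap reported here would need to be checked against the actual per-question scores.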