Zhantao Yang

3 issues by Zhantao Yang

Changed `pos_weight` as described to fix #8749.

The official repo differs from the paper in several places. For example, the paper states that v2-s uses 272 channels in the last stage, but they changed it to...

## Command

```
lm_eval --model vllm \
  --model_args pretrained=Qwen/Qwen2.5-72B-Instruct,tensor_parallel_size=8,dtype=bfloat16,gpu_memory_utilization=0.8,data_parallel_size=1 \
  --seed 42 \
  --log_samples \
  --output_path results/Qwen_Qwen2.5-72B-Instruct/leaderboard/ \
  --tasks leaderboard \
  --apply_chat_template \
  --fewshot_as_multiturn \
  --batch_size 1
```

GPU: 8x A100 80GB

## Other Attempts

- changing `gpu_memory_utilization` to...