
Sample count shrinks during data loading

Open · Jason8Kang opened this issue 9 months ago · 1 comment

[screenshot: training log showing the reduced sample count]

As shown in the screenshot, there are originally 30k+ samples, but in the end only about 4k remain. How can I track down this problem? The script is as follows:

```bash
rm -rf llama3_finetune_pth/*
output_dir=llama3_finetune_pth
config_py=xtuner/configs/llama/llama3_8b_instruct/llama3_8b_instruct_qlora_alpaca_e3.py
CUDA_VISIBLE_DEVICES=0,1 NPROC_PER_NODE=2 xtuner train ${config_py} \
    --work-dir ${output_dir} --deepspeed deepspeed_zero2 --seed 1024
```

Jason8Kang · May 13 '24 14:05

By default, XTuner packs (concatenates) samples up to the max length to improve GPU utilization, so the actual number of iterations does not match the number of raw data samples. See:

https://xtuner.readthedocs.io/zh-cn/docs/acceleration/pack_to_max_length.html
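To make the effect concrete, here is a minimal sketch of why packing shrinks the reported sample count. The `pack_samples` helper, the 300-token average length, and the 2048 max length are all illustrative assumptions, not values from the issue or XTuner's actual implementation (XTuner's packing logic lives in its dataset pipeline and can be disabled via the `pack_to_max_length` option in the config):

```python
def pack_samples(lengths, max_length):
    """Hypothetical greedy packer: concatenate samples until adding the
    next one would exceed max_length, then start a new packed sequence.
    Returns the number of packed sequences, which is what the trainer
    then counts as "samples" (and hence iterations)."""
    packed, current = 0, 0
    for n in lengths:
        if current + n > max_length:
            packed += 1       # close the current packed sequence
            current = 0
        current += n
    if current > 0:
        packed += 1           # flush the final partial sequence
    return packed

# e.g. 30,000 raw samples of ~300 tokens each, packed to 2048 tokens:
# six samples fit per sequence, so only ~5,000 "samples" remain.
print(pack_samples([300] * 30000, 2048))
```

This matches the symptom in the screenshot: the dataset is not losing data, it is just being grouped into far fewer, longer training sequences.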

pppppM · May 14 '24 03:05

Got it, thanks for the explanation.

Jason8Kang · May 17 '24 13:05