
The SFT data packing implementation truncates at cutoff_len, so individual samples get split. Isn't this a problem?

Open bityigoss opened this issue 1 year ago • 0 comments

Reminder

  • [X] I have read the README and searched the existing issues.

Reproduction

Code

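When I actually ran it, the printed samples were indeed truncated. I expected each sample to be kept whole within a single model_inputs entry, but that is not what happens. I also did not see any corresponding changes in the code for multipacking or for the attention mask.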
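To make the observed behavior concrete, here is a minimal sketch of a concatenate-then-chunk packing strategy. It is not the actual LLaMA-Factory code: the function name `chunk_by_cutoff`, the token ids, and the choice to drop the trailing remainder are all assumptions made for illustration.

```python
def chunk_by_cutoff(samples, cutoff_len):
    """Concatenate all samples, then slice into fixed-size blocks.

    Hypothetical helper: it only illustrates how sample boundaries get lost.
    """
    flat = [tok for sample in samples for tok in sample]
    # Assumption: the trailing remainder shorter than cutoff_len is dropped.
    total = (len(flat) // cutoff_len) * cutoff_len
    return [flat[i:i + cutoff_len] for i in range(0, total, cutoff_len)]


BOS, EOS = 1, 2  # made-up special token ids
samples = [
    [BOS, 11, 12, 13, EOS],
    [BOS, 21, 22, 23, 24, EOS],
    [BOS, 31, 32, EOS],
]

blocks = chunk_by_cutoff(samples, cutoff_len=8)
for block in blocks:
    print(block)
```

With `cutoff_len=8`, the only block ends in the middle of the second sample: its last tokens and its `<eos>` never appear in that block, which is the kind of split described above.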

Expected behavior

build inputs with format <bos> X1 Y1 <eos> <bos> X2 Y2 <eos> and labels with format <ignore> ... <ignore> Y1 <eos> <ignore> ... <ignore> Y2 <eos>
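A rough sketch of the expected behavior, under stated assumptions: samples are packed greedily and never split, prompt tokens are masked with an ignore index of -100 (the usual PyTorch cross-entropy default), and a per-token segment id is recorded so that a block-diagonal causal attention mask can be built. The names `pack_whole_samples` and `block_causal_mask` are hypothetical, and skipping samples longer than cutoff_len is one possible policy, not the project's.

```python
IGNORE_INDEX = -100  # assumption: label value ignored by the loss


def pack_whole_samples(pairs, cutoff_len, bos_id, eos_id):
    """Greedily pack (prompt, response) pairs without splitting any sample."""
    def new_pack():
        return {"input_ids": [], "labels": [], "segment_ids": []}

    packed, cur = [], new_pack()
    for prompt, response in pairs:
        ids = [bos_id] + prompt + response + [eos_id]
        # Mask <bos> and the prompt; supervise the response and <eos>.
        labels = [IGNORE_INDEX] * (1 + len(prompt)) + response + [eos_id]
        if len(ids) > cutoff_len:
            continue  # assumption: a sample that cannot fit is skipped
        if len(cur["input_ids"]) + len(ids) > cutoff_len:
            packed.append(cur)
            cur = new_pack()
        seg = cur["segment_ids"][-1] + 1 if cur["segment_ids"] else 1
        cur["input_ids"] += ids
        cur["labels"] += labels
        cur["segment_ids"] += [seg] * len(ids)
    if cur["input_ids"]:
        packed.append(cur)
    return packed


def block_causal_mask(segment_ids):
    """mask[i][j] is True when token i may attend to token j:
    j is not in the future and both tokens belong to the same sample."""
    n = len(segment_ids)
    return [[j <= i and segment_ids[i] == segment_ids[j] for j in range(n)]
            for i in range(n)]


pairs = [([11, 12], [13]), ([21], [22, 23]), ([31, 32, 33], [34])]
packed = pack_whole_samples(pairs, cutoff_len=10, bos_id=1, eos_id=2)
mask = block_causal_mask(packed[0]["segment_ids"])
```

In this example the first two samples share one sequence and the third starts a new one, so every `<bos> X Y <eos>` unit stays intact. The mask keeps tokens of the second sample from attending to the first; without it, packed samples can see each other's context.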

System Info

No response

Others

No response

bityigoss, May 29 '24 07:05