Problem with the sft_packing implementation
Reminder
- [X] I have read the README and searched the existing issues.
Reproduction
From what I can see, the current sft_packing implementation simply concatenates different single-turn SFT examples into one sequence and then computes the loss on each target segment separately:
def preprocess_packed_supervised_dataset(
    examples: Dict[str, List[Any]],
    tokenizer: "PreTrainedTokenizer",
    template: "Template",
    data_args: "DataArguments",
) -> Dict[str, List[List[int]]]:
    # build inputs with format <bos> X1 Y1 <eos> <bos> X2 Y2 <eos>
    # and labels with format <ignore> ... <ignore> Y1 <eos> <ignore> ... <ignore> Y2 <eos>
    model_inputs = {"input_ids": [], "attention_mask": [], "labels": []}
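To make the layout described in those comments concrete, here is a toy illustration of my own (not code from the repository), assuming the usual Hugging Face convention of IGNORE_INDEX = -100 for tokens excluded from the loss and made-up token ids:

IGNORE_INDEX = -100  # usual Hugging Face convention for tokens excluded from the loss
bos, eos = 1, 2

# two toy single-turn examples, already tokenized (hypothetical token ids)
prompt_1, response_1 = [101, 102], [201]
prompt_2, response_2 = [103], [202, 203]

input_ids, labels = [], []
for prompt, response in [(prompt_1, response_1), (prompt_2, response_2)]:
    input_ids += [bos] + prompt + response + [eos]
    labels += [IGNORE_INDEX] * (1 + len(prompt)) + response + [eos]

# input_ids: [1, 101, 102, 201, 2, 1, 103, 202, 203, 2]
# labels:    [-100, -100, -100, 201, 2, -100, -100, 202, 203, 2]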
Shouldn't the position_ids also be modified here, so that each packed single-turn SFT example is not influenced by the other concatenated context in front of it when its loss is computed? A sketch of what I mean follows.
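For illustration only (my own sketch, not code from LLaMA-Factory), one way to do this would be to restart the position counter at every packed example boundary, given the per-example lengths:

def build_packed_position_ids(example_lengths):
    # restart positions at each packed example so every example sees 0..len-1,
    # independent of how much text was concatenated before it
    position_ids = []
    for length in example_lengths:
        position_ids.extend(range(length))
    return position_ids

# e.g. packing examples of lengths 5 and 3:
# build_packed_position_ids([5, 3]) -> [0, 1, 2, 3, 4, 0, 1, 2]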
Expected behavior
No response
System Info
No response
Others
No response
A question: for packing (especially in the SFT case), besides the position_ids mentioned above, shouldn't a proper attention mask also be set to isolate the different instances?
@hiyouga Has LLaMA-Factory implemented this 'https://github.com/MeetKai/functionary/tree/main/functionary/train/packing#assert-implementation' for packing yet? I did notice the 'preprocess_packed_supervised_dataset' part of the code in the repo.
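As far as I understand that approach (please correct me if wrong), the attention_mask stores a distinct integer per packed example, e.g. [1, 1, 1, 2, 2, 3, 3], and a patched flash-attention forward turns it into cumulative sequence lengths so attention never crosses example boundaries. A rough sketch of that conversion, under those assumptions:

import torch
import torch.nn.functional as F

def segment_mask_to_cu_seqlens(attention_mask: torch.Tensor) -> torch.Tensor:
    # attention_mask like [1, 1, 1, 2, 2, 3, 3]: one integer id per packed example
    _, counts = torch.unique_consecutive(attention_mask, return_counts=True)
    # cumulative offsets [0, 3, 5, 7], the boundaries varlen attention kernels expect
    return F.pad(torch.cumsum(counts, dim=0), (1, 0)).to(torch.int32)

# segment_mask_to_cu_seqlens(torch.tensor([1, 1, 1, 2, 2, 3, 3]))
# -> tensor([0, 3, 5, 7], dtype=torch.int32)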
any update on this issue? @hiyouga
Llama 3 also modifies the attention mask, but it does not mention position ids. Is it really necessary to modify the position ids? RoPE is itself a relative encoding.
Same question here: why isn't the attention mask handled? With plain concatenation, what is the point of letting later examples attend to earlier ones?
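For what it's worth, a minimal sketch (my own, not from this repo) of the block-diagonal mask that would isolate the packed instances; each example can only attend within its own block, with the model's usual causal mask applied on top:

import torch

def block_diagonal_attention_mask(example_lengths):
    # 2D mask where position i may attend to position j only if both
    # belong to the same packed example
    total = sum(example_lengths)
    mask = torch.zeros(total, total, dtype=torch.bool)
    start = 0
    for length in example_lengths:
        mask[start:start + length, start:start + length] = True
        start += length
    return mask

# block_diagonal_attention_mask([2, 3]) ->
# [[1, 1, 0, 0, 0],
#  [1, 1, 0, 0, 0],
#  [0, 0, 1, 1, 1],
#  [0, 0, 1, 1, 1],
#  [0, 0, 1, 1, 1]]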
The 'preprocess_packed_supervised_dataset' function does not currently build an attention mask that isolates the packed instances from each other.
@hiyouga, do you have any plans to add this feature in the future?
@letterk This will be fixed after merging #4224.