Problem with the sft_packing implementation
Reminder
- [X] I have read the README and searched the existing issues.
Reproduction
From what I can see, the current sft_packing implementation simply concatenates different single-turn SFT examples into one sequence and then computes the loss on each target segment separately:
def preprocess_packed_supervised_dataset(
    examples: Dict[str, List[Any]],
    tokenizer: "PreTrainedTokenizer",
    template: "Template",
    data_args: "DataArguments",
) -> Dict[str, List[List[int]]]:
    # build inputs with format <bos> X1 Y1 <eos> <bos> X2 Y2 <eos>
    # and labels with format <ignore> ... <ignore> Y1 <eos> <ignore> ... <ignore> Y2 <eos>
    model_inputs = {"input_ids": [], "attention_mask": [], "labels": []}
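To make the layout described in those comments concrete, here is a toy illustration of my own (not code from the repository), assuming the usual Hugging Face convention of IGNORE_INDEX = -100 for tokens excluded from the loss and made-up token ids:

IGNORE_INDEX = -100  # usual Hugging Face convention for tokens excluded from the loss
bos, eos = 1, 2

# two toy single-turn examples, already tokenized (hypothetical token ids)
prompt_1, response_1 = [101, 102], [201]
prompt_2, response_2 = [103], [202, 203]

input_ids, labels = [], []
for prompt, response in [(prompt_1, response_1), (prompt_2, response_2)]:
    input_ids += [bos] + prompt + response + [eos]
    labels += [IGNORE_INDEX] * (1 + len(prompt)) + response + [eos]

# input_ids: [1, 101, 102, 201, 2, 1, 103, 202, 203, 2]
# labels:    [-100, -100, -100, 201, 2, -100, -100, 202, 203, 2]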
Shouldn't the position_ids also be modified here, so that each packed single-turn SFT example is not influenced by the other concatenated context in front of it when its loss is computed? A sketch of what I mean follows.
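For illustration only (my own sketch, not code from LLaMA-Factory), one way to do this would be to restart the position counter at every packed example boundary, given the per-example lengths:

def build_packed_position_ids(example_lengths):
    # restart positions at each packed example so every example sees 0..len-1,
    # independent of how much text was concatenated before it
    position_ids = []
    for length in example_lengths:
        position_ids.extend(range(length))
    return position_ids

# e.g. packing examples of lengths 5 and 3:
# build_packed_position_ids([5, 3]) -> [0, 1, 2, 3, 4, 0, 1, 2]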
Expected behavior
No response
System Info
No response
Others
No response
A question: for packing (especially in the SFT case), besides the position_ids mentioned above, shouldn't a proper attention mask also be set to isolate the different instances?
@hiyouga Has LLaMA-Factory implemented this 'https://github.com/MeetKai/functionary/tree/main/functionary/train/packing#assert-implementation' for packing yet? I did notice the 'preprocess_packed_supervised_dataset' part of the code in the repo.
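As far as I understand that approach (please correct me if wrong), the attention_mask stores a distinct integer per packed example, e.g. [1, 1, 1, 2, 2, 3, 3], and a patched flash-attention forward turns it into cumulative sequence lengths so attention never crosses example boundaries. A rough sketch of that conversion, under those assumptions:

import torch
import torch.nn.functional as F

def segment_mask_to_cu_seqlens(attention_mask: torch.Tensor) -> torch.Tensor:
    # attention_mask like [1, 1, 1, 2, 2, 3, 3]: one integer id per packed example
    _, counts = torch.unique_consecutive(attention_mask, return_counts=True)
    # cumulative offsets [0, 3, 5, 7], the boundaries varlen attention kernels expect
    return F.pad(torch.cumsum(counts, dim=0), (1, 0)).to(torch.int32)

# segment_mask_to_cu_seqlens(torch.tensor([1, 1, 1, 2, 2, 3, 3]))
# -> tensor([0, 3, 5, 7], dtype=torch.int32)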
any update on this issue? @hiyouga
Llama 3 also modifies the attention mask, but it does not mention position ids. Is it really necessary to modify the position ids? RoPE is itself a relative encoding.
Same question here: why isn't the attention mask handled? With plain concatenation, what is the point of letting later examples attend to earlier ones?
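For what it's worth, a minimal sketch (my own, not from this repo) of the block-diagonal mask that would isolate the packed instances; each example can only attend within its own block, with the model's usual causal mask applied on top:

import torch

def block_diagonal_attention_mask(example_lengths):
    # 2D mask where position i may attend to position j only if both
    # belong to the same packed example
    total = sum(example_lengths)
    mask = torch.zeros(total, total, dtype=torch.bool)
    start = 0
    for length in example_lengths:
        mask[start:start + length, start:start + length] = True
        start += length
    return mask

# block_diagonal_attention_mask([2, 3]) ->
# [[1, 1, 0, 0, 0],
#  [1, 1, 0, 0, 0],
#  [0, 0, 1, 1, 1],
#  [0, 0, 1, 1, 1],
#  [0, 0, 1, 1, 1]]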
The 'preprocess_packed_supervised_dataset' function does not currently build an attention mask that isolates the packed instances from each other.
@hiyouga, do you have any plans to add this feature in the future?
@letterk This will be fixed after merging #4224.