trl

DataCollatorForCompletionOnlyLM does not work with FSDP

Open aabhasgupta opened this issue 1 year ago • 0 comments

Attachment: fsdp_qlora.txt

The loss is returned as NaN when using DataCollatorForCompletionOnlyLM with the FSDP pipeline (script attached for reference), with the following message:

"could not find instruction key [882] in the following instance: <|start_header_id|>user<|end_header_id|> "

When I run the collator on its own in a separate Jupyter notebook, I don't see this error. Is this related to the distributed training approach? I followed the FSDP approach described here: https://www.philschmid.de/fsdp-qlora-llama3

Can anyone suggest what I am missing?
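For anyone trying to reproduce this, here is a minimal sketch of the kind of check a completion-only collator has to perform: find the template's token ids inside `input_ids` and mask everything before the response with `-100`. This is not TRL's actual implementation. The helper names and all token ids are hypothetical, apart from `882`, which is taken from the message above. It illustrates one possible cause, assuming the training script truncates or otherwise alters sequences in a way the notebook test did not: if the key is no longer present, every label is ignored, and a batch made only of such instances has no tokens to average the loss over, which gives NaN with PyTorch's default mean-reduced cross-entropy.

```python
from typing import List, Optional

IGNORE_INDEX = -100  # label value that PyTorch's cross-entropy loss skips by default


def find_subsequence(haystack: List[int], needle: List[int]) -> Optional[int]:
    """Return the start index of the first occurrence of needle, or None."""
    if not needle:
        return None
    for start in range(len(haystack) - len(needle) + 1):
        if haystack[start:start + len(needle)] == needle:
            return start
    return None


def mask_before_response(input_ids: List[int], response_ids: List[int]) -> List[int]:
    """Hypothetical helper: keep labels only for tokens after the response key."""
    start = find_subsequence(input_ids, response_ids)
    if start is None:
        # Key not found: the whole instance is ignored in the loss.
        return [IGNORE_INDEX] * len(input_ids)
    end = start + len(response_ids)
    return [IGNORE_INDEX] * end + input_ids[end:]


# Hypothetical token ids; only 882 comes from the message in this issue.
response_ids = [20, 21]
input_ids = [1, 10, 882, 11, 50, 51, 20, 21, 30, 31]

labels_full = mask_before_response(input_ids, response_ids)
# Simulate truncation that cuts the sequence before the response key.
labels_truncated = mask_before_response(input_ids[:6], response_ids)

print(labels_full)
print(labels_truncated)
```

Running the same search on a batch taken from the actual training dataloader (rather than a hand-built example in a notebook) should show whether the template ids really are present in what the trainer sees.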

aabhasgupta, Jun 20 '24 17:06