maxtext

Support target masking (aka loss masking or label masking) for SFT datasets

Open · jmschndev opened this issue 7 months ago · 0 comments

Right now, data loading and loss computation assume one is only doing LM pretraining. It would be useful to also support packed SFT-style datasets (i.e., datasets with cleanly delineated prompt/completion pairs, possibly including a system prompt) and their corresponding masking.
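
To make the request concrete, here is a minimal sketch of what packing SFT pairs with a target mask could look like. `pack_sft_examples` is a hypothetical helper written for illustration, not an existing MaxText function. It assumes each example is already tokenized into `(prompt_ids, completion_ids)` and that a system prompt, if any, is part of `prompt_ids`. It emits segment ids (which a packed-attention implementation can use to stop examples from attending to each other) and a loss mask that is 1 only on completion tokens.

```python
import numpy as np


def pack_sft_examples(examples, max_len, pad_id=0):
    """Greedily pack (prompt_ids, completion_ids) pairs into a single row.

    Hypothetical helper for illustration only; not a MaxText API.
    Returns three int32 arrays of length max_len:
      tokens      -- packed token ids, right-padded with pad_id
      segment_ids -- 1, 2, ... per example; 0 marks padding
      loss_mask   -- 1 on completion (target) tokens, 0 on prompt and padding
    """
    tokens, segment_ids, loss_mask = [], [], []
    for seg, (prompt, completion) in enumerate(examples, start=1):
        example = list(prompt) + list(completion)
        if len(tokens) + len(example) > max_len:
            # A fuller packer would start a new row here instead of
            # dropping the remaining examples.
            break
        tokens += example
        segment_ids += [seg] * len(example)
        # Prompt tokens are visible to attention but excluded from the loss.
        loss_mask += [0] * len(prompt) + [1] * len(completion)
    pad = max_len - len(tokens)
    return (
        np.array(tokens + [pad_id] * pad, dtype=np.int32),
        np.array(segment_ids + [0] * pad, dtype=np.int32),
        np.array(loss_mask + [0] * pad, dtype=np.int32),
    )


tokens, segment_ids, loss_mask = pack_sft_examples(
    [([11, 12], [13, 14]), ([21], [22, 23])], max_len=8
)
# tokens      -> [11, 12, 13, 14, 21, 22, 23, 0]
# segment_ids -> [ 1,  1,  1,  1,  2,  2,  2, 0]
# loss_mask   -> [ 0,  0,  1,  1,  0,  1,  1, 0]
```

For next-token prediction the targets are `tokens[1:]`, so the matching mask is `loss_mask[1:]`. Because every example in this sketch begins with a prompt token (mask 0), the prediction that crosses from one packed example into the next is masked out automatically, as long as prompts are non-empty.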

That is, the masks should still let the attention module reference the prompt/prefix tokens, while only the completion/target tokens contribute to the loss, so only their gradients are propagated.
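
On the loss side, a sketch of masked cross-entropy shows how prompt tokens can stay in the forward pass yet contribute nothing to training. It uses NumPy so it runs anywhere; the function name `masked_cross_entropy` and the `[batch, seq, vocab]` layout are assumptions, not MaxText's actual loss code. Written with `jax.numpy` instead, the masked positions would receive zero gradient under `jax.grad`, because their per-token losses are multiplied by 0.

```python
import numpy as np


def masked_cross_entropy(logits, targets, loss_mask):
    """Mean token-level cross-entropy over positions where loss_mask == 1.

    Illustrative sketch, not a MaxText API.
    logits:    float array [batch, seq, vocab]
    targets:   int array   [batch, seq]
    loss_mask: array       [batch, seq] with values in {0, 1}
    """
    # Numerically stable log-softmax over the vocabulary axis.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    # Negative log-likelihood of each target token.
    token_nll = -np.take_along_axis(log_probs, targets[..., None], axis=-1)[..., 0]
    mask = loss_mask.astype(logits.dtype)
    # Prompt and padding positions add 0 to the numerator and 0 to the
    # denominator; the max() guards against a row with no targets at all.
    return (token_nll * mask).sum() / np.maximum(mask.sum(), 1.0)
```

Normalizing by the number of target tokens (rather than by sequence length) keeps the loss scale independent of how much of each packed row is prompt or padding.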

jmschndev · Jun 28 '24 17:06