Loss computation in finetune
Hi, I wonder how the training loss is computed when we fine-tune StarCoder with LoRA. Does the loss computation include both the prompt tokens and the generated tokens, or just the generated tokens? Since the training code in this repo uses the Hugging Face Trainer, I find it hard to trace the answer. Hoping for your reply.
You might find the answer by checking the output of the dataset, i.e., looking at a concrete sample and its input_ids and labels, which are exactly what the Hugging Face Trainer class uses for loss calculation.
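For instance, a minimal sketch of that kind of inspection (hypothetical, not code from this repo; the GPT-2 tokenizer stands in for StarCoder's, and the sample text is made up):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # stand-in tokenizer

# Tokenize one Question/Answer sample the way a plain causal-LM
# dataset typically does: labels are just a copy of input_ids.
text = "Question: What does 2 + 2 equal?\n\nAnswer: 4"
input_ids = tokenizer(text)["input_ids"]
labels = list(input_ids)

# Positions set to -100 are ignored by the cross-entropy loss.
# Here none are, so every token (prompt included) enters the loss.
print(input_ids)
print(labels)
print(sum(l == -100 for l in labels), "tokens masked out of the loss")
```

If labels mirror input_ids exactly, the prompt tokens are trained on; if the prompt positions are set to -100, only the answer contributes to the loss.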
Hi. In finetune.py, StarCoder is fine-tuned on a dataset containing sentences framed as Question:<question>\n\nAnswer:<answer>. The loss is causal, and in this case it considers all tokens, including both the Question and the Answer. However, in chat/train.py, the loss computation does not consider the part corresponding to the question (see this and this).
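In case the links go stale, the question-masking pattern boils down to setting the prompt positions in labels to -100, the ignore index of PyTorch's cross-entropy. A rough sketch of that idea (illustrative only, with a stand-in tokenizer and a made-up prompt/answer split, not the repo's actual code):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # stand-in tokenizer

prompt = "Question: What does 2 + 2 equal?\n\nAnswer:"
answer = " 4"

prompt_ids = tokenizer(prompt)["input_ids"]
answer_ids = tokenizer(answer)["input_ids"]

input_ids = prompt_ids + answer_ids
# Mask the question part with -100 so the loss skips it and only
# the answer tokens contribute to training.
labels = [-100] * len(prompt_ids) + list(answer_ids)

assert len(input_ids) == len(labels)
print(labels)
```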