Loss computation in finetune
Hi, I wonder how the training loss is computed when we fine-tune StarCoder with LoRA. Does the loss computation include both the prompt tokens and the generated tokens, or just the generated tokens? Since the training code in this repo uses the Hugging Face Trainer, I find it hard to trace the answer. Hoping for your reply.
You might find the answer by checking the output of the dataset, i.e., looking at a concrete sample and its input_ids and labels, which are exactly what the Hugging Face Trainer class uses for loss calculation.
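For instance, a minimal sketch of that kind of inspection (hypothetical, not code from this repo; the GPT-2 tokenizer stands in for StarCoder's, and the sample text is made up):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # stand-in tokenizer

# Tokenize one Question/Answer sample the way a plain causal-LM
# dataset typically does: labels are just a copy of input_ids.
text = "Question: What does 2 + 2 equal?\n\nAnswer: 4"
input_ids = tokenizer(text)["input_ids"]
labels = list(input_ids)

# Positions set to -100 are ignored by the cross-entropy loss.
# Here none are, so every token (prompt included) enters the loss.
print(input_ids)
print(labels)
print(sum(l == -100 for l in labels), "tokens masked out of the loss")
```

If labels mirror input_ids exactly, the prompt tokens are trained on; if the prompt positions are set to -100, only the answer contributes to the loss.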
Hi. In finetune.py, StarCoder is fine-tuned on a dataset containing sentences framed as Question:<question>\n\nAnswer:<answer>. The loss is causal, and in this case it considers all tokens, including both the Question and the Answer. However, in chat/train.py, the loss computation does not consider the part corresponding to the question (see this and this).
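In case the links go stale, the question-masking pattern boils down to setting the prompt positions in labels to -100, the ignore index of PyTorch's cross-entropy. A rough sketch of that idea (illustrative only, with a stand-in tokenizer and a made-up prompt/answer split, not the repo's actual code):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # stand-in tokenizer

prompt = "Question: What does 2 + 2 equal?\n\nAnswer:"
answer = " 4"

prompt_ids = tokenizer(prompt)["input_ids"]
answer_ids = tokenizer(answer)["input_ids"]

input_ids = prompt_ids + answer_ids
# Mask the question part with -100 so the loss skips it and only
# the answer tokens contribute to training.
labels = [-100] * len(prompt_ids) + list(answer_ids)

assert len(input_ids) == len(labels)
print(labels)
```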