DHS-LLM-Workshop

In the preparation of the training dataset, why is "input_ids" equal to "labels"?

Open JonL01 opened this issue 8 months ago • 0 comments

In personal_copilot/training/train.py, in class ConstantLengthDataset(IterableDataset), lines 249 and 250 read: yield { "input_ids": torch.LongTensor(example), "labels": torch.LongTensor(example), }

If the input is "I LIKE APPLE", why do we need to teach the model to repeat "I LIKE APPLE" instead of continuing with "I LIKE APPLE, BECAUSE IT IS SWEET"?

JonL01, Jun 21 '24 08:06
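A likely explanation, assuming the model in train.py is a Hugging Face causal LM (for example one loaded with AutoModelForCausalLM): such models shift the labels by one position inside forward(), so passing labels identical to input_ids trains next-token prediction, not copying. The sketch below reproduces that shift in plain Python. The helper name `shifted_pairs` and the toy tokenization are hypothetical and are not taken from the repository.

```python
def shifted_pairs(input_ids, labels):
    """Pair each prefix of input_ids with the label one step ahead.

    Mirrors the shift applied inside Hugging Face causal LM heads, where
    logits[..., :-1, :] are scored against labels[..., 1:].
    """
    assert len(input_ids) == len(labels)
    pairs = []
    for t in range(len(input_ids) - 1):
        context = input_ids[: t + 1]  # tokens the model has seen so far
        target = labels[t + 1]        # token it is trained to predict next
        pairs.append((context, target))
    return pairs


# Hypothetical toy tokenization of "I LIKE APPLE , BECAUSE IT IS SWEET".
tokens = ["I", "LIKE", "APPLE", ",", "BECAUSE", "IT", "IS", "SWEET"]
example = list(range(len(tokens)))  # stand-in token ids 0..7

# Same content for both keys, as in the yield statement quoted above.
batch = {"input_ids": example, "labels": example}

pairs = shifted_pairs(batch["input_ids"], batch["labels"])
for context, target in pairs:
    print(" ".join(tokens[i] for i in context), "->", tokens[target])
```

Each printed line shows a prefix and the token that follows it, so after seeing "I LIKE APPLE" the target is the next token in the training text, not a repeat of the prefix. For the model to learn "BECAUSE IT IS SWEET", that continuation has to be part of the same training sequence.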