DHS-LLM-Workshop
When preparing the training dataset, why is "input_ids" equal to "labels"?
In personal_copilot/training/train.py, in class ConstantLengthDataset(IterableDataset), lines 249 and 250 read:
yield { "input_ids": torch.LongTensor(example), "labels": torch.LongTensor(example), }
If the input is "I LIKE APPLE", why do we teach the model to repeat "I LIKE APPLE" instead of continuing with "I LIKE APPLE, BECAUSE IT IS SWEET"?
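A possible explanation, assuming train.py trains a Hugging Face causal language model: those models shift the labels by one position inside the forward pass before computing the loss, so passing the same sequence as "input_ids" and "labels" trains next-token prediction rather than copying. The sketch below only imitates that shift in plain Python. The helper name `next_token_pairs` and the word-level "tokens" are made up for illustration and are not part of the repository.

```python
def next_token_pairs(input_ids, labels):
    """Hypothetical helper that imitates the shift a causal LM loss applies.

    The prediction made at position i is scored against labels[i + 1],
    so the last position has no target and labels[0] is never used.
    """
    assert len(input_ids) == len(labels)
    # Context visible to the model when it predicts the next token.
    contexts = [input_ids[: i + 1] for i in range(len(input_ids) - 1)]
    # Targets are the same sequence, shifted left by one.
    targets = labels[1:]
    return list(zip(contexts, targets))


# One full training sequence; input_ids and labels are identical, as in train.py.
tokens = ["I", "LIKE", "APPLE", ",", "BECAUSE", "IT", "IS", "SWEET"]
pairs = next_token_pairs(tokens, tokens)

for context, target in pairs:
    print(" ".join(context), "->", target)
```

Under this assumption, the example in the dataset would be the whole sentence "I LIKE APPLE, BECAUSE IT IS SWEET", and the model seeing "I LIKE APPLE" is trained to predict "," next, then "BECAUSE", and so on. It is never asked to echo a token it has already been given.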