
removed a duplicate line

Open gpgg opened this issue 1 year ago • 0 comments

Why are these changes needed?

In fastchat/train/train.py, the __init__ method of the LazySupervisedDataset class assigns self.tokenizer twice:

class LazySupervisedDataset(Dataset):
    """Dataset for supervised fine-tuning."""

    def __init__(self, raw_data, tokenizer: transformers.PreTrainedTokenizer):
        super(LazySupervisedDataset, self).__init__()
        self.tokenizer = tokenizer  # <<------- duplicate of the assignment three lines below

        rank0_print("Formatting inputs...Skip in lazy mode")
        self.tokenizer = tokenizer
        self.raw_data = raw_data
        self.cached_data_dict = {}

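Therefore, I removed the duplicate line. Behavior is unchanged, since both assignments store the same tokenizer object and nothing in between modifies it.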
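
For reference, here is a minimal sketch of what the constructor could look like after the change, assuming the line marked as duplicate is the one that was dropped. So that it runs without torch or FastChat installed, `Dataset` and `rank0_print` are simplified stand-ins defined here (assumptions, not the real implementations), and the tokenizer type annotation is omitted.

```python
class Dataset:
    """Stand-in for torch.utils.data.Dataset (assumption for this sketch)."""


def rank0_print(*args):
    """Stand-in for the rank-0-only print helper in train.py (assumption)."""
    print(*args)


class LazySupervisedDataset(Dataset):
    """Dataset for supervised fine-tuning."""

    def __init__(self, raw_data, tokenizer):
        super().__init__()

        rank0_print("Formatting inputs...Skip in lazy mode")
        self.tokenizer = tokenizer  # now assigned exactly once
        self.raw_data = raw_data
        self.cached_data_dict = {}


# Usage: any object can play the tokenizer role in this sketch.
dataset = LazySupervisedDataset(raw_data=[{"id": 0}], tokenizer="dummy-tokenizer")
```

The instance ends up with the same three attributes as before (`tokenizer`, `raw_data`, `cached_data_dict`), so callers should not notice any difference.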

Related issue number (if applicable)

Checks

  • [x] I've run format.sh to lint the changes in this PR.
  • [x] I've included any doc changes needed.
  • [x] I've made sure the relevant tests are passing (if applicable).

gpgg, Aug 14 '24 17:08