FastChat
removed a duplicate line
Why are these changes needed?
In `fastchat/train/train.py`, I found that the `__init__` method of the `LazySupervisedDataset` class assigns `self.tokenizer` twice:

    class LazySupervisedDataset(Dataset):
        """Dataset for supervised fine-tuning."""

        def __init__(self, raw_data, tokenizer: transformers.PreTrainedTokenizer):
            super(LazySupervisedDataset, self).__init__()
            self.tokenizer = tokenizer  # <<------- this is a duplicate line.

            rank0_print("Formatting inputs...Skip in lazy mode")
            self.tokenizer = tokenizer
            self.raw_data = raw_data
            self.cached_data_dict = {}

Therefore, I removed the duplicate line.
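
For reference, here is a minimal sketch of what the constructor might look like after the change. It assumes the assignment right after `super().__init__()` is the one that stays; the diff itself decides which copy is dropped. The `Dataset` base class and `rank0_print` below are stand-ins I added so the snippet runs without PyTorch or FastChat installed, and the `transformers.PreTrainedTokenizer` annotation is omitted for the same reason.

```python
class Dataset:
    """Stand-in for torch.utils.data.Dataset (assumption, so the sketch runs alone)."""


def rank0_print(*args):
    """Stand-in for FastChat's rank0_print helper; here it simply prints."""
    print(*args)


class LazySupervisedDataset(Dataset):
    """Dataset for supervised fine-tuning."""

    def __init__(self, raw_data, tokenizer):
        super().__init__()
        # Single assignment; the repeated one has been removed.
        self.tokenizer = tokenizer

        rank0_print("Formatting inputs...Skip in lazy mode")
        self.raw_data = raw_data
        self.cached_data_dict = {}
```

Both assignments bound the same object to the same attribute, so dropping one should not change behavior.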
Related issue number (if applicable)
Checks
- [x] I've run `format.sh` to lint the changes in this PR.
- [x] I've included any doc changes needed.
- [x] I've made sure the relevant tests are passing (if applicable).