TransformerTranslation
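Problem in load_train_val_test_data(): the train dataset loads fine, but loading the val dataset fails
I'm new to Python. Running the code raises RuntimeError: Token Eine not found and default index is not set, but I checked with print and the token Eine does exist. What is the problem?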
=======================================================================
     49 data_loader.load_train_val_test_data(config.train_corpus_file_paths,
     50                                       config.val_corpus_file_paths,
---> 51                                       config.test_corpus_file_paths)
     52 logging.info("############初始化模型############")
     53 translation_model = TranslationModel(src_vocab_size=len(data_loader.de_vocab),

/tmp/ipykernel_27/774184140.py in load_train_val_test_data(self, train_file_paths, val_file_paths, test_file_paths)
     42         print('load traning data done')
     43
---> 44         val_data = self.data_process(val_file_paths)
     45         print('load val data done')
     46

/tmp/ipykernel_27/774184140.py in data_process(self, filepaths)
     25         for (raw_de, raw_en) in tqdm(zip(raw_de_iter, raw_en_iter),ncols=80):
     26             de_tensor_ = torch.tensor([self.de_vocab[token] for token in
---> 27                                        self.tokenizer'de'], dtype=torch.long)
     28             en_tensor_ = torch.tensor([self.en_vocab[token] for token in
     29                                        self.tokenizer'en'], dtype=torch.long)

/tmp/ipykernel_27/774184140.py in

/opt/conda/lib/python3.7/site-packages/torchtext/vocab/vocab.py in __getitem__(self, token)
     62             The index corresponding to the associated token.
     63         """
---> 64         return self.vocab[token]
     65
     66     @torch.jit.export

RuntimeError: Token Eine not found and default index is not set
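I think this is because the vocabulary was built from the training data only. When the validation or test data is converted to indices, it may contain tokens that are not in the vocabulary, i.e. OOV (out-of-vocabulary) tokens. So you need to set a default index on the vocabulary to handle OOV tokens.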