TransformerTranslation

load_train_val_test_data() fails: the train dataset loads, but loading the val dataset raises an error

Open · lyconghk opened this issue 2 years ago · 1 comment

I'm new to Python. When I run the code I get `RuntimeError: Token Eine not found and default index is not set`, but when I checked with `print`, the token `Eine` does appear in the data. What is the problem?

```
     49 data_loader.load_train_val_test_data(config.train_corpus_file_paths,
     50                                      config.val_corpus_file_paths,
---> 51                                      config.test_corpus_file_paths)
     52 logging.info("############Initializing model############")
     53 translation_model = TranslationModel(src_vocab_size=len(data_loader.de_vocab),

/tmp/ipykernel_27/774184140.py in load_train_val_test_data(self, train_file_paths, val_file_paths, test_file_paths)
     42         print('load traning data done')
     43 
---> 44         val_data = self.data_process(val_file_paths)
     45         print('load val data done')
     46 

/tmp/ipykernel_27/774184140.py in data_process(self, filepaths)
     25         for (raw_de, raw_en) in tqdm(zip(raw_de_iter, raw_en_iter),ncols=80):
     26             de_tensor_ = torch.tensor([self.de_vocab[token] for token in
---> 27                                        self.tokenizer'de'], dtype=torch.long)
     28             en_tensor_ = torch.tensor([self.en_vocab[token] for token in
     29                                        self.tokenizer'en'], dtype=torch.long)

/tmp/ipykernel_27/774184140.py in <listcomp>(.0)
     24         logging.info(f"### Converting dataset {filepaths} to token IDs ")
     25         for (raw_de, raw_en) in tqdm(zip(raw_de_iter, raw_en_iter),ncols=80):
---> 26             de_tensor_ = torch.tensor([self.de_vocab[token] for token in
     27                                        self.tokenizer'de'], dtype=torch.long)
     28             en_tensor_ = torch.tensor([self.en_vocab[token] for token in

/opt/conda/lib/python3.7/site-packages/torchtext/vocab/vocab.py in __getitem__(self, token)
     62             The index corresponding to the associated token.
     63         """
---> 64         return self.vocab[token]
     65 
     66     @torch.jit.export

RuntimeError: Token Eine not found and default index is not set
```

lyconghk · Oct 24 '22 09:10

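I think this happens because the vocabulary is built from the training data only. When the validation or test data are converted to indices, they may contain tokens that are not in the vocabulary, i.e. out-of-vocabulary (OOV) tokens. Seeing `Eine` in the raw data with `print` does not mean it is in the vocabulary. You therefore need to set a default index on the vocabulary so that OOV tokens are handled.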
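A minimal, dependency-free sketch of what the reply describes. `SimpleVocab` is a hypothetical stand-in for torchtext's `Vocab` (it is not the project's code) that reproduces the same error message and the same fix; the example sentences and the `'<unk>'` special token are assumptions. With the torchtext `Vocab` from the traceback, the equivalent fix, assuming the vocabulary was built with `'<unk>'` among its `specials`, would be `de_vocab.set_default_index(de_vocab['<unk>'])` (and likewise for `en_vocab`) right after the vocabularies are built.

```python
from collections import Counter
from typing import Dict, Iterable, List, Optional


class SimpleVocab:
    """Hypothetical stand-in for torchtext's Vocab: maps tokens to indices
    and can optionally fall back to a default index for OOV tokens."""

    def __init__(self, tokens: Iterable[str], specials: List[str]) -> None:
        counts = Counter(tokens)
        # Special tokens first, then corpus tokens by descending frequency
        # (ties broken alphabetically so the order is deterministic).
        ordered = list(specials) + sorted(
            (t for t in counts if t not in specials),
            key=lambda t: (-counts[t], t),
        )
        self.stoi: Dict[str, int] = {tok: i for i, tok in enumerate(ordered)}
        self.default_index: Optional[int] = None

    def set_default_index(self, index: int) -> None:
        # Index returned for any token not in the vocabulary
        self.default_index = index

    def __contains__(self, token: str) -> bool:
        return token in self.stoi

    def __getitem__(self, token: str) -> int:
        if token in self.stoi:
            return self.stoi[token]
        if self.default_index is None:
            # Same failure mode as in the traceback
            raise RuntimeError(
                f"Token {token} not found and default index is not set"
            )
        return self.default_index


# Vocabulary built from the training sentences only
train_tokens = "Ein Mann sitzt . Ein Hund rennt .".split()
de_vocab = SimpleVocab(train_tokens, specials=["<unk>", "<pad>", "<bos>", "<eos>"])

val_tokens = "Eine Frau rennt .".split()
print("Eine" in de_vocab)  # False: "Eine" occurs only in the validation data

try:
    [de_vocab[t] for t in val_tokens]
except RuntimeError as err:
    print(err)  # Token Eine not found and default index is not set

# The fix: map every OOV token to the index of "<unk>"
de_vocab.set_default_index(de_vocab["<unk>"])
print([de_vocab[t] for t in val_tokens])  # → [0, 0, 8, 4]
```

Mapping OOV tokens to a dedicated `<unk>` index, rather than skipping them, keeps every validation and test sentence encodable with the vocabulary the model was trained on, and gives the model one consistent placeholder for any unseen word.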

zhouhuaian · Dec 04 '23 02:12