fever-2018-team-athene
KeyError while selecting sentences for the training set
I got a KeyError while selecting sentences for the training set (error message below).
[INFO] 2021-10-20 04:32:44,511 - pipeline - Finished selecting sentences for dev set.
INFO:pipeline:Finished selecting sentences for dev set.
[INFO] 2021-10-20 04:32:44,512 - pipeline - Starting selecting sentences for training set...
INFO:pipeline:Starting selecting sentences for training set...
100%|███████████████████████████████████████████████████████████████████████████████████████████| 145449/145449 [03:46<00:00, 642.38it/s]
Traceback (most recent call last):
  File "src/scripts/athene/pipeline.py", line 196, in <module>
    sentence_retrieval_ensemble(logger, args.mode)
  File "src/scripts/athene/pipeline.py", line 138, in sentence_retrieval_ensemble
    sentence_retrieval_ensemble_entrance(_args)
  File "/mnt/userName/FEVER/FEVER1.0/fever-2018-team-athene/src/athene/retrieval/sentences/ensemble.py", line 265, in entrance
    random_seed=args.random_seed, reserve_embed=args.reserve_embed)
  File "/mnt/userName/FEVER/FEVER1.0/fever-2018-team-athene/src/athene/retrieval/sentences/data_processing/data.py", line 33, in __init__
    self.data_pipeline()
  File "/mnt/userName/FEVER/FEVER1.0/fever-2018-team-athene/src/athene/retrieval/sentences/data_processing/data.py", line 69, in data_pipeline
    self.test_indexes = self.predict_indexes_loader(test_indexes_path, tests)
  File "/mnt/userName/FEVER/FEVER1.0/fever-2018-team-athene/src/athene/retrieval/sentences/data_processing/data.py", line 439, in predict_indexes_loader
    predicts_indexes = self.predict_data_indexes(predict_data, self.iword_dict)
  File "/mnt/userName/FEVER/FEVER1.0/fever-2018-team-athene/src/athene/retrieval/sentences/data_processing/data.py", line 400, in predict_data_indexes
    sent_index = self.sent_2_index(sent, word_dict, self.s_max_length)
  File "/mnt/userName/FEVER/FEVER1.0/fever-2018-team-athene/src/athene/retrieval/sentences/data_processing/data.py", line 376, in sent_2_index
    word_indexes.append(word_dict[word.lower()])
KeyError: 'wedgwood'
I think the problem comes from the word dictionary, which is generated from train_sample.p. Since train_sample.p is built from the negative-sampled training dataset, the resulting vocabulary does not include every word in the training data, so looking up a word that is missing (here 'wedgwood') raises the KeyError.
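To illustrate the failure mode, here is a minimal sketch of a lookup that falls back to an unknown-word index instead of raising. It is not the repository's sent_2_index: the function name, the whitespace tokenization, and the `<pad>`/`<unk>` entries are all assumptions made for the example.

```python
# Hypothetical helper, not the code in data.py: maps words to indexes and
# uses an "unknown" index for any word missing from the vocabulary.
def sent_to_index_with_unk(sentence, word_dict, max_length,
                           unk_index=1, pad_index=0):
    # Whitespace tokenization is an assumption made for this sketch.
    words = sentence.split()[:max_length]
    # dict.get avoids the KeyError that word_dict[word.lower()] would raise.
    indexes = [word_dict.get(word.lower(), unk_index) for word in words]
    # Pad to a fixed length so every sentence has the same size.
    indexes += [pad_index] * (max_length - len(indexes))
    return indexes


# Toy vocabulary that, like the cached one, lacks the word "wedgwood".
toy_word_dict = {"<pad>": 0, "<unk>": 1, "the": 2, "pottery": 3}
example_indexes = sent_to_index_with_unk("The Wedgwood pottery", toy_word_dict, 5)
```

This only hides the symptom: words mapped to the unknown index share a single embedding, so rebuilding the vocabulary is still the more complete fix.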
I solved this problem by changing data.py from
    words_dict_path = os.path.join(self.embedding_path, "words_dict.p")
    if os.path.exists(words_dict_path):
        with open(words_dict_path, "rb") as f:
            self.word_dict = pickle.load(f)
    else:
        self.word_dict = self.get_complete_words(words_dict_path, X_train, devs, tests)
to
    words_dict_path = os.path.join(self.embedding_path, "words_dict.p")
    self.word_dict = self.get_complete_words(words_dict_path, X_train, devs, tests)
so that the dictionary is rebuilt on every run instead of being loaded from a possibly stale words_dict.p.
Does my solution look fine?
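For reference, a possible alternative to rebuilding on every run is to keep the cache but rebuild only when it is missing or does not cover the current data. The sketch below is self-contained and does not use the repository's get_complete_words: `load_or_rebuild_vocab`, `build_vocab`, and the whitespace tokenization are hypothetical names and assumptions chosen for illustration.

```python
import os
import pickle


def build_vocab(sentences):
    # Hypothetical stand-in for the real vocabulary builder:
    # assigns each lowercased word the next free index.
    vocab = {}
    for sentence in sentences:
        for word in sentence.split():
            vocab.setdefault(word.lower(), len(vocab))
    return vocab


def load_or_rebuild_vocab(path, sentences, build_fn):
    # Words the current data actually needs (whitespace split is an assumption).
    needed = {word.lower() for sentence in sentences for word in sentence.split()}
    if os.path.exists(path):
        with open(path, "rb") as f:
            vocab = pickle.load(f)
        # Reuse the cache only if it covers every needed word.
        if needed.issubset(vocab):
            return vocab
    # Cache missing or stale: rebuild and overwrite it.
    vocab = build_fn(sentences)
    with open(path, "wb") as f:
        pickle.dump(vocab, f)
    return vocab
```

The coverage check costs one pass over the data, which is usually cheaper than a full rebuild, but it still has to read every sentence, so whether it is worth the extra code depends on how expensive the rebuild is.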