Text-Summarizer-Pytorch-Chinese

Could the author please take a look at this error?

Open PYMAQ opened this issue 4 years ago • 3 comments

(venv) G:\all_summarization\Text-Summarizer-Pytorch-Chinese>python eval.py --task=test --load_model=0155000.tar
2021-01-04 08:14:08,711 - data_util.log - INFO - log启动
data/chunked/test/test_*
2021-01-04 08:14:11,595 - data_util.log - INFO - Id not found in vocab: 10160
Traceback (most recent call last):
  File "G:\all_summarization\Text-Summarizer-Pytorch-Chinese\data_util\data.py", line 150, in outputids2words
    w = vocab.id2word(i) # might be [UNK]
  File "G:\all_summarization\Text-Summarizer-Pytorch-Chinese\data_util\data.py", line 69, in id2word
    raise ValueError('Id not found in vocab: %d' % word_id)
ValueError: Id not found in vocab: 10160

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "eval.py", line 171, in <module>
    eval_processor.evaluate_batch(True)
  File "eval.py", line 82, in evaluate_batch
    batch.art_oovs[i])
  File "G:\all_summarization\Text-Summarizer-Pytorch-Chinese\data_util\data.py", line 156, in outputids2words
    w = article_oovs[article_oov_idx]
IndexError: list index out of range

PYMAQ, Jan 04 '21 00:01

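How can I run the model on the test set? I noticed that the code below does not produce a real test set; what it calls the test set is actually taken from the validation set. But as soon as I use a new test set, I get the error above (a vocabulary problem?). This is what I found in make_data_files:

# valid_chunk, test_chunk = samples[0], samples[1]
# shutil.copyfile(os.path.join(chunk_path, "main_valid", valid_chunk),
#                 os.path.join(chunk_path, "valid", "valid_00.bin"))
# shutil.copyfile(os.path.join(chunk_path, "main_valid", test_chunk),
#                 os.path.join(chunk_path, "test", "test_00.bin"))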

PYMAQ, Jan 04 '21 00:01

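In other words, once the model has been trained and new test data comes in, if that test set contains out-of-vocabulary (OOV) words, i.e. words that were not in the original vocabulary, the model raises ValueError: Id not found in vocab: 10160? Has the author solved this problem before?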

PYMAQ, Jan 04 '21 00:01

It looks like the handling of out-of-vocabulary words needs to be changed. I have not looked into this closely before either.

LowinLi, Jan 11 '21 04:01
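
For anyone hitting the same traceback, here is a minimal sketch of one way the OOV lookup could be made defensive. It is not the repository's code. The names id2word, article_oovs and article_oov_idx come from the traceback above; TinyVocab, its size() method, UNK_TOKEN and outputids2words_safe are hypothetical stand-ins, and the sketch assumes that ids at or beyond the vocabulary size are meant to index into the per-article OOV list. When the index does not fit that list, it falls back to [UNK] instead of raising IndexError.

```python
UNK_TOKEN = "[UNK]"  # assumed spelling, taken from the "# might be [UNK]" comment


class TinyVocab:
    """Hypothetical stand-in for the repository's vocab object."""

    def __init__(self, words):
        self._id_to_word = dict(enumerate(words))

    def size(self):
        return len(self._id_to_word)

    def id2word(self, word_id):
        # Mirrors the error message seen in the traceback.
        if word_id not in self._id_to_word:
            raise ValueError('Id not found in vocab: %d' % word_id)
        return self._id_to_word[word_id]


def outputids2words_safe(id_list, vocab, article_oovs):
    """Map output ids to words, using [UNK] when an id cannot be resolved."""
    words = []
    for i in id_list:
        try:
            w = vocab.id2word(i)
        except ValueError:
            # Assumption: ids >= vocab.size() point into the article's OOV list.
            article_oov_idx = i - vocab.size()
            if article_oovs is not None and 0 <= article_oov_idx < len(article_oovs):
                w = article_oovs[article_oov_idx]
            else:
                # Index does not fit this article's OOV list: degrade gracefully.
                w = UNK_TOKEN
        words.append(w)
    return words


vocab = TinyVocab(["[PAD]", "[UNK]", "今天", "天气"])
print(outputids2words_safe([2, 3, 4, 9], vocab, ["晴朗"]))
```

Note that a fallback like this only hides the symptom. If many ids end up as [UNK], it may be worth checking whether the vocabulary and the OOV lists used at test time match the ones the checkpoint was trained with.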