learning_to_retrieve_reasoning_paths Why are some document titles missing?

Open mukhal opened this issue 2 years ago • 2 comments

Thank you for the amazing repo.

I am curious why some titles are missing from the TF-IDF index. During evaluation, we get many warnings like these:

Oranjegekte_0 is missing
James Gunn_0 is missing
..

I assume this means that some document titles are not found in the database. Is that normal? Could you explain?

Thanks!

Oct 28 '21 22:10 mukhal

Hi, sorry for my late response! Could you share the command you are running and the dataset on which you see the issue? I have seen the same warning when a Wikipedia title (id) cannot be matched with any of the ids in the database. In particular, this happens when:

the code does not handle some Unicode characters well
the Wikipedia entity title has been changed or redirected to a new one
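To make the two causes above concrete, here is a minimal sketch of a more tolerant title lookup. It assumes the database ids are Wikipedia titles stored in some Unicode normal form and that a redirect table mapping old titles to current ones is available; `lookup_title`, `candidate_keys` and `REDIRECTS` are hypothetical names, not part of this repo.

```python
import unicodedata
from typing import Dict, Iterator, Optional, Set

# Hypothetical redirect table: old or alternative title -> current title.
# In practice it would have to be built from a Wikipedia redirect dump.
REDIRECTS: Dict[str, str] = {
    "Old Title": "New Title",
}


def candidate_keys(title: str) -> Iterator[str]:
    """Yield the title in each Unicode normal form, skipping duplicates."""
    seen = set()
    for form in ("NFC", "NFD", "NFKC", "NFKD"):
        key = unicodedata.normalize(form, title)
        if key not in seen:
            seen.add(key)
            yield key


def lookup_title(title: str, doc_ids: Set[str]) -> Optional[str]:
    """Return the matching database id for `title`, or None if it is missing."""
    # Cause 1: the same title can be encoded differently (a precomposed
    # "é" vs "e" plus a combining accent), so try every normal form.
    for key in candidate_keys(title):
        if key in doc_ids:
            return key
    # Cause 2: the title may have been renamed; follow one redirect hop.
    target = REDIRECTS.get(title)
    if target is not None:
        for key in candidate_keys(target):
            if key in doc_ids:
                return key
    return None
```

Trying every normal form is a cheap fallback when it is unclear which form the database was built with; if the form is known, normalizing both sides once to that form is simpler.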

Mar 26 '22 21:03 AkariAsai

Thanks for the response. This happens with HotpotQA when I run the following command (or similar ones):

python run_graph_retriever.py \
        --task hotpot_open \
        --bert_model bert-base-uncased --do_lower_case \
        --dev_file_path path/to/hotpotqa/dev \
        --output_dir path/to/output \
        --model_suffix 3 \
        --max_para_num 10 \
        --tfidf_limit 50 \
        --beam 4 \
        --eval_chunk 200 \
        --eval_batch_size 64 \
        --split_chunk 1000 \
        --pruning_by_links \
        --example_limit 128

I think the main issue is that the TF-IDF retriever returns some titles, but when I try to load their content with tfidf_retriever.load_abstract_para_text(), it prints this warning for some of those documents. I'm not sure whether I should worry about it, though, since I was able to reproduce your results even with the warning appearing many times.
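One way to judge whether the warning matters is to count how many distinct titles it affects. The sketch below assumes the evaluation output was saved to a text file and that each warning line contains only a message of the form `<title>_<n> is missing`, as in the examples above. Treating the trailing `_<n>` as a paragraph index is an assumption, and `missing_titles` is a hypothetical helper.

```python
import re
from collections import Counter
from typing import Iterable

# Matches lines such as "James Gunn_0 is missing"; titles may contain
# spaces or underscores, and the last "_<digits>" is treated as the suffix.
MISSING_RE = re.compile(r"^(?P<title>.+)_(?P<idx>\d+) is missing$")


def missing_titles(lines: Iterable[str]) -> Counter:
    """Count how often each title (with the _<n> suffix stripped) was reported missing."""
    counts: Counter = Counter()
    for line in lines:
        match = MISSING_RE.match(line.strip())
        if match:
            counts[match.group("title")] += 1
    return counts
```

Any iterable of lines works, including an open log file. `len()` of the result gives the number of distinct missing titles, and `most_common()` shows which ones recur. Comparing that number with the total number of retrieved paragraphs gives a rough sense of whether the gap is large enough to affect results.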

Mar 27 '22 01:03 mukhal