
Unfair to skip questions in test dataset

Open xhuang31 opened this issue 6 years ago • 5 comments

In augment_process_dataset.py, some questions in the test dataset are skipped. It is unfair to use such results to compare to the state-of-the-art. Nothing should be changed in the test data.

xhuang31 avatar Jul 11 '18 16:07 xhuang31

@xhuang31 Thank you so much for your comments! I rechecked the code and found that we did indeed skip some questions across the whole dataset, regardless of which split they came from. I recalculated and found that we ignored 54 (0.2%) questions in the test data. We will change our denominator when calculating the final results. Thank you again for spotting this!
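A minimal sketch of the denominator correction described above, assuming skipped questions are simply counted as wrong. This is not code from the repository: the function name and all counts except the 54 skipped questions are hypothetical, chosen only for illustration.

```python
def corrected_accuracy(num_correct, num_evaluated, num_skipped):
    """Accuracy over the full test set, counting skipped questions as wrong."""
    total = num_evaluated + num_skipped
    if total == 0:
        raise ValueError("test set is empty")
    return num_correct / total


# Hypothetical counts; only the 54 skipped questions come from the thread.
evaluated = 1000   # questions that survived preprocessing (assumed)
skipped = 54       # questions dropped from the test split
correct = 750      # correctly answered questions (assumed)

inflated = correct / evaluated                          # denominator excludes skipped questions
fair = corrected_accuracy(correct, evaluated, skipped)  # denominator is the full test set

print(f"inflated: {inflated:.4f}, corrected: {fair:.4f}")
```

The corrected figure can never exceed the inflated one, because the numerator is unchanged while the denominator grows by the number of skipped questions.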

Impavidity avatar Jul 12 '18 01:07 Impavidity

Do you know of any method that could prevent this problem, i.e. keep these dropped questions? Does it require a larger KB? I am a beginner in this area and have little background knowledge. Looking forward to your reply, thanks!

fwzlaughing avatar Jul 13 '18 14:07 fwzlaughing

@fwzlaughing Hi, there is a quick fix in 73b5a42.

Impavidity avatar Jul 13 '18 14:07 Impavidity

Why can't some entities' names be found in FB5M.name.txt? Is there a complete dataset that contains the names of all the entities in FB2M?
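To see how many questions are affected, one could list the subject entities that have no entry in the name mapping. The sketch below is only an illustration under assumptions: it works on in-memory data, the entity IDs and names are made up, and it does not parse FB5M.name.txt, whose format is not described in this thread.

```python
def find_unnamed_subjects(question_subjects, entity_names):
    """Return subject IDs with no entry in the name mapping, in first-seen order."""
    missing = []
    seen = set()
    for mid in question_subjects:
        if mid not in entity_names and mid not in seen:
            seen.add(mid)
            missing.append(mid)
    return missing


# Hypothetical data: entity ID -> list of names, and one subject ID per question.
entity_names = {"m.001": ["alpha"], "m.002": ["beta"]}
question_subjects = ["m.001", "m.003", "m.002", "m.003"]

missing = find_unnamed_subjects(question_subjects, entity_names)
print(missing)
```

Reporting the missing IDs instead of silently dropping the questions keeps the size of the test set unchanged and makes the coverage gap visible.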

fwzlaughing avatar Jul 13 '18 15:07 fwzlaughing

@fwzlaughing FB5M is a subset of Freebase. If you are interested in the full Freebase, you can download the dump from https://developers.google.com/freebase/

Impavidity avatar Jul 13 '18 16:07 Impavidity