KBQA

Indexing procedure

Open • Srinivas-R opened this issue 5 years ago • 3 comments

Hi! What is the procedure for step 4, 'Index entities and predicates into ES'? util/index.py requires an entity-frequency file. Is that something we're supposed to create from dbpedia2016-04en.hdt and then feed into it? Thank you.

Srinivas-R, Feb 24 '20 07:02
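For anyone with the same question, here is a minimal sketch of one way such an entity-frequency file could be built. It rests on assumptions that the repository does not confirm: the HDT dump is assumed to have already been exported to N-Triples text (the sketch does not read HDT itself), every `http://dbpedia.org/resource/` IRI in subject or object position is treated as an entity, and the output is assumed to be tab-separated `iri<TAB>count` lines. The format util/index.py actually expects may differ, so check it before indexing. The `bulk_actions` helper, the index name, and its field names (`uri`, `label`, `count`) are hypothetical; it only yields dicts in the `_index`/`_source` shape that `elasticsearch.helpers.bulk` from the official Python client accepts.

```python
"""Hypothetical sketch: entity frequencies from a DBpedia N-Triples export.

Assumptions (not taken from the KBQA repository):
- dbpedia2016-04en.hdt has already been exported to N-Triples text;
- an "entity" is any IRI under ENTITY_PREFIX in subject or object position;
- the frequency file is "iri<TAB>count" per line, which may not match
  what util/index.py expects.
"""
from collections import Counter
from typing import Iterable, Iterator, Optional

ENTITY_PREFIX = "http://dbpedia.org/resource/"


def iri_or_none(token: str) -> Optional[str]:
    """Return the IRI inside <...>, or None for literals and blank nodes."""
    if token.startswith("<") and token.endswith(">"):
        return token[1:-1]
    return None


def count_entities(ntriples_lines: Iterable[str]) -> Counter:
    """Count subject/object occurrences of ENTITY_PREFIX IRIs."""
    counts: Counter = Counter()
    for line in ntriples_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        # Subject and predicate contain no whitespace in N-Triples; the
        # object (a literal may contain spaces) is everything after them.
        parts = line.split(None, 2)
        if len(parts) < 3:
            continue
        subj, _pred, rest = parts
        obj = rest.rstrip(" \t.")  # drop the terminating " ."
        for token in (subj, obj):
            iri = iri_or_none(token)
            if iri is not None and iri.startswith(ENTITY_PREFIX):
                counts[iri] += 1
    return counts


def frequency_lines(counts: Counter) -> Iterator[str]:
    """Yield "iri<TAB>count" lines, most frequent first (assumed format)."""
    for iri, n in counts.most_common():
        yield f"{iri}\t{n}\n"


def bulk_actions(counts: Counter, index: str) -> Iterator[dict]:
    """Yield action dicts in the shape elasticsearch.helpers.bulk accepts.

    The field names uri/label/count are hypothetical, not the repo's mapping.
    """
    for iri, n in counts.items():
        label = iri[len(ENTITY_PREFIX):].replace("_", " ")
        yield {"_index": index,
               "_source": {"uri": iri, "label": label, "count": n}}


if __name__ == "__main__":
    sample = [
        "<http://dbpedia.org/resource/Berlin> <http://dbpedia.org/ontology/country> "
        "<http://dbpedia.org/resource/Germany> .",
        "<http://dbpedia.org/resource/Germany> "
        "<http://www.w3.org/2000/01/rdf-schema#label> \"Germany\"@en .",
    ]
    counts = count_entities(sample)
    print("".join(frequency_lines(counts)), end="")
    print(next(bulk_actions(counts, "entities")))  # hypothetical index name
```

To write the file, open a handle for writing and call `out.writelines(frequency_lines(counts))`; the file name is up to you. A `Counter` keeps one entry per distinct entity in memory, which can get large for a full DBpedia dump.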

Hi! Have you tried constructing the entity-frequency file? Does it work?

passenger20, Mar 04 '20 01:03

I created it, and the indexing worked. However, we're still unclear about what preprocessing was done on the input dataset to get it into the format the Jupyter notebook consumes. There seems to be a single 'train' flag in the JSON marking each example as train or test, but we ran into issues such as a question ID not being found (KeyError). Have you managed to get it working?

Srinivas-R, Mar 04 '20 16:03
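A minimal sketch of splitting records on the 'train' flag and listing missing question IDs up front, rather than hitting a KeyError partway through the notebook. It assumes the input is a JSON list of objects, that the flag is a boolean under the key `train`, and that the question ID lives under `_id` (the key LC-QuAD 1.0 uses). These are assumptions about the notebook's expected format, not something taken from the repository.

```python
"""Hypothetical sketch: split records on a 'train' flag and find missing IDs.

Assumptions (not confirmed by the repository): the input is a JSON list of
objects, the flag is a boolean under "train", and the question ID is under
"_id" (the key LC-QuAD 1.0 uses).
"""
import json
from typing import Dict, Iterable, List, Tuple

ID_KEY = "_id"       # assumed ID field; change to whatever the notebook uses
TRAIN_KEY = "train"  # the 'train' flag mentioned in this thread


def split_by_train_flag(
    records: Iterable[dict],
) -> Tuple[Dict[str, dict], Dict[str, dict]]:
    """Return (train, test) dicts keyed by question ID as a string.

    Records without the flag (or with a falsy flag) land in test.
    """
    train: Dict[str, dict] = {}
    test: Dict[str, dict] = {}
    for rec in records:
        if ID_KEY not in rec:
            continue  # a record without an ID cannot be looked up later
        target = train if rec.get(TRAIN_KEY) else test
        target[str(rec[ID_KEY])] = rec
    return train, test


def find_missing_ids(wanted_ids: Iterable, *splits: Dict[str, dict]) -> List[str]:
    """List IDs that no split contains, instead of raising KeyError later."""
    known = set()
    for split in splits:
        known.update(split)
    return sorted({str(q) for q in wanted_ids} - known)


if __name__ == "__main__":
    raw = '[{"_id": "1", "train": true}, {"_id": 2, "train": false}]'
    train, test = split_by_train_flag(json.loads(raw))
    print(sorted(train), sorted(test))
    print(find_missing_ids(["1", "2", "3"], train, test))
```

Normalizing every ID to a string is a deliberate choice: mixing integer and string IDs between two files is one possible cause of a KeyError like the one described, though it may not be the cause here.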

Thanks for letting me know. You could try "/data/lcquad_clean.json" as the input data and modify the Jupyter notebook to match it. I suspect some preprocessing steps are missing.

passenger20, Mar 05 '20 07:03
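One way to approach the "modify the notebook to match it" step is to check which fields the notebook reads are absent from the suggested file. This is a minimal sketch that assumes the file is a JSON list of objects; `EXPECTED_FIELDS` is a hypothetical placeholder, not the notebook's real field list.

```python
"""Hypothetical sketch: check which fields a notebook needs are missing.

Assumptions: the input file is a JSON list of objects, and EXPECTED_FIELDS
is a placeholder, not the notebook's real field list.
"""
import json
from collections import Counter
from typing import Iterable, List, Set

EXPECTED_FIELDS = ["_id", "corrected_question", "sparql_query", "train"]  # hypothetical


def load_records(path: str) -> List[dict]:
    """Load a JSON file that is assumed to hold a list of objects."""
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    if not isinstance(data, list):
        raise ValueError(f"expected a JSON list in {path}, got {type(data).__name__}")
    return data


def key_coverage(records: List[dict]) -> Counter:
    """Count how many records contain each top-level key."""
    coverage: Counter = Counter()
    for rec in records:
        coverage.update(rec.keys())
    return coverage


def missing_fields(records: List[dict], expected: Iterable[str]) -> Set[str]:
    """Expected fields that are absent from at least one record."""
    if not records:
        return set(expected)
    coverage = key_coverage(records)
    return {f for f in expected if coverage[f] < len(records)}


if __name__ == "__main__":
    sample = [
        {"_id": "1", "corrected_question": "q1", "sparql_query": "s1"},
        {"_id": "2", "sparql_query": "s2"},
    ]
    print(sorted(missing_fields(sample, EXPECTED_FIELDS)))
```

On the real data, call `missing_fields(load_records(path), EXPECTED_FIELDS)` with `path` pointing at the lcquad_clean.json file, then adjust either the notebook's key names or the list until the returned set is empty.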