
Why no SPARQL query for each question?


Thanks for the dataset, but why is there no SPARQL query for each question?

WaNePr avatar Jul 23 '19 15:07 WaNePr

Hi,

These results were extracted from a preprocessed version of Freebase, so we did not use SPARQL queries.

Kelvin

kelvin-jiang avatar Jul 24 '19 02:07 kelvin-jiang

If no SPARQL queries are provided, how can this dataset be used properly?

WaNePr avatar Jul 24 '19 02:07 WaNePr

The dataset includes Freebase MIDs that directly correspond to entities in Freebase. Also, a preprocessed subset of Freebase is linked in the README.
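
For example, a rough sketch of iterating over the released JSON (the file name and the top-level "Questions" list are assumed from the release; the per-question fields match the examples later in this thread):

```python
import json

# Rough sketch: read the released eval file (path and the top-level
# "Questions" list are assumptions about the release format).
with open("FreebaseQA-eval.json") as f:
    data = json.load(f)

for question in data["Questions"]:
    parse = question["Parses"][0]  # a question may carry several parses
    print(question["RawQuestion"])
    print("  TopicEntityMid:", parse["TopicEntityMid"])
    print("  InferentialChain:", parse["InferentialChain"])
    print("  AnswersMid:", [a["AnswersMid"] for a in parse["Answers"]])
```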

kelvin-jiang avatar Jul 24 '19 04:07 kelvin-jiang

I see, but to simulate the execution of a SPARQL query, shouldn't we know the query form, e.g. SELECT, ASK, SELECT DISTINCT, etc.?

WaNePr avatar Jul 24 '19 06:07 WaNePr

Unfortunately, I was not the one who worked on preprocessing Freebase, so I can't provide you with sample SPARQL queries. Sorry about that.

kelvin-jiang avatar Jul 24 '19 06:07 kelvin-jiang

I am wondering how to check the correct answer to a question if only the variables and predicates are known, but not the query format. Would you mind pointing me to the person responsible for the preprocessing, or referring this question to them?

Many thanks!

WaNePr avatar Jul 24 '19 06:07 WaNePr

I'm not sure what you mean by format, but the object node in each Freebase triple is always the answer to the question.

kelvin-jiang avatar Jul 24 '19 16:07 kelvin-jiang

How do you evaluate the results?

WaNePr avatar Jul 27 '19 08:07 WaNePr

Is it possible to provide the preprocessing files and the evaluation scripts, so that we know how to evaluate the results?

WaNePr avatar Jul 29 '19 03:07 WaNePr

This dataset should be used to evaluate your KBQA results; there is no need to evaluate the dataset itself. It was labelled and assessed by human annotators.

kelvin-jiang avatar Jul 29 '19 03:07 kelvin-jiang

I do know that the dataset is labelled and assessed by human annotators.

My question is: I see that you provided a subset of Freebase (a 2.2 GB zip) for evaluation, but the subset only gives each EntityMid, not the corresponding EntityName. Where could I find a name file for the subset, so that we can run the evaluation when using this dataset? (P.S. Such a file could have been generated during preprocessing, which is why I asked whether you could share the preprocessing files.)

Many thanks.

WaNePr avatar Jul 29 '19 04:07 WaNePr

If you want to get from an EntityMid to its EntityName, use the type.object.name predicates within the Freebase subset. They map a Freebase MID to its name, and you can filter by language tag (e.g. @en) to get language-specific names. An example from the subset: m.010gj6wc type.object.name "Prague"@en.
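
A minimal sketch of building such a map, assuming the subset is stored as tab-separated triples, one per line (the file name here is a placeholder):

```python
# Minimal sketch: map Freebase MIDs to English names via type.object.name.
# Assumes tab-separated triples, one per line, with literals like "Prague"@en.
mid_to_name = {}
with open("freebase-subset.txt", encoding="utf-8") as f:
    for line in f:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 3:
            continue
        subj, pred, obj = fields
        if pred == "type.object.name" and obj.endswith('"@en'):
            mid_to_name[subj] = obj[1:-4]  # strip quotes and the @en tag

print(mid_to_name.get("m.010gj6wc"))  # -> Prague, if present in the subset
```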

kelvin-jiang avatar Jul 29 '19 04:07 kelvin-jiang

Can every EntityMid within the subset be resolved this way?

WaNePr avatar Jul 29 '19 05:07 WaNePr

Yes, it should be; this is all taken straight from the original Freebase data dumps. That said, some unpopular Freebase MIDs may be missing labels.

kelvin-jiang avatar Jul 29 '19 06:07 kelvin-jiang

But the unpopular Freebase MIDs without labels are not in the FreebaseQA dataset, am I right?

WaNePr avatar Jul 29 '19 07:07 WaNePr

Theoretically, yes: if they didn't have a label, our algorithm would not have been able to pick them up.

kelvin-jiang avatar Jul 30 '19 03:07 kelvin-jiang

To evaluate performance, should one count the final AnswersMid as correct, or count the TopicEntityMid plus the inferential chain as correct? Which way did you use?

WaNePr avatar Aug 01 '19 06:08 WaNePr

The TopicEntityMid refers to the Freebase MID of an entity in the question, not the answer. Instead, you probably want to evaluate your model's performance against the final AnswersMid and AnswersName.
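
For instance, a minimal exact-match sketch (predict() is a placeholder for your own KBQA system; the field names follow the released JSON):

```python
# Sketch: exact-match accuracy against the gold AnswersMid fields.
# predict() stands in for your own KBQA system and returns a MID.
def accuracy(questions, predict):
    correct = 0
    for question in questions:
        predicted_mid = predict(question["ProcessedQuestion"])
        gold_mids = {answer["AnswersMid"]
                     for parse in question["Parses"]
                     for answer in parse["Answers"]}
        if predicted_mid in gold_mids:
            correct += 1
    return correct / len(questions)
```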

kelvin-jiang avatar Aug 02 '19 02:08 kelvin-jiang

I assume that finding the AnswersMid is equivalent to finding the TopicEntityMid plus the InferentialChain. Is that right?

WaNePr avatar Aug 02 '19 09:08 WaNePr

Another question: mediator nodes are not, as described in your paper, nodes that "do not have a name or alias associated with them", right? They actually have names in the Freebase subset.

WaNePr avatar Aug 02 '19 10:08 WaNePr

No, mediator nodes should not have type.object.name triples (like m.010gj6wc type.object.name "Prague"@en), even in the Freebase subset.

kelvin-jiang avatar Aug 03 '19 03:08 kelvin-jiang

Is it common across the whole dataset that we cannot uniquely determine the answer to a question by querying the Freebase subset with the annotated TopicEntityMid and inferential chain?

For example:

```json
{
  "Question-ID": "FreebaseQA-eval-31",
  "RawQuestion": "Valencia was the venue for the 2007 and 2010 America's Cup, as the defending yacht was from which landlocked country?",
  "ProcessedQuestion": "valencia was the venue for the 2007 and 2010 america's cup, as the defending yacht was from which landlocked country",
  "Parses": [
    {
      "Parse-Id": "FreebaseQA-eval-31.P0",
      "PotentialTopicEntityMention": "2010 america's cup",
      "TopicEntityName": "2010 america's cup",
      "TopicEntityMid": "m.03hh8pp",
      "InferentialChain": "user.jamie.default_domain.yacht_racing.competition.competitor..user.jamie.default_domain.yacht_racing.competitor.country",
      "Answers": [
        {
          "AnswersMid": "m.06mzp",
          "AnswersName": ["switzerland"]
        }
      ]
    }
  ]
}
```

Given the annotated topic entity m.03hh8pp and the inferential chain user.jamie.default_domain.yacht_racing.competition.competitor..user.jamie.default_domain.yacht_racing.competitor.country, we get two answers: m.06mzp (Switzerland) and m.09c7w0 (United States). Without looking up the entity descriptions, one cannot narrow down the answer set using the clue "landlocked country" in the question.

Another example:

```json
{
  "Question-ID": "FreebaseQA-eval-35",
  "RawQuestion": "On the 2014 Winter Olympic Games who did the British men's curling team play in the final?",
  "ProcessedQuestion": "on the 2014 winter olympic games who did the british men's curling team play in the final",
  "Parses": [
    {
      "Parse-Id": "FreebaseQA-eval-35.P0",
      "PotentialTopicEntityMention": "2014 winter olympic games",
      "TopicEntityName": "2014 winter olympics",
      "TopicEntityMid": "m.03mfdg",
      "InferentialChain": "olympics.olympic_games.participating_countries",
      "Answers": [
        {
          "AnswersMid": "m.0d060g",
          "AnswersName": ["canada"]
        }
      ]
    }
  ]
}
```

How can we know the answer should be Canada given only the inferential chain olympics.olympic_games.participating_countries, since many countries participated in the Olympic Games that year?
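
For reference, this is roughly how I query the subset (a sketch only: `triples` is assumed to be a (subject, predicate) -> objects index built from the subset file, and ".." is taken as the separator between the two hops of a mediated chain, as in the first example):

```python
# Sketch: follow an inferential chain from the topic entity through the
# subset triples. ".." separates the hops of a chain that passes through
# a mediator node. `triples` maps (subject, predicate) -> set of objects.
def execute_chain(triples, topic_mid, chain):
    nodes = {topic_mid}
    for predicate in chain.split(".."):
        nodes = {obj for node in nodes
                 for obj in triples.get((node, predicate), set())}
    return nodes

# For FreebaseQA-eval-31 this yields both m.06mzp (Switzerland) and
# m.09c7w0 (United States); the chain alone cannot apply the
# "landlocked country" constraint from the question.
```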

WaNePr avatar Aug 19 '19 02:08 WaNePr

These examples probably shouldn't have been included in the FreebaseQA dataset, since their inferential chains don't completely capture the meaning of the questions. The dataset isn't perfect; since the labelling was done by human annotators, as I mentioned earlier, some bad examples occasionally slip through.

kelvin-jiang avatar Aug 19 '19 04:08 kelvin-jiang