FreebaseQA
Why no SPARQL for each question?
Thanks for the dataset, but why is there no SPARQL query for each question?
Hi,
These results were extracted from a preprocessed version of Freebase, so we did not use SPARQL queries.
Kelvin
If no SPARQL queries are provided, how can this dataset be used properly?
The dataset includes the Freebase MIDs that directly correspond to entities in Freebase. Also, a preprocessed subset of Freebase has been linked in the README.
I see, but to simulate the execution of a SPARQL query, shouldn't we know the query form, e.g. SELECT, ASK, SELECT DISTINCT, etc.?
Unfortunately, I was not the one who worked on preprocessing Freebase, so I can't provide you with any sample SPARQL queries. Sorry about that.
I am wondering how to check the correct answer to a question if only the variables and predicates are known but not the query format. Would you mind pointing out the person responsible for the pre-processing, or referring this question to them?
Many thanks!
I'm not sure what you mean by format, but the object node in each Freebase triple is always the answer to the question.
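To make that concrete, here is a rough sketch of how one might walk from a TopicEntityMid along an InferentialChain to the answer object nodes. It assumes the subset is stored as whitespace-separated triples like m.010gj6wc type.object.name "Prague"@en, that ".." joins the two hops of a chain through a mediator node, and the file name is only a placeholder:

```python
# Sketch only: index the (assumed) whitespace-separated triples of the Freebase
# subset and follow an inferential chain from the topic entity; the object
# nodes reached at the end of the chain are the answers.
from collections import defaultdict

def load_triples(path="freebase-subset.txt"):  # placeholder file name
    """Index triples as subject -> predicate -> [objects]."""
    index = defaultdict(lambda: defaultdict(list))
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip(" .\t\n").split(maxsplit=2)
            if len(parts) == 3:
                subj, pred, obj = parts
                index[subj][pred].append(obj)
    return index

def chain_objects(index, topic_mid, inferential_chain):
    """Follow the chain from TopicEntityMid; ".." is assumed to split a two-hop chain."""
    frontier = [topic_mid]
    for predicate in inferential_chain.split(".."):
        frontier = [obj for node in frontier
                    for obj in index.get(node, {}).get(predicate, [])]
    return frontier
```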
How do you evaluate the results?
Is it possible to provide the pre-processing files and the evaluation scripts, so that we know how to evaluate the results?
This dataset should be used to evaluate your KBQA results; there is no need to evaluate the dataset itself. It was previously labelled and assessed by human annotators.
I do know that the dataset is labelled and assessed by human annotators.
My question is: I see that you provided a subset of Freebase (a 2.2 GB zip) for evaluation, but in the subset only the EntityMid is given, not the corresponding EntityName. Where can I find a name file for the subset, so that we can run the evaluation when using this dataset? P.S. Such a file could have been generated during pre-processing, which is why I asked whether you could share the pre-processing files.
Many thanks.
If you want to get from EntityMid to EntityName, use the type.object.name predicates within the Freebase subset. They map a Freebase MID to its name, and you can filter by the language tag (e.g. @en) to get language-specific names. An example in the subset: m.010gj6wc type.object.name "Prague"@en.
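For reference, here is a minimal sketch of building that EntityMid-to-EntityName lookup, under the same assumptions about the subset's triple layout (the file name is a placeholder):

```python
# Minimal sketch: collect type.object.name triples and keep only @en labels,
# mapping each Freebase MID to its English name.
def load_english_names(path="freebase-subset.txt"):  # placeholder file name
    names = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip(" .\t\n").split(maxsplit=2)
            if len(parts) != 3 or parts[1] != "type.object.name":
                continue
            mid, _, literal = parts
            if literal.endswith("@en"):
                names[mid] = literal[:-len("@en")].strip('"')  # "Prague"@en -> Prague
    return names
```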
Can every EntityMid within the subset be indexed this way?
Yes, it should be; this is all taken straight from the original Freebase data dumps. That said, some unpopular Freebase MIDs may be missing labels.
But the unpopular Freebase MIDs without labels are not in the FreebaseQA dataset, am I right?
Theoretically, yes: if they didn't have a label, our algorithm would not have been able to pick them up.
To evaluate performance, should one count the final AnswersMid as correct, or count the TopicEntityMid plus the inferential chain as correct? Which did you use?
The TopicEntityMid refers to the Freebase MID for some entity in the question, not the answer. Instead, you probably want to evaluate your model's performance with the final AnswersMid and AnswersName.
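As a rough sketch of that kind of evaluation (assuming the eval file wraps the examples under a top-level "Questions" key, and that `predictions` is a hypothetical mapping from each Question-ID to your model's predicted MID):

```python
import json

# Sketch: a question counts as correct when the predicted MID matches the
# AnswersMid of any annotated parse for that question.
def evaluate(eval_file, predictions):
    with open(eval_file, encoding="utf-8") as f:
        questions = json.load(f)["Questions"]  # assumed top-level key
    correct = 0
    for q in questions:
        gold = {ans["AnswersMid"]
                for parse in q["Parses"]
                for ans in parse["Answers"]}
        if predictions.get(q["Question-ID"]) in gold:
            correct += 1
    return correct / len(questions)
```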
I assume that finding the AnswersMid is equivalent to finding the TopicEntityMid plus the InferentialChain. Is it?
Another question: mediator nodes are not, as described in your paper, nodes that "do not have a name or alias associated with it", right? They actually have names in the subset of the Freebase dataset.
No, mediator nodes should not have names (like m.010gj6wc type.object.name "Prague"@en), even in the Freebase subset.
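One rough way to check this against the subset, reusing the triple index from the sketch earlier in the thread, is to treat any node without a type.object.name triple as a mediator node:

```python
# Sketch: mediator (CVT) nodes are expected to have no type.object.name triple,
# while regular entities like m.010gj6wc do.
def is_mediator(index, mid):
    return "type.object.name" not in index.get(mid, {})
```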
Is it common across the whole dataset that the answer to a question cannot be uniquely determined by querying the Freebase subset with the corresponding TopicEntityMid and inferential chain?
For example:

  {
    "Question-ID": "FreebaseQA-eval-31",
    "RawQuestion": "Valencia was the venue for the 2007 and 2010 America's Cup, as the defending yacht was from which landlocked country?",
    "ProcessedQuestion": "valencia was the venue for the 2007 and 2010 america's cup, as the defending yacht was from which landlocked country",
    "Parses": [
      {
        "Parse-Id": "FreebaseQA-eval-31.P0",
        "PotentialTopicEntityMention": "2010 america's cup",
        "TopicEntityName": "2010 america's cup",
        "TopicEntityMid": "m.03hh8pp",
        "InferentialChain": "user.jamie.default_domain.yacht_racing.competition.competitor..user.jamie.default_domain.yacht_racing.competitor.country",
        "Answers": [
          {
            "AnswersMid": "m.06mzp",
            "AnswersName": ["switzerland"]
          }
        ]
      }
    ]
  }

Given the annotated topic entity m.03hh8pp and the inferential chain user.jamie.default_domain.yacht_racing.competition.competitor..user.jamie.default_domain.yacht_racing.competitor.country, we get two answers: m.06mzp (Switzerland) and m.09c7w0 (United States). Without looking up the entity descriptions, one cannot narrow down the answer set using the clue "landlocked country" in the question.
Another example:

  {
    "Question-ID": "FreebaseQA-eval-35",
    "RawQuestion": "On the 2014 Winter Olympic Games who did the British men's curling team play in the final?",
    "ProcessedQuestion": "on the 2014 winter olympic games who did the british men's curling team play in the final",
    "Parses": [
      {
        "Parse-Id": "FreebaseQA-eval-35.P0",
        "PotentialTopicEntityMention": "2014 winter olympic games",
        "TopicEntityName": "2014 winter olympics",
        "TopicEntityMid": "m.03mfdg",
        "InferentialChain": "olympics.olympic_games.participating_countries",
        "Answers": [
          {
            "AnswersMid": "m.0d060g",
            "AnswersName": ["canada"]
          }
        ]
      }
    ]
  }

How can we know the answer should be Canada given only the inferential chain olympics.olympic_games.participating_countries, since many countries participated in the Olympic Games that year?
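For what it's worth, this is the kind of lookup I am doing, reusing the load_triples / chain_objects sketch from earlier in the thread (the file name is a placeholder):

```python
index = load_triples("freebase-subset.txt")  # placeholder path to the Freebase subset
candidates = chain_objects(
    index,
    "m.03hh8pp",  # TopicEntityMid for FreebaseQA-eval-31
    "user.jamie.default_domain.yacht_racing.competition.competitor.."
    "user.jamie.default_domain.yacht_racing.competitor.country",
)
# As described above, this yields both m.06mzp (Switzerland) and m.09c7w0 (United States).
print(candidates)
```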
These examples probably shouldn't have been included in the FreebaseQA dataset, since the inferential chains don't completely reflect the meaning behind the questions. The dataset isn't perfect: as I mentioned earlier, the labelling was done by human annotators, so some bad examples occasionally slip through.