qlever
Return mention of entity in the returned text
The original text can be obtained with TEXT(?text), but it seems that ?text alone behaves the same way. Is there a way to get the entity mention within the text, or the id of the original text record?
For example,
PREFIX fb: <http://rdf.freebase.com/ns/>
SELECT TEXT(?text) SCORE(?text) WHERE {
?text ql:contains-entity fb:m.05b6w .
}
ORDER BY DESC(SCORE(?text))
LIMIT 100
returns
The crew of Apollo 11 : Commander Neil A. Armstrong , Command Module pilot Michael Collins , Lunar Module pilot Edwin E. Aldrin , Jr .
Is there a way to get the mapping of Neil A. Armstrong to m.05b6w in this sentence?
There is currently no way to extract the mention directly. I think this could be implemented with the #175 proposal and special mention predicates on the text records. Sadly, due to other priorities there is currently no one actively working on this.
Oh, that's too bad. I checked the documentation for text support. So the Wordsfile is used to build the index, but the mapping between mentions and entities is not kept, and the order of the lines does not actually matter, right? The Docsfile contains the real sentences, which are returned as ?text. If that's the case, maybe I can just keep all the information I need in that file.
I think extracting mentions would be very useful, especially when preparing data for machine learning models.
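The workaround mentioned above (keeping the needed information in the Docsfile itself) could be sketched roughly as follows. This is only an illustration under assumptions: the `[[surface|entity]]` markup, and the `annotate` and `extract` helpers, are made up for this example and are not something QLever interprets. It also assumes the Docsfile text is returned verbatim by the query, so the markup would be visible in the results and has to be stripped on the client side.

```python
import re

# Hypothetical inline markup: [[surface text|entity id]]. QLever does not
# interpret it; it is only a convention for the text stored in the Docsfile.
MENTION = re.compile(r"\[\[([^\[\]|]+)\|([^\[\]|]+)\]\]")


def annotate(text, mentions):
    """Embed non-overlapping (start, end, entity) spans into text as markup."""
    out, pos = [], 0
    for start, end, entity in sorted(mentions):
        out.append(text[pos:start])
        out.append(f"[[{text[start:end]}|{entity}]]")
        pos = end
    out.append(text[pos:])
    return "".join(out)


def extract(annotated):
    """Return the plain text and the (start, end, entity) spans found in it."""
    plain, spans, pos, length = [], [], 0, 0
    for m in MENTION.finditer(annotated):
        before = annotated[pos:m.start()]
        plain.append(before)
        length += len(before)
        surface, entity = m.group(1), m.group(2)
        # Offsets refer to the plain text, i.e. with the markup removed.
        spans.append((length, length + len(surface), entity))
        plain.append(surface)
        length += len(surface)
        pos = m.end()
    plain.append(annotated[pos:])
    return "".join(plain), spans


sentence = ("The crew of Apollo 11 : Commander Neil A. Armstrong , "
            "Command Module pilot Michael Collins .")
start = sentence.index("Neil A. Armstrong")
end = start + len("Neil A. Armstrong")

# Text that would be written to the Docsfile for this record.
record = annotate(sentence, [(start, end, "fb:m.05b6w")])

# After a query returns the record text, recover the sentence and the mention.
plain, spans = extract(record)
```

With this convention, `plain` is the original sentence again and `spans` holds the character offsets of each mention together with its entity id. The Wordsfile would be built from the unannotated sentence as before.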
I am currently going through many old issues to check if they still have relevance.
- QLever could in principle store not only the mapping from an entity to the texts that contain it, but also the start and end position of each mention. This would require changes not only internally but also to the input format. If we implement this, it should be a configurable option, because it could slow down queries that don't use the positions and would roughly double the size of the TextIndex on disk.
- The actual question is, how beneficial this would be. I can see some use in a UI that then colors the entity mentions.
- In particular, I would not use this for machine learning models, because it works exactly the other way round: you typically need some kind of (often learning-based) entity linking tool that creates the input to QLever. These tools might or might not associate an entity mention with a specific place in the text.
- In summary, I can see some (though limited) use for your request, but not in the way you intended it.
- I am sorry that it took so long to pick this up again. If you are still working on or interested in this field, please contact us again (here or via email). In particular, we also have people working on entity linking in our group.
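To make the idea of optional start and end positions and the UI use case more concrete, here is a minimal sketch. It does not reflect QLever's actual TextIndex layout: `EntityPosting` and `highlight` are hypothetical names used only for illustration, and the offsets are assumed to be character positions in the record text.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class EntityPosting:
    """Hypothetical posting: an entity occurring in a text record."""
    record_id: int
    entity: str
    # Character offsets (start, end) of the mention. None models the case
    # where storing positions is switched off, so no extra space is needed.
    span: Optional[Tuple[int, int]] = None


def highlight(text, postings, open_tag="<em>", close_tag="</em>"):
    """Wrap every mention that has a span in tags, e.g. for coloring in a UI."""
    spans = sorted(p.span for p in postings if p.span is not None)
    out, pos = [], 0
    for start, end in spans:
        out.append(text[pos:start])
        out.append(open_tag + text[start:end] + close_tag)
        pos = end
    out.append(text[pos:])
    return "".join(out)


text = "Commander Neil A. Armstrong , Command Module pilot Michael Collins"
start = text.index("Neil A. Armstrong")
end = start + len("Neil A. Armstrong")

with_position = [EntityPosting(0, "fb:m.05b6w", (start, end))]
without_position = [EntityPosting(0, "fb:m.05b6w")]

marked = highlight(text, with_position)
unmarked = highlight(text, without_position)
```

Here `marked` wraps the mention in the tags, while `unmarked` is the text unchanged, which corresponds to an index built without positions.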
Closing this because the user did not report back.