pytextrank
how to use pytextrank for entity linking
The README states that pytextrank can be used for three tasks:
1. phrase extraction
2. summarization
3. entity linking

I see that examples and usage are available for 1 and 2 but not 3. Can someone share a reference on how it can be used for entity linking, how its lemma graph can be enriched with domain knowledge, etc.?
Thank you @chikubee, that's a good catch. The README.md describes one of the motivations for this project as entity linking, although those features are a WIP. That has been explored in some tutorials, and there's WIP code in a private repo which is under quite active development -- though not exposed as features here yet. We tentatively have a knowledge graph tutorial in collaboration with https://www.knowledgegraph.tech/ and https://connected-data.london/ scheduled for early December 2020 where that work will be presented.
I've updated the README.md to try to be clearer, as of https://github.com/DerwenAI/pytextrank/commit/cb51ba38057885de0bce0a4cdfdf30f996a779ad, and you've been added to the kudos for that.
As a simple example, the WordnetAnnotator section of https://github.com/DerwenAI/spaCy_tuTorial/blob/master/spaCy_tuTorial.ipynb gives at least a sketch of how entity linking could work:
- make use of a KG -- in the spaCy tutorial above, WordNet supplies the semantic relations
- use domain knowledge to constrain the search space for synsets
- search the KG's graph neighborhood of a given entity to link hypernyms and hyponyms into the PTR lemma graph
- benefits:
  - this enhances the centrality measures used to rank keyphrases
  - entity linking of keyphrases => KG is performed in the process
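
To make the steps above a bit more concrete, here is a minimal sketch in plain Python. It is not PyTextRank's implementation or API: the lemma graph, the `TOY_KG` dictionary, its `ex:` identifiers, and the helper names are all hypothetical, and a hand-written PageRank stands in for whatever centrality measure the pipeline actually uses. It only illustrates how importing hypernym neighbors from a KG both changes the ranking and yields entity links as a by-product.

```python
from collections import defaultdict

# Hypothetical toy KG: lemma -> KG identifier plus hypernyms (all invented).
TOY_KG = {
    "pizza": {"id": "ex:Pizza", "hypernyms": ["food"]},
    "cheese": {"id": "ex:Cheese", "hypernyms": ["food"]},
    "taco": {"id": "ex:Taco", "hypernyms": ["food"]},
    "coke": {"id": "ex:Coke", "hypernyms": ["beverage"]},
}

def add_edge(graph, a, b):
    """Add an undirected edge between two lemma nodes."""
    graph[a].add(b)
    graph[b].add(a)

def enrich(graph, kg):
    """Import hypernym neighbors from the KG; return lemma -> KG id links."""
    links = {}
    for lemma in list(graph):  # copy the keys, since the graph grows as we go
        entry = kg.get(lemma)
        if entry is None:
            continue
        links[lemma] = entry["id"]
        for parent in entry["hypernyms"]:
            add_edge(graph, lemma, parent)
    return links

def pagerank(graph, damping=0.85, iterations=50):
    """Power-iteration PageRank over an undirected adjacency dict."""
    nodes = list(graph)
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        new_rank = {}
        for n in nodes:
            incoming = sum(rank[m] / len(graph[m]) for m in graph[n])
            new_rank[n] = (1 - damping) / len(nodes) + damping * incoming
        rank = new_rank
    return rank

# Toy lemma graph with co-occurrence edges (assumed, not parser output).
lemma_graph = defaultdict(set)
add_edge(lemma_graph, "pizza", "cheese")
add_edge(lemma_graph, "cheese", "taco")
add_edge(lemma_graph, "taco", "coke")

ranks_before = pagerank(lemma_graph)
links = enrich(lemma_graph, TOY_KG)
ranks_after = pagerank(lemma_graph)

for lemma in sorted(ranks_after, key=ranks_after.get, reverse=True):
    print(f"{lemma:10s} {ranks_after[lemma]:.3f} {links.get(lemma, '')}")
```

In a real pipeline the hypernyms would come from WordNet or a domain KG rather than a hard-coded dictionary, and domain knowledge would decide which synsets are allowed into the graph.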
A couple questions for you:
- What kind of use cases do you have for entity linking features?
- How would you want to have the lemma graph exposed?
@ceteri thanks for your quick and detailed response. Looking forward to the release; it's a great problem to solve. Cheers.
I am trying to build a multi-tenant domain intelligence system. Intent classification and entity recognition are solved problems. But understanding the utterance to identify links and map them to real world entities is challenging.
I was looking at elegant ways to identify entity groups and links:

a. within the text
- "I want to have pizza with extra cheese, a taco, and 2 diet cokes." -> (1 pizza, other: extra cheese), (1 taco), (2 diet cokes)
- "Who is the manager of Mike? What is his salary?" -> here, if the graph were enriched with coref resolution, salary would get attributed to the manager of Mike.

b. outside the text, i.e. mapped to real-world entities from the domain KG and custom-fed entities/attributes, e.g. recognizing out-of-the-box entities such as company names, positions, status, food, etc.
FYI, here are some more related notes and discussion: https://github.com/DerwenAI/pytextrank/issues/78#issuecomment-739567927, with an introduction to kglab, which is intended to provide this kind of KG support in PyTextRank.
To your point above, @chikubee, the KG used for the TextRank pipeline would:
- enrich its internal lemma graph by importing nodes and edges from the KG, leading to better keyphrase ranking
- have entity linking into the KG as a side-effect
- then you could query via SPARQL, SHACL, or perhaps even PSL and other probabilistic methods to achieve what you wanted (Mike, his salary, etc.)
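
As a rough illustration of that last step, here is a sketch that answers the "manager of Mike, his salary" question over a handful of triples. It uses plain Python tuples rather than an RDF store, and every identifier and value in it (`ex:Mike`, `ex:Dana`, `ex:hasManager`, `ex:hasSalary`, the salary figures) is invented for the example. With a real KG the same two-hop pattern would more likely be written as a SPARQL query, shown in the comment.

```python
# Hypothetical triples (subject, predicate, object); all names and values invented.
TRIPLES = {
    ("ex:Mike", "ex:hasManager", "ex:Dana"),
    ("ex:Mike", "ex:hasSalary", 90000),
    ("ex:Dana", "ex:hasSalary", 120000),
}

# Roughly equivalent SPARQL, assuming the same made-up vocabulary:
#   SELECT ?manager ?salary WHERE {
#       ex:Mike ex:hasManager ?manager .
#       ?manager ex:hasSalary ?salary .
#   }

def objects(triples, subject, predicate):
    """Return all objects matching a (subject, predicate, ?) pattern."""
    return [o for s, p, o in triples if s == subject and p == predicate]

def manager_salary(triples, person):
    """Follow person -> manager -> salary, so 'his salary' resolves to the manager."""
    results = []
    for manager in objects(triples, person, "ex:hasManager"):
        for salary in objects(triples, manager, "ex:hasSalary"):
            results.append((manager, salary))
    return results

answer = manager_salary(TRIPLES, "ex:Mike")
print(answer)
```

The point of the sketch is only that once keyphrases such as "Mike" are linked to KG nodes, the coreference question turns into a graph traversal from the linked entity.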