pytextrank
Optimizing the node bias values, based on an input KG
@Ankush-Chander @louisguitton @dvsrepo @jake-aft: here's a quick note about a potential integration or enhancement, related to #78
With the textgraph family of algorithms, we can improve phrase extraction without requiring a pre-trained model. Previous projects have shown that importing semantic relations (e.g., from a kglab graph) helps improve phrase extraction, with the beneficial side effect of entity linking into that input KG.
Another KG-related enhancement involves using Biased TextRank or other textgraph variants to set a bias value on particular nodes prior to running PageRank. One outstanding question: how do we set these bias values? Given that we have a set of phrases/nodes which are known a priori because of the input KG, how can we translate the relations in the graph into a set of node bias values?
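To make the question concrete, here is a minimal sketch in plain Python (not the pytextrank API). It assumes one simple, hypothetical heuristic: a node's bias grows with the number of KG relations that touch it. The function names `kg_degree_bias` and `biased_pagerank`, the toy lemma graph, and the toy triples are all made up for illustration; the open question above is precisely whether something better than this degree count exists.

```python
from collections import defaultdict


def kg_degree_bias(kg_triples, nodes, base=1.0):
    """Hypothetical heuristic: bias = base + number of KG relations touching the node."""
    counts = defaultdict(float)
    for subj, _pred, obj in kg_triples:
        counts[subj] += 1.0
        counts[obj] += 1.0
    raw = {n: base + counts[n] for n in nodes}
    total = sum(raw.values())
    # Normalize so the bias values form a probability distribution.
    return {n: w / total for n, w in raw.items()}


def biased_pagerank(edges, bias, damping=0.85, iterations=50):
    """Personalized PageRank by power iteration over an undirected graph.

    Every edge endpoint must appear as a key in `bias`.
    """
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    rank = dict(bias)
    for _ in range(iterations):
        # Rank held by isolated nodes is redistributed according to the bias.
        dangling = sum(rank[n] for n in bias if not neighbors[n])
        new_rank = {}
        for node in bias:
            inflow = sum(rank[m] / len(neighbors[m]) for m in neighbors[node])
            new_rank[node] = (1.0 - damping) * bias[node] + damping * (
                inflow + dangling * bias[node]
            )
        rank = new_rank
    return rank


# Toy lemma graph (co-occurrence edges) and toy KG triples, both invented.
edges = [
    ("graph", "algorithm"),
    ("algorithm", "phrase"),
    ("phrase", "extraction"),
    ("extraction", "graph"),
    ("phrase", "model"),
]
nodes = sorted({n for edge in edges for n in edge})
kg_triples = [
    ("graph", "relatedTo", "algorithm"),
    ("graph", "usedFor", "extraction"),
]

bias = kg_degree_bias(kg_triples, nodes)
ranks = biased_pagerank(edges, bias)
uniform = biased_pagerank(edges, {n: 1.0 / len(nodes) for n in nodes})

for node in sorted(ranks, key=ranks.get, reverse=True):
    print(f"{node:12s} biased={ranks[node]:.3f} uniform={uniform[node]:.3f}")
```

In this sketch the bias vector replaces the uniform teleport distribution in PageRank, so nodes that the KG knows about receive extra rank on every iteration. A degree count ignores relation types and direction, which is part of why a more principled, probability-based way of setting the values is attractive.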
@TommyJones and I had a really interesting conversation yesterday about his dissertation work and the related repo https://github.com/TommyJones/tidylda, an R implementation of LDA. Taking into account some of the known properties of natural language text (e.g., https://en.wikipedia.org/wiki/Heaps%27_law), LDA can be used to come up with more structured approaches to embedding, based on well-defined probabilities. It seems to me that a similar approach might help to adjust the node bias values above.
Watch this space!