
Optimizing the node bias values, based on an input KG

Open · ceteri opened this issue 4 years ago · 0 comments

@Ankush-Chander @louisguitton @dvsrepo @jake-aft: here's a quick note about a potential integration or enhancement, related to #78

With the textgraph family of algorithms, we can improve phrase extraction without requiring a pre-trained model. Previous projects have shown that importing semantic relations (e.g., from a kglab graph) into the textgraph helps phrase extraction, with a beneficial side effect: entity linking into that input KG.
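A minimal sketch of what "importing semantic relations" could look like, assuming the textgraph is a plain co-occurrence graph over lemmas and the KG arrives as (subject, predicate, object) triples whose labels already match lemma strings exactly. The function names, the `window` parameter, and the exact-string matching are hypothetical illustrations, not the pytextrank or kglab API.

```python
def cooccurrence_graph(tokens, window=3):
    """Undirected co-occurrence graph: link tokens at most window-1 positions apart."""
    adj = {t: set() for t in tokens}
    for i in range(len(tokens)):
        for j in range(i + 1, min(i + window, len(tokens))):
            a, b = tokens[i], tokens[j]
            if a != b:
                adj[a].add(b)
                adj[b].add(a)
    return adj


def import_kg_relations(adj, kg_triples):
    """Add an edge for every KG triple whose subject and object both occur in the text.

    Returns the set of text nodes that matched a KG node, i.e. a crude
    entity-linking side effect (hypothetical exact-label matching).
    """
    linked = set()
    for subj, _pred, obj in kg_triples:
        if subj in adj and obj in adj and subj != obj:
            adj[subj].add(obj)
            adj[obj].add(subj)
            linked.update((subj, obj))
    return linked


tokens = ["knowledge", "graph", "improves", "phrase", "extraction", "and", "entity", "linking"]
graph = cooccurrence_graph(tokens)
kg = [("knowledge", "related_to", "entity"), ("graph", "related_to", "python")]
linked = import_kg_relations(graph, kg)
print(sorted(linked))
```

Only the first triple is imported here, because "python" never appears in the text; the KG edge connects two lemmas that the co-occurrence window alone would never link.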

Another KG-related enhancement involves using Biased TextRank or other textgraph variants to set a bias value on particular nodes prior to running PageRank. One open question is how to set these bias values. Given a set of phrases/nodes known a priori from the input KG, how can we translate the relations in that graph into a set of node bias values?
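One possible heuristic, offered only as a hypothetical starting point: give each phrase node a bias of `base + boost * degree`, where degree is counted in the input KG, then use the normalized biases as the teleport (personalization) distribution of PageRank. The sketch below implements personalized PageRank by power iteration in pure Python. `kg_bias`, `biased_pagerank`, and the degree heuristic are assumptions for illustration and do not reflect pytextrank's Biased TextRank implementation.

```python
from collections import defaultdict


def kg_bias(phrases, kg_edges, base=1.0, boost=1.0):
    """Hypothetical heuristic: bias = base + boost * (degree of the phrase in the KG)."""
    degree = defaultdict(int)
    for subj, obj in kg_edges:
        degree[subj] += 1
        degree[obj] += 1
    return {p: base + boost * degree[p] for p in phrases}


def biased_pagerank(adj, bias, alpha=0.85, tol=1e-10, max_iter=200):
    """Personalized PageRank by power iteration.

    adj: dict node -> set of neighbours (symmetric for an undirected textgraph).
    bias: dict node -> non-negative weight; normalized into the teleport vector.
    """
    nodes = list(adj)
    total = sum(bias.get(n, 0.0) for n in nodes)
    if total <= 0:
        raise ValueError("at least one node needs a positive bias")
    teleport = {n: bias.get(n, 0.0) / total for n in nodes}
    rank = dict(teleport)
    for _ in range(max_iter):
        # Random jumps land on nodes in proportion to their bias.
        new = {n: (1 - alpha) * teleport[n] for n in nodes}
        dangling = 0.0
        for n in nodes:
            if adj[n]:
                share = alpha * rank[n] / len(adj[n])
                for m in adj[n]:
                    new[m] += share
            else:
                dangling += alpha * rank[n]
        # Mass from nodes without neighbours is redistributed by bias as well.
        for n in nodes:
            new[n] += dangling * teleport[n]
        diff = sum(abs(new[n] - rank[n]) for n in nodes)
        rank = new
        if diff < tol:
            break
    return rank


adj = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
bias = kg_bias(adj, [("a", "x"), ("a", "y")])  # "a" has degree 2 in the KG
ranks = biased_pagerank(adj, bias)
print({n: round(r, 3) for n, r in ranks.items()})
```

In this toy path graph "a" and "c" are structurally symmetric, so any difference between their scores comes entirely from the KG-derived bias. Whether KG degree is a good signal is exactly the open question raised above; the heuristic is a placeholder.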

@TommyJones and I had a really interesting conversation yesterday about his dissertation work and the related repo https://github.com/TommyJones/tidylda, an R implementation of LDA. By taking into account some of the known properties of natural language text (e.g., https://en.wikipedia.org/wiki/Heaps%27_law), LDA can be used to develop more structured approaches to embedding, based on well-defined probabilities. It seems to me that a similar approach might help to adjust the node bias values above.
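As a concrete example of the kind of well-defined property mentioned above, Heaps' law models vocabulary size as V(n) ≈ K · n^β, where n is the number of tokens seen so far. The sketch below estimates K and β by least squares in log-log space. It is not the tidylda method, only an illustration of fitting this property from a token stream; the function names are hypothetical.

```python
import math


def vocabulary_growth(tokens):
    """Return (n, V(n)) pairs: tokens seen so far vs. distinct types seen so far."""
    seen = set()
    points = []
    for n, tok in enumerate(tokens, start=1):
        seen.add(tok)
        points.append((n, len(seen)))
    return points


def fit_heaps(points):
    """Least-squares fit of log V = log K + beta * log n; returns (K, beta)."""
    xs = [math.log(n) for n, _ in points]
    ys = [math.log(v) for _, v in points]
    m = len(xs)
    mean_x, mean_y = sum(xs) / m, sum(ys) / m
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need at least two distinct values of n")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    beta = sxy / sxx
    k = math.exp(mean_y - beta * mean_x)
    return k, beta


# Synthetic points that follow V(n) = 2 * n**0.5 exactly recover K = 2, beta = 0.5.
K, beta = fit_heaps([(n, 2 * n ** 0.5) for n in range(1, 100)])
print(round(K, 6), round(beta, 6))
```

On real text, β typically falls below 1, reflecting that new vocabulary arrives ever more slowly as a document grows.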

Watch this space!

ceteri · Mar 25 '21 20:03