pythainlp
Keyword extraction
Add Keyword extraction to PyThaiNLP
Is this the same as NER?
@cstorm125 No. I mean keyword extraction algorithms such as TextRank and others.
Is this the same as NER?
Not really. Keywords can be extracted without much knowledge about their meaning or grammatical role. TextRank, for example, relies solely on co-occurrence, looking at words only at their surface level.
- Python implementation https://github.com/davidadamojr/TextRank
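
To make the co-occurrence idea concrete, here is a minimal sketch of a TextRank-style ranker in plain Python. It is not the implementation linked above: the function name `textrank_keywords` and its parameters are made up for illustration, it assumes the text is already tokenized (for Thai, something like `word_tokenize` from `pythainlp.tokenize` could supply the tokens), and it skips the part-of-speech filtering and multi-word merging that fuller implementations usually apply.

```python
from collections import defaultdict


def textrank_keywords(tokens, window=2, top_k=5, damping=0.85, iterations=50):
    """Rank tokens by PageRank over an undirected co-occurrence graph.

    Hypothetical helper for illustration; `tokens` is a pre-tokenized list.
    """
    # Link every pair of distinct tokens that occur within `window` positions.
    neighbors = defaultdict(set)
    for i, word in enumerate(tokens):
        for other in tokens[i + 1 : i + 1 + window]:
            if other != word:
                neighbors[word].add(other)
                neighbors[other].add(word)

    if not neighbors:
        return []

    # Power iteration for unweighted PageRank: each word shares its score
    # equally among the words it co-occurs with.
    scores = {word: 1.0 for word in neighbors}
    for _ in range(iterations):
        scores = {
            word: (1 - damping)
            + damping * sum(scores[n] / len(neighbors[n]) for n in neighbors[word])
            for word in neighbors
        }

    # Highest score first; ties are broken alphabetically for stable output.
    ranked = sorted(scores, key=lambda w: (-scores[w], w))
    return ranked[:top_k]


if __name__ == "__main__":
    sample = "cat sat mat cat ran cat hid".split()
    print(textrank_keywords(sample, window=1, top_k=3))
```

Note that nothing here looks at meaning or grammar: a word ranks highly only because it co-occurs with many other words.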
btw, what does todo: in the title mean? Can we remove it and use the corresponding label instead?
imho, one way to do keyword extraction might be to base it on the problem at hand.
In particular, given a sentence, one might want to extract keywords based on its sentiment (or, more generally, on a classifier's prediction). In this case, the algorithm picks the words that are most relevant to the sentiment prediction.
For example, we could build a generic sentiment classifier (possibly a neural network model) and apply an interpretation method to the model. We would get per-word relevance values, as shown below (red indicates high relevance):
Then, we select the keywords based on their relevance values.
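
As a rough sketch of that selection step, assuming the interpretation method has already produced one relevance score per token: the helper name `select_keywords` and the scores in the example are made up for illustration, not output from a real model.

```python
def select_keywords(tokens, relevance, top_k=3):
    """Pick the top_k distinct tokens with the highest positive relevance.

    Hypothetical helper; `relevance` holds one score per token, as an
    interpretation method might produce for a classifier's prediction.
    """
    if len(tokens) != len(relevance):
        raise ValueError("tokens and relevance must have the same length")

    # Sort (token, score) pairs from most to least relevant.
    ranked = sorted(zip(tokens, relevance), key=lambda pair: pair[1], reverse=True)

    keywords = []
    for token, score in ranked:
        if len(keywords) == top_k:
            break
        # Skip non-positive scores and tokens that were already selected.
        if score > 0 and token not in keywords:
            keywords.append(token)
    return keywords


if __name__ == "__main__":
    tokens = ["the", "food", "was", "absolutely", "delicious"]
    relevance = [0.01, 0.30, 0.02, 0.45, 0.90]  # made-up scores
    print(select_keywords(tokens, relevance, top_k=2))
```

The same function would work with attention weights or SHAP values in place of the relevance scores, as long as there is one value per token.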
More details about the method can be found at: https://github.com/albermax/innvestigate/blob/master/examples/notebooks/sentiment_analysis.ipynb
Alternatively, we can also build an attention model and select the keywords based on the attention values instead. It might also be worth looking at https://github.com/slundberg/shap, a framework that works with a broader range of model classes.