Keyword extraction

Open wannaphong opened this issue 7 years ago • 5 comments

Add Keyword extraction to PyThaiNLP

wannaphong avatar Oct 31 '18 17:10 wannaphong

Is this the same as NER?

cstorm125 avatar May 16 '19 13:05 cstorm125

@cstorm125 No. I mean keyword extraction algorithms such as TextRank.

wannaphong avatar May 16 '19 15:05 wannaphong

Is this the same as NER?

Not really. Keywords can be extracted without much knowledge about their meaning or grammatical role. TextRank, for example, relies solely on co-occurrence, looking at the words only at their surface level.

  • Python implementation https://github.com/davidadamojr/TextRank
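To make the co-occurrence idea concrete, here is a minimal sketch of TextRank-style keyword ranking. It is not the code from the repository linked above. The function name `textrank_keywords` and its parameters (`window`, `top_k`, `damping`, `iterations`) are assumptions made for illustration. It expects an already tokenized text; for Thai, the tokens could come from `pythainlp.tokenize.word_tokenize`.

```python
from collections import defaultdict


def textrank_keywords(tokens, window=2, top_k=5, damping=0.85, iterations=50):
    """Rank tokens by PageRank over an undirected co-occurrence graph.

    Hypothetical helper for illustration: no stop-word or POS filtering.
    """
    # Link every pair of distinct tokens that fall inside a sliding window
    # of `window` consecutive tokens (surface-level co-occurrence only).
    neighbors = defaultdict(set)
    for i, word in enumerate(tokens):
        for other in tokens[i + 1 : i + window]:
            if other != word:
                neighbors[word].add(other)
                neighbors[other].add(word)

    # PageRank by power iteration: each word shares its score equally
    # among its neighbors, mixed with a constant (1 - damping) term.
    scores = {word: 1.0 for word in neighbors}
    for _ in range(iterations):
        scores = {
            word: (1 - damping)
            + damping * sum(scores[n] / len(neighbors[n]) for n in neighbors[word])
            for word in neighbors
        }

    # Highest score first; ties are broken alphabetically for determinism.
    ranked = sorted(scores, key=lambda w: (-scores[w], w))
    return ranked[:top_k]
```

A word that co-occurs with many different words ends up with the highest score, which is why no knowledge of meaning or grammatical role is required.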

bact avatar May 19 '19 21:05 bact

btw, what does "todo:" in the title mean? Can we remove it and use the corresponding label instead?

p16i avatar Sep 01 '19 21:09 p16i

imho, one way to do keyword extraction might be to make it problem-specific.

In particular, given a sentence, one might want to extract keywords based on its sentiment (or, more generally, on any classification prediction). In this case, the algorithm picks the words that are most relevant to the sentiment prediction.

For example, we build a generic sentiment classifier (possibly a neural network model) and apply an interpretation method to the model. We get interpretations as shown below (red indicates high relevance): [image] Then we select the keywords by their relevance values.

More details about the method can be found at: https://github.com/albermax/innvestigate/blob/master/examples/notebooks/sentiment_analysis.ipynb
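The final selection step can be sketched as below, assuming the interpretation method has already produced one relevance value per token. The function `select_keywords` and the scores in the usage example are hypothetical, made up for illustration; they are not output from innvestigate or any real model.

```python
def select_keywords(tokens, relevance, top_k=3):
    """Return the top_k distinct tokens with the highest relevance values.

    Hypothetical helper: `relevance` is assumed to hold one score per token,
    produced by whatever interpretation method was applied to the classifier.
    """
    if len(tokens) != len(relevance):
        raise ValueError("tokens and relevance must have the same length")

    # A token may occur several times; keep its highest relevance value.
    best = {}
    for token, score in zip(tokens, relevance):
        best[token] = max(score, best.get(token, float("-inf")))

    # Highest relevance first; ties are broken alphabetically.
    ranked = sorted(best, key=lambda t: (-best[t], t))
    return ranked[:top_k]


# Usage with made-up relevance values for a negative review.
tokens = ["the", "movie", "was", "terrible"]
relevance = [0.01, 0.30, 0.02, 0.90]
print(select_keywords(tokens, relevance, top_k=2))
```

The same selection would apply unchanged if the scores came from attention weights or another attribution method.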

Alternatively, we can build an attention model and select the keywords based on the attention values instead. It might also be worth looking at https://github.com/slundberg/shap. That framework works with a broader range of model classes.

p16i avatar Oct 11 '19 21:10 p16i