A keyphrase extractor for Persian

Perke

Perke is a Python keyphrase extraction package for the Persian language. It provides an end-to-end keyphrase extraction pipeline in which each component can be easily modified or extended to develop new models.

Installation

  • The easiest way to install is from PyPI:
    pip install perke
    
    Alternatively, you can install directly from GitHub:
    pip install git+https://github.com/alirezatheh/perke.git
    
  • Perke also requires a trained POS tagger model. We use hazm's tagger model. You can download the latest hazm resources (tagger and parser models) with the following command:
    python -m perke download
    
    Alternatively, you can use another model with the same tag names and structure by putting it in the resources directory.
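
To confirm that the installation step worked before running the example, a quick check like the one below can help. This is a minimal sketch that uses only the Python standard library; the helper name is_importable is hypothetical and not part of Perke, and the check only tells you that the package can be located, not that the tagger model has been downloaded.

```python
from importlib.util import find_spec


def is_importable(module_name: str) -> bool:
    """Return True if a top-level module can be located without importing it."""
    return find_spec(module_name) is not None


if __name__ == "__main__":
    # "perke" is the package name used by `pip install perke` above.
    status = "found" if is_importable("perke") else "missing"
    print(f"perke: {status}")
```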

Simple Example

Perke provides a standardized API for extracting keyphrases from a text. The example below uses the TextRank keyphrase extractor in four steps.

from perke.unsupervised.graph_based import TextRank

# Define the set of part-of-speech tags that may occur in the model.
valid_pos_tags = {'N', 'Ne', 'AJ', 'AJe'}

# 1. Create a TextRank extractor.
extractor = TextRank(valid_pos_tags=valid_pos_tags)

# 2. Load the text.
extractor.load_text(input='text or path/to/input_file',
                    word_normalization_method=None)

# 3. Build the graph representation of the text and weight the
#    words. Keyphrase candidates are composed from the 33 percent
#    highest weighted words.
extractor.weight_candidates(window_size=2, top_t_percent=0.33)

# 4. Get the 10 highest weighted candidates as keyphrases.
keyphrases = extractor.get_n_best(n=10)

For other models, see the examples directory.
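
To give a feel for what window_size and top_t_percent control, here is a small standalone sketch of the general TextRank idea: link words that co-occur within a window, score them with a PageRank-style iteration, and keep the top fraction. It is not Perke's implementation. The function names, the damping factor of 0.85, the fixed iteration count, and the assumption that the input is already tokenized and filtered by part-of-speech tag are all illustrative choices.

```python
from collections import defaultdict


def build_graph(words, window_size=2):
    """Link each pair of distinct words that co-occur within the window."""
    graph = defaultdict(set)
    for i, word in enumerate(words):
        # With window_size=2 this only looks at the next word.
        for other in words[i + 1:i + window_size]:
            if other != word:
                graph[word].add(other)
                graph[other].add(word)
    return graph


def rank_words(graph, damping=0.85, iterations=50):
    """Score nodes with a plain, unweighted PageRank power iteration."""
    scores = {node: 1.0 for node in graph}
    for _ in range(iterations):
        scores = {
            node: (1 - damping) + damping * sum(
                scores[neighbor] / len(graph[neighbor])
                for neighbor in graph[node]
            )
            for node in graph
        }
    return scores


def top_words(scores, top_t_percent=0.33):
    """Keep the highest-scoring fraction of words (at least one)."""
    keep = max(1, int(len(scores) * top_t_percent))
    ranked = sorted(scores, key=scores.get, reverse=True)
    return set(ranked[:keep])


if __name__ == "__main__":
    # Placeholder tokens; real input would be filtered Persian words.
    tokens = ["a", "b", "c", "b", "d"]
    graph = build_graph(tokens, window_size=2)
    print(top_words(rank_words(graph), top_t_percent=0.33))
```

In this sketch a larger window_size adds more edges per word, and a larger top_t_percent lets more words through to the stage where adjacent words are merged into candidate keyphrases.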

Documentation

Documentation and references are available at Read the Docs.

Implemented Models

Perke currently implements the following keyphrase extraction models:

  • Unsupervised models
    • Graph-based models
      • TextRank: article by Mihalcea and Tarau, 2004
      • SingleRank: article by Wan and Xiao, 2008
      • TopicRank: article by Bougouin, Boudin and Daille, 2013
      • PositionRank: article by Florescu and Caragea, 2017
      • MultipartiteRank: article by Boudin, 2018

Acknowledgements

Perke is inspired by pke.