Python Keyphrase Extraction module

19 pke issues

Would it be good to add a RAKE implementation to this repo? See https://github.com/aneesha/RAKE and https://github.com/csurfer/rake-nltk.
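
RAKE (Rapid Automatic Keyword Extraction) splits text into candidate phrases at stopwords and punctuation, scores each word as degree/frequency (degree being the total length of the phrases it occurs in), and scores a phrase as the sum of its word scores. The following is a minimal, self-contained sketch of that scoring idea. It does not use pke or either linked repository, and the tiny stoplist is a placeholder assumption.

```python
import re
from collections import defaultdict

# Placeholder stoplist (assumption); a real setup would use a full stopword list.
STOPWORDS = {"a", "an", "and", "are", "for", "in", "is", "of", "on", "the", "to", "with"}


def candidate_phrases(text, stopwords=STOPWORDS):
    """Split text into candidate phrases at stopwords and punctuation."""
    phrases = []
    # Break the text into fragments at punctuation, then at stopwords.
    for fragment in re.split(r"[.,;:!?()\n]", text.lower()):
        current = []
        for word in re.findall(r"[a-z0-9]+", fragment):
            if word in stopwords:
                if current:
                    phrases.append(current)
                current = []
            else:
                current.append(word)
        if current:
            phrases.append(current)
    return phrases


def rake(text, stopwords=STOPWORDS):
    """Return (phrase, score) pairs sorted by descending RAKE score."""
    phrases = candidate_phrases(text, stopwords)
    freq = defaultdict(int)
    degree = defaultdict(int)
    for phrase in phrases:
        for word in phrase:
            freq[word] += 1
            # Degree: co-occurrences inside the phrase, counting the word itself.
            degree[word] += len(phrase)
    word_score = {w: degree[w] / freq[w] for w in freq}
    scores = {" ".join(p): sum(word_score[w] for w in p) for p in phrases}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

For instance, on "Keyphrase extraction is a task. Keyphrase extraction with graphs is a graph task." the top-ranked phrase is "keyphrase extraction", because both of its words recur inside a two-word phrase.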

I would like to process a corpus of documents with the TF-IDF model. My corpus is a single .txt file where each line is one document. It is fine as input for any models...

enhancement
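
For a corpus stored as one document per line, document frequencies can be computed by treating each non-empty line as a separate document. The plain-Python sketch below does not use pke's own document-frequency utilities; whitespace tokenization, n-grams up to length 3, and the smoothed IDF `log((1 + N) / (1 + df))` are assumptions chosen for illustration, not necessarily what pke's TfIdf model does internally.

```python
import math
from collections import Counter


def ngrams(tokens, n_max=3):
    """Yield every n-gram (space-joined) of length 1..n_max."""
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            yield " ".join(tokens[i:i + n])


def document_frequencies(lines, n_max=3):
    """Count, for each n-gram, how many lines (documents) contain it."""
    df = Counter()
    n_docs = 0
    for line in lines:
        tokens = line.lower().split()
        if not tokens:
            continue  # skip blank lines
        n_docs += 1
        # A set so each n-gram counts at most once per document.
        df.update(set(ngrams(tokens, n_max)))
    return df, n_docs


def tfidf(document, df, n_docs, n_max=3):
    """Score each n-gram of one document as tf * log((1 + N) / (1 + df))."""
    tf = Counter(ngrams(document.lower().split(), n_max))
    return {g: count * math.log((1 + n_docs) / (1 + df[g])) for g, count in tf.items()}
```

Because iterating over an open file yields one line at a time, the file object for a (hypothetical) `corpus.txt` can be passed straight to `document_frequencies`.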

Installed using `!pip install git+https://github.com/boudinfl/pke.git`. Made sure spaCy is installed and the 'en' model is downloaded. A similar error has been reported for pyg: https://github.com/pyg-team/pytorch_geometric/issues/4378. Tried upgrading scipy and networkx as...
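
When reporting an installation error like this one, it can help to list the exact versions of the packages involved. A minimal sketch using the standard library's `importlib.metadata` (Python 3.8+); the distribution names in `PACKAGES` are assumptions and may need adjusting for a given environment.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names assumed relevant to this report; adjust as needed.
PACKAGES = ("pke", "spacy", "scipy", "networkx")


def installed_versions(packages=PACKAGES):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in installed_versions().items():
        print(f"{name}: {ver or 'not installed'}")
```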

I am applying the MultipartiteRank and TopicRank methods to a phrase extraction task and was wondering whether there is a parameter I can adjust to get longer phrases...
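
In graph-based rankers such as TopicRank and MultipartiteRank, the returned keyphrases are drawn from the candidates chosen during candidate selection, so phrase length is governed by that step rather than by the ranking itself. The sketch below illustrates the "longest run of allowed part-of-speech tags" selection idea in plain Python; it is not pke's code, and the Universal POS tag names and the `allowed` parameter are assumptions for illustration.

```python
def longest_pos_sequences(tagged, allowed=frozenset({"NOUN", "PROPN", "ADJ"})):
    """Return maximal runs of consecutive words whose POS tag is in `allowed`."""
    phrases, current = [], []
    for word, pos in tagged:
        if pos in allowed:
            current.append(word)
        else:
            # A disallowed tag closes the current run.
            if current:
                phrases.append(" ".join(current))
            current = []
    if current:
        phrases.append(" ".join(current))
    return phrases
```

With the default tags, "efficient extraction of keyphrases from scientific articles" yields "efficient extraction", "keyphrases" and "scientific articles"; adding `"ADP"` to `allowed` merges them into one long candidate. Widening the tag set produces longer phrases at the cost of noisier candidates.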

Can anyone suggest a dataset on which unsupervised keyword extraction algorithms such as multipartite graph, BERT, etc. can be applied to measure accuracy, precision, etc.?
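
Whatever dataset is used, keyphrase extractors are commonly compared with precision, recall and F1 computed over the top-k predictions against the gold keyphrases. A minimal sketch assuming case-insensitive exact matching; published evaluations often also stem both sides before matching, and some divide precision by k rather than by the number of predictions, so numbers may not be directly comparable.

```python
def precision_recall_f1_at_k(predicted, gold, k=10):
    """Exact-match P@k, R@k and F1@k for a single document."""
    # Lowercase and drop duplicate predictions while keeping rank order.
    top_k = list(dict.fromkeys(p.lower() for p in predicted))[:k]
    gold_set = {g.lower() for g in gold}
    matches = sum(1 for p in top_k if p in gold_set)
    precision = matches / len(top_k) if top_k else 0.0
    recall = matches / len(gold_set) if gold_set else 0.0
    f1 = 2 * precision * recall / (precision + recall) if matches else 0.0
    return precision, recall, f1
```

Corpus-level scores are then usually obtained by averaging these per-document values.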

This is more of a question: from the benchmark results at https://github.com/boudinfl/pke/blob/master/results.md, it seems that simple TfIdf outperforms every other algorithm on the Inspec dataset, not only in speed but also...

While using `extractor.load_document()`, I encounter this error: `ValueError: [E088] Text of length 1717453 exceeds maximum of 1000000. The parser and NER models require roughly 1GB of temporary memory per 100,000 characters...`
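
The E088 error comes from spaCy, whose `nlp.max_length` limit (1,000,000 characters by default) caps the length of a single input; the full message notes that raising it is probably safe if the parser and NER are not used. Another option is to split the text into pieces below the limit and process each one separately. A minimal plain-Python sketch of such splitting, assuming paragraphs are separated by blank lines; how per-chunk keyphrase results are merged afterwards is left open and is not a pke feature.

```python
def split_into_chunks(text, max_chars=1_000_000):
    """Split text into chunks of at most max_chars, preferring paragraph breaks."""
    chunks, current = [], ""
    for paragraph in text.split("\n\n"):
        # Hard-split any single paragraph that is longer than the limit.
        while len(paragraph) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(paragraph[:max_chars])
            paragraph = paragraph[max_chars:]
        candidate = f"{current}\n\n{paragraph}" if current else paragraph
        if len(candidate) <= max_chars:
            current = candidate  # paragraph still fits in the current chunk
        else:
            chunks.append(current)  # close the current chunk, start a new one
            current = paragraph
    if current:
        chunks.append(current)
    return chunks
```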

I implemented and tested RAKE within the pke framework, as requested by [138](https://github.com/boudinfl/pke/issues/138)

In the KP-Miner implementation, n-gram candidates with `n>1` are assigned `candidate_df=1`. See https://github.com/boudinfl/pke/blob/8f1d05dcc52041c9920ba0f9d5231fe6086d12c4/pke/unsupervised/statistical/kpminer.py#L143

```python
....
# loop through the candidates
for k, v in self.candidates.items():
    # get candidate document frequency
    candidate_df...
```