pytextrank Documentation or Inclusion of other algorithms

Open BradKML opened this issue 3 years ago • 4 comments

The models and algorithms in https://github.com/boudinfl/pke#implemented-models are similar to TextRank but are not sped up by spaCy, so it might be a good idea to include them in PyTextRank.

PS: There are also other, non-TextRank-esque algorithms to consider when making this assessment:

RAKE https://github.com/aneesha/RAKE and https://github.com/csurfer/rake-nltk and https://github.com/vgrabovets/multi_rake and https://github.com/chinwuDebug/RAKE_improve
YAKE https://github.com/LIAAD/yake
Aho–Corasick algorithm https://github.com/dav009/flash
RaKUn https://github.com/Parsely/serpextract

Jun 18 '21 18:06 BradKML

Thanks for bringing pke to our attention!

This issue is similar to #78, for which we have already made great progress with 2 contributions:

adding PositionRank and BiasedRank
adding BaseTextRank and BaseTextRankFactory to enable integration of more flavours

Regarding the graph-based models of pke, here is how they map to ours:

their TextRank can be achieved with our BaseTextRank(edge_weight=0)
their SingleRank can be achieved with our BaseTextRank() or BaseTextRank(edge_weight=1.0)
their PositionRank can be achieved with our PositionRank
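
For readers unfamiliar with the difference between the first two, here is a minimal sketch in plain Python. It is not the pytextrank or pke implementation, and the helper names `build_graph` and `pagerank` are hypothetical. It assumes the usual description of the two methods: TextRank ranks words on an unweighted co-occurrence graph, while SingleRank weights each edge by its co-occurrence count.

```python
from collections import defaultdict


def build_graph(tokens, window=2, weighted=True):
    """Build an undirected co-occurrence graph over a token list.

    `window` counts the token itself, so window=2 links adjacent tokens.
    weighted=False keeps every edge at 1.0 (TextRank-style);
    weighted=True accumulates co-occurrence counts (SingleRank-style).
    """
    graph = defaultdict(dict)
    for i, left in enumerate(tokens):
        for right in tokens[i + 1 : i + window]:
            if left == right:
                continue  # skip self-loops
            current = graph[left].get(right, 0.0)
            weight = current + 1.0 if weighted else 1.0
            graph[left][right] = weight
            graph[right][left] = weight
    return graph


def pagerank(graph, damping=0.85, iterations=50):
    """Weighted PageRank by power iteration on an undirected graph."""
    nodes = list(graph)
    rank = {n: 1.0 / len(nodes) for n in nodes}
    # Total edge weight leaving each node, used to normalise its votes.
    out_weight = {n: sum(graph[n].values()) for n in nodes}
    for _ in range(iterations):
        new_rank = {}
        for n in nodes:
            incoming = sum(
                rank[m] * w / out_weight[m] for m, w in graph[n].items()
            )
            new_rank[n] = (1 - damping) / len(nodes) + damping * incoming
        rank = new_rank
    return rank


if __name__ == "__main__":
    tokens = ["graph", "ranking", "graph", "ranking", "keyword", "graph"]
    for label, weighted in (("unweighted", False), ("weighted", True)):
        scores = pagerank(build_graph(tokens, weighted=weighted))
        ordered = sorted(scores.items(), key=lambda kv: -kv[1])
        print(label, [(word, round(score, 3)) for word, score in ordered])
```

In this toy input the unweighted graph is a triangle, so every word gets the same score, while the weighted variant favours the pair that co-occurs most often. A real pipeline would also filter candidates by part of speech and merge ranked words into phrases, which this sketch leaves out.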

the following ones are missing:

TopicRank paper by (Bougouin et al., 2013)
TopicalPageRank article by (Sterckx et al., 2015)
MultipartiteRank article by (Boudin, 2018)

I was not aware of these 3 papers and approaches, so thank you. Do you have experience with them in practice, and are they good? Would you be open to contributing them?

Jun 22 '21 09:06 louisguitton

I am mainly reporting them as notes for the documentation, but I would contribute if I can.

Also, an extra note: https://github.com/miso-belica/sumy/blob/master/docs/alternatives.md

Bipartite HITS https://github.com/himanshujindal/Automatic-Text-Summarizer
LexRank https://github.com/giorgosera/pythia/blob/dev/analysis/summarization/summarization.py https://github.com/kylehg/summarizer
topic models https://github.com/bobflagg/Topic-Networks
MEAD http://www.summarization.com/mead/
Luhn Summ https://github.com/talha1503/Extractive_Text_Summarization/blob/master/luhn_sum.py
SumBasic https://github.com/talha1503/Extractive_Text_Summarization/blob/master/SumBasic.ipynb

Jun 26 '21 00:06 BradKML

To reiterate, the algorithms that are currently not included:

[ ] SumBasic by Nenkova et al. and its repository in Python
[ ] LexRank by Erkan et al. and its repository in Python
[ ] SalienceRank by Teneva et al. and its repository in Python
[ ] KEA by Witten et al. and its repository in Java
[ ] UniKeyPhrase by Wu et al. and its repository in Python
[ ] https://github.com/boudinfl/pke#implemented-models
- [ ] TopicRank paper by (Bougouin et al., 2013)
- [ ] TopicalPageRank article by (Sterckx et al., 2015)
- [ ] MultipartiteRank article by (Boudin, 2018)

Also looking at:

[ ] JAKE https://github.com/xcjackpan/jake
[ ] Crackr https://github.com/anjishnu/Crackr

Jul 16 '21 07:07 BradKML

Also check the algorithms listed in pke https://github.com/boudinfl/pke which has an excellent range of implementations. FWIW, that library is GPL and not implemented as a spaCy pipeline, so there's some room for algorithm implementations both there (for research) and here (for production deployments).

Mar 07 '22 20:03 ceteri
