pytextrank

Documentation or Inclusion of other algorithms

Open BradKML opened this issue 3 years ago • 4 comments

The models and algorithms in https://github.com/boudinfl/pke#implemented-models are similar to TextRank but are not accelerated by spaCy, so it might be a good idea to include them in PyTextRank.

PS: There are also other algorithms, not based on TextRank, to consider when making this assessment:

  • RAKE https://github.com/aneesha/RAKE and https://github.com/csurfer/rake-nltk and https://github.com/vgrabovets/multi_rake and https://github.com/chinwuDebug/RAKE_improve
  • YAKE https://github.com/LIAAD/yake
  • Aho–Corasick algorithm https://github.com/dav009/flash
  • RaKUn https://github.com/Parsely/serpextract

BradKML avatar Jun 18 '21 18:06 BradKML

Thanks for bringing pke to our attention!

This issue is similar to #78, on which we have already made great progress with two contributions:

  • adding PositionRank and BiasedRank
  • adding BaseTextRank and BaseTextRankFactory to enable integration of more flavours

Regarding the graph-based models in pke, here is how they map onto ours:

  • their TextRank can be achieved with our BaseTextRank(edge_weight=0)
  • their SingleRank can be achieved with our BaseTextRank() or BaseTextRank(edge_weight=1.0)
  • their PositionRank can be achieved with our PositionRank
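
To illustrate the TextRank vs. SingleRank distinction in the list above, here is a minimal standard-library sketch. It is not PyTextRank's or pke's implementation: `build_graph` and `pagerank` are hypothetical helpers, and the rule that the first co-occurrence gets weight 1.0 while each repeat adds `edge_weight` is an assumption made for illustration. Under that assumption, `edge_weight=0` gives an unweighted graph (TextRank-style) and `edge_weight=1.0` gives co-occurrence counts as weights (SingleRank-style).

```python
from collections import defaultdict


def build_graph(tokens, window=2, edge_weight=1.0):
    """Build an undirected co-occurrence graph (hypothetical helper).

    Each token is linked to the next `window - 1` tokens. The first time a
    pair co-occurs the edge gets weight 1.0; every repeat adds `edge_weight`.
    """
    graph = defaultdict(dict)
    for i, left in enumerate(tokens):
        for right in tokens[i + 1 : i + window]:
            if left == right:
                continue  # no self-loops
            if right in graph[left]:
                graph[left][right] += edge_weight
                graph[right][left] += edge_weight
            else:
                graph[left][right] = 1.0
                graph[right][left] = 1.0
    return dict(graph)


def pagerank(graph, damping=0.85, iterations=50):
    """Weighted PageRank by power iteration over a symmetric graph."""
    nodes = list(graph)
    rank = {n: 1.0 / len(nodes) for n in nodes}
    # Total edge weight leaving each node, used to normalise its votes.
    strength = {n: sum(graph[n].values()) for n in nodes}
    for _ in range(iterations):
        rank = {
            n: (1 - damping) / len(nodes)
            + damping * sum(rank[m] * w / strength[m] for m, w in graph[n].items())
            for n in nodes
        }
    return rank


tokens = ["keyword", "extraction", "graph", "keyword", "extraction", "model"]

# TextRank-style: repeated co-occurrences do not change the edge weight.
unweighted = build_graph(tokens, window=2, edge_weight=0.0)
# SingleRank-style: edge weight counts how often the pair co-occurs.
weighted = build_graph(tokens, window=2, edge_weight=1.0)

for name, graph in (("edge_weight=0", unweighted), ("edge_weight=1.0", weighted)):
    ranked = sorted(pagerank(graph).items(), key=lambda kv: kv[1], reverse=True)
    print(name, [(word, round(score, 3)) for word, score in ranked])
```

In this toy input the pair ("keyword", "extraction") occurs twice, so its edge weight is 1.0 in the first graph and 2.0 in the second; that is the only difference between the two runs.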

the following ones are missing:

I was not aware of these 3 papers and approaches, so thank you. Do you have experience with them in practice, and are they good? Would you be open to contributing them?

louisguitton avatar Jun 22 '21 09:06 louisguitton

I am mainly reporting them so they can be noted in the documentation, but I would contribute if I can.

Also, an extra note: https://github.com/miso-belica/sumy/blob/master/docs/alternatives.md

  • Bipartite HITS https://github.com/himanshujindal/Automatic-Text-Summarizer
  • LexRank https://github.com/giorgosera/pythia/blob/dev/analysis/summarization/summarization.py https://github.com/kylehg/summarizer
  • topic models https://github.com/bobflagg/Topic-Networks
  • MEAD http://www.summarization.com/mead/
  • Luhn Summ https://github.com/talha1503/Extractive_Text_Summarization/blob/master/luhn_sum.py
  • SumBasic https://github.com/talha1503/Extractive_Text_Summarization/blob/master/SumBasic.ipynb

BradKML avatar Jun 26 '21 00:06 BradKML

To reiterate the current algorithms that are not included:

Looking at

  • [ ] JAKE https://github.com/xcjackpan/jake
  • [ ] Crackr https://github.com/anjishnu/Crackr

BradKML avatar Jul 16 '21 07:07 BradKML

Also check the algorithms listed in pke https://github.com/boudinfl/pke which has an excellent range of implementations. FWIW, that library is GPL and not implemented as a spaCy pipeline, so there's some room for algorithm implementations both there (for research) and here (for production deployments).

ceteri avatar Mar 07 '22 20:03 ceteri