Mišo Belica
Well, I would avoid changing sumy unless it is really needed. You can rather implement your own tokenizer like this:

```py
class Tokenizer:
    language = 'en???'

    def to_sentences(self, paragraph):
        return...
```
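For completeness, here is a minimal self-contained sketch of such a drop-in tokenizer, assuming the only interface sumy relies on is `to_sentences()` and `to_words()`. The splitting rules, the class name and the sample text below are placeholders, not a recommendation:

```py
import re

from sumy.parsers.plaintext import PlaintextParser
from sumy.summarizers.text_rank import TextRankSummarizer


class SimpleTokenizer:
    """Drop-in tokenizer; sumy only calls to_sentences() and to_words()."""

    language = "english"  # set this to the language you actually work with

    def to_sentences(self, paragraph):
        # naive split on sentence-final punctuation followed by whitespace
        return [s.strip() for s in re.split(r"(?<=[.!?])\s+", paragraph) if s.strip()]

    def to_words(self, sentence):
        # naive split into alphanumeric tokens
        return re.findall(r"\w+", sentence, re.UNICODE)


text = "Sumy is a summarizer. The summarizer extracts sentences. Sentences are ranked by the summarizer."
parser = PlaintextParser.from_string(text, SimpleTokenizer())
for sentence in TextRankSummarizer()(parser.document, sentences_count=1):
    print(sentence)
```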
Hi Vladimir, I think you know the code better than I do because [TextRank was not contributed](https://github.com/miso-belica/sumy/pull/100) by me. At least not the current implementation. But I will try to check...
1 - It's not completely true. Sumy uses `nltk.word_tokenize` and the regex is only used to filter some words out. You are right that it should not filter some words with...
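To make the split of responsibilities concrete, here is an illustration (the pattern below is made up; the one sumy actually uses may differ): NLTK does the tokenizing, the regex only decides which of the resulting tokens to keep, and a too-strict pattern silently drops legitimate words.

```py
import re

from nltk import word_tokenize  # needs the NLTK "punkt" tokenizer data

# Made-up filter for illustration only.
LOOKS_LIKE_WORD = re.compile(r"^\w+$", re.UNICODE)

def to_words(sentence):
    tokens = word_tokenize(sentence)  # NLTK does the actual tokenizing
    return [t for t in tokens if LOOKS_LIKE_WORD.match(t)]  # the regex only filters

# A pattern like the one above would drop hyphenated words such as "state-of-the-art".
print(to_words("A state-of-the-art summarizer keeps hyphenated words."))
```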
Hi @dorianve, can you attach a simple test to reproduce this? Or maybe create a PR with the test and a fix? You can't update the repository, but you are...
@seven-linglx Can you share your solution with us? Can you add the code snippet here?
Thank you all. I think this is trickier. I tried to [find a solution](https://chinese.stackexchange.com/questions/10753/capitalization-in-chinese) but it seems I should introduce a new parser. Maybe a `MarkdownParser`, and let `PlaintextParser` really...
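Until a dedicated parser exists, one rough workaround is to strip the Markdown markup yourself and feed the result to `PlaintextParser`. The helper below is purely hypothetical (nothing like it ships with sumy) and its regexes are deliberately crude:

```py
import re

from sumy.nlp.tokenizers import Tokenizer
from sumy.parsers.plaintext import PlaintextParser


def parse_markdown(markdown_text, language="english"):
    """Hypothetical helper, not part of sumy: strip Markdown, reuse PlaintextParser."""
    text = re.sub(r"`{3}.*?`{3}", " ", markdown_text, flags=re.DOTALL)  # drop fenced code blocks
    text = re.sub(r"\[([^\]]*)\]\([^)]*\)", r"\1", text)  # keep link text, drop the URL
    text = re.sub(r"^[#>\-*+]+\s*", "", text, flags=re.MULTILINE)  # heading/quote/list markers
    text = re.sub(r"[*_`]+", "", text)  # emphasis and inline-code markers
    return PlaintextParser.from_string(text, Tokenizer(language))


parser = parse_markdown("# Title\n\nSome **bold** text with a [link](https://example.com).")
```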
Hi, can you share the corpus and let me know the exact command that is slow?
Hi, I don't completely understand you, but I guess you just want to know the format of the reference summaries for summary-level ROUGE-L? Because it's the same for all summaries. Its...
@IsakZhang Hi, it's a tough question for me. I really don't remember why or whether I diverged from the original paper. But I usually did such things because I was inspired somewhere...
Hi, I suppose some format of "plain text". But I'm not sure I understand you. Can you give an example of the text? And what does "it doesn't do...