NLP-progress

Add stemming and lemmatisation section

Open LifeIsStrange opened this issue 6 years ago • 3 comments

According to the List_of_unsolved_problems_in_computer_science, this is an open question:

Is there any perfect stemming algorithm in the English language?

I believe that lemmatization is not solved either.

It would be wonderful to add the state of the art for both tasks. BTW, lemmatization consists, for example, of transforming the conjugated verb jumped into its base form (lemma): jump.

Is there a tool that takes a word as one argument (e.g. quick) and the requested part of speech as another (e.g. adverb), and outputs the corresponding form (quickly)? In fact, stemming and lemmatization are special cases of the NLP task I need. If it exists, does anyone know what it is called? Where could I ask? Sorry for the digression.
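To make the difference between the two tasks concrete, here is a toy sketch in plain Python. The suffix list and the lookup table are hypothetical and deliberately tiny; real stemmers and lemmatizers use far richer rules and data, so treat this only as an illustration of the idea.

```python
# Toy illustration only: the rules and table below are hypothetical
# and do not correspond to any real stemmer or lemmatizer.

SUFFIXES = ("ing", "ed", "es", "s")  # assumed, minimal rule set


def toy_stem(word: str) -> str:
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


# Hypothetical lookup table mapping (surface form, part of speech) to a lemma.
LEMMAS = {
    ("jumped", "VERB"): "jump",
    ("went", "VERB"): "go",
    ("mice", "NOUN"): "mouse",
}


def toy_lemmatize(word: str, pos: str) -> str:
    """Return the dictionary form if known, otherwise the word unchanged."""
    return LEMMAS.get((word, pos), word)


if __name__ == "__main__":
    for w in ("jumped", "ponies", "went"):
        print(w, "->", toy_stem(w))
    print("went ->", toy_lemmatize("went", "VERB"))
```

The stemmer only chops characters, so it can return non-words and cannot handle irregular forms such as went, while the lemmatizer needs a vocabulary and the part of speech. Neither approach is perfect, which is the point of the question above.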

LifeIsStrange avatar Jul 23 '19 18:07 LifeIsStrange

Benchmarks: http://universaldependencies.org/conll18/results-lemmas.html BTW, there is a great writeup at https://towardsdatascience.com/state-of-the-art-multilingual-lemmatization-f303e8ff1a8

LifeIsStrange avatar Jul 23 '19 22:07 LifeIsStrange

So, if en means English, the state-of-the-art scores are: en_ewt: 97.23, en_gum: 96.18, en_lines: 96.56, en_pud: 96.39

These are not all that accurate...
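As a rough back-of-the-envelope check on the four English scores quoted above, the sketch below computes their mean and the gap to 100. Reading "100 minus the score" as an error rate is an assumption on my part; the exact metric is defined on the benchmark page.

```python
# Scores copied from the comment above; the interpretation of
# (100 - score) as an error rate is an assumption, not the official metric.
scores = {
    "en_ewt": 97.23,
    "en_gum": 96.18,
    "en_lines": 96.56,
    "en_pud": 96.39,
}

# Gap to a perfect score for each treebank, rounded to two decimals.
gaps = {name: round(100 - score, 2) for name, score in scores.items()}

# Unweighted mean over the four treebanks.
mean_score = round(sum(scores.values()) / len(scores), 2)

if __name__ == "__main__":
    for name, gap in gaps.items():
        print(f"{name}: score={scores[name]}, gap={gap}")
    print("mean score:", mean_score)
```

Under that reading, roughly three to four tokens in every hundred still get a wrong lemma.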

LifeIsStrange avatar Jul 23 '19 22:07 LifeIsStrange

Thanks for the note! Would you mind taking the lead on this, i.e. adding some state-of-the-art results for lemmatization and/or stemming? I think the task that you're looking for is morphological reinflection. Note that you need not only the part of speech but also the remaining morphosyntactic features (otherwise the problem is underspecified).
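A small sketch of why the part of speech alone leaves the problem underspecified. The mini-paradigm and the `candidates` helper are hypothetical, and the feature strings only loosely follow Universal Dependencies style; a real reinflection system would learn this mapping rather than look it up.

```python
from typing import List, Optional

# Hypothetical mini-paradigm: (lemma, part of speech, features) -> surface form.
PARADIGM = {
    ("jump", "VERB", "Tense=Past"): "jumped",
    ("jump", "VERB", "Number=Sing|Person=3|Tense=Pres"): "jumps",
    ("jump", "VERB", "VerbForm=Ger"): "jumping",
    ("jump", "NOUN", "Number=Plur"): "jumps",
}


def candidates(lemma: str, pos: str, feats: Optional[str] = None) -> List[str]:
    """Return every form compatible with the request.

    feats=None means the morphosyntactic features were not specified.
    """
    forms = {
        form
        for (entry_lemma, entry_pos, entry_feats), form in PARADIGM.items()
        if entry_lemma == lemma
        and entry_pos == pos
        and (feats is None or entry_feats == feats)
    }
    return sorted(forms)


if __name__ == "__main__":
    # Only the part of speech: several forms remain possible.
    print(candidates("jump", "VERB"))
    # Part of speech plus features: a single form is selected.
    print(candidates("jump", "VERB", "Tense=Past"))
```

With only the part of speech, the request matches several verb forms; adding the features narrows it to one.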

sebastianruder avatar Jul 25 '19 20:07 sebastianruder