Please rename Slovene stopwords to Slovenian

Open. PrimozGodec opened this issue 2 years ago • 1 comment

Hi, during language refactoring of the orange3-text module, I noticed that NLTK uses Slovene as the key for Slovenian stop words. I suggest renaming it to Slovenian, for the following reasons:

  • All other packages that we use also use Slovenian
  • Slovenian is the standard name according to the ISO standard
  • The main NLTK package also refers to it as Slovenian; see this search: https://github.com/search?q=repo%3Anltk%2Fnltk+slovenian&type=code

I know there is a bit of confusion, since two terms exist for the language, but Slovenian is definitely the more common one. I wanted to make a pull request but couldn't find where the stopwords are stored.
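
Until the key is renamed, a small alias lookup on the caller's side can bridge the two names. The following is only a minimal sketch of that idea: `LANGUAGE_ALIASES` and `nltk_stopwords_key` are hypothetical names made up for illustration, not part of NLTK, and the sketch assumes the stop word list is currently exposed under the key `slovene`, as described above.

```python
# Hypothetical alias table: maps the name we would like to use to the key
# that NLTK's stopwords corpus currently exposes (assumed to be "slovene").
LANGUAGE_ALIASES = {"slovenian": "slovene"}


def nltk_stopwords_key(language: str) -> str:
    """Return the stopwords key for a language name, resolving known aliases.

    Names without an alias are passed through unchanged (lower-cased).
    """
    key = language.strip().lower()
    return LANGUAGE_ALIASES.get(key, key)


# Usage (requires nltk and a downloaded stopwords corpus, so it is left
# as a comment here):
#   from nltk.corpus import stopwords
#   words = stopwords.words(nltk_stopwords_key("Slovenian"))
```

If the key is eventually renamed, only the alias table would need to change; the calling code stays the same.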

PrimozGodec, May 31 '23 14:05

Any news on this one? Could you please let me know where it is defined, so that I can propose a pull request?

PrimozGodec, Feb 05 '24 15:02