nltk_data
Please rename Slovene stopwords to Slovenian
Hi, during the language refactoring of the orange3-text module, I noticed that NLTK uses Slovene as the key for Slovenian stop words. I suggest renaming it to Slovenian. The reasons are the following:
- All other packages that we use also use Slovenian
- Slovenian is the standard name according to ISO standard
- Also main NLTK package references it as Slovenian. Look at this search: https://github.com/search?q=repo%3Anltk%2Fnltk+slovenian&type=code
I know there is a bit of confusion because two terms exist for the language, but Slovenian is definitely the more common one. I wanted to make a pull request but couldn't find where the stopwords are stored.
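For illustration, this is roughly the kind of workaround a downstream package needs while the key stays Slovene. It is only a minimal sketch: the alias table and the `nltk_stopwords_key` helper are hypothetical names made up for this example, not part of NLTK or orange3-text.

```python
# Hypothetical alias table: maps the language name used by other packages
# to the key NLTK's stopwords corpus currently uses for that language.
NLTK_STOPWORD_ALIASES = {"slovenian": "slovene"}


def nltk_stopwords_key(language: str) -> str:
    """Return the key to pass to NLTK for a given language name.

    Assumed helper for illustration; names without an alias are only
    lowercased and stripped of surrounding whitespace.
    """
    key = language.strip().lower()
    return NLTK_STOPWORD_ALIASES.get(key, key)


# With NLTK installed and the stopwords corpus downloaded, the lookup
# would then be:
#   from nltk.corpus import stopwords
#   stopwords.words(nltk_stopwords_key("Slovenian"))
```

If the key were renamed to Slovenian, a mapping like this would no longer be needed on our side.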
Any news on this one? Could you please let me know where it is defined, so I can propose a pull request?