Version 2.0: Add more sources, clean up CSV, add 1 txt file per CSV row.
-
[x] There are some entries missing from the CSV and table.
-
[x] Each library / entry in the CSV should have its own text file, even if it is a duplicate; duplicates will be marked as such.
-
[ ] New stopword lists from more software packages need to be added. #3
-
[ ] Some entries need to be updated (Lucene, spaCy, others).
-
[ ] A new date column will include the last known commit date or edit of the stoplist at the source.
-
[ ] Highly similar lists will include a list of the words that differ, or an explanation of how they differ.
-
[ ] Readme needs links to papers that reference this repo
-
[ ] Extra notes / cautions for using stopwords in general.
-
[ ] Add Paper https://www.semanticscholar.org/paper/Stop-Word-Lists-in-Free-Open-source-Software-Nothman-Qin/b421df900834f58b9a8a299095b71aae0fe12d85
-
[ ] Add links to software-specific docs, not just source, e.g. https://www.elastic.co/guide/en/elasticsearch/guide/current/stopwords.html
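
For the planned comparison of highly similar lists, a minimal sketch of how the differing words could be reported is shown here. It assumes each stoplist is stored as plain text with one word per line; the function names `load_stoplist` and `compare_stoplists` and the sample words are hypothetical and not part of this repo.

```python
def load_stoplist(text):
    """Parse one-word-per-line text into a set of lowercased words."""
    return {line.strip().lower() for line in text.splitlines() if line.strip()}


def compare_stoplists(a, b):
    """Return words only in a, words only in b, and the Jaccard similarity."""
    only_a = sorted(a - b)
    only_b = sorted(b - a)
    union = a | b
    # Two empty lists are treated as identical.
    jaccard = len(a & b) / len(union) if union else 1.0
    return only_a, only_b, jaccard


# Illustrative input only; real lists would be read from the per-row text files.
list_a = load_stoplist("a\nan\nthe\nof\n")
list_b = load_stoplist("a\nan\nthe\nand\n")
only_a, only_b, similarity = compare_stoplists(list_a, list_b)
print(only_a, only_b, round(similarity, 2))
```

The two "only in" lists could be written alongside each near-duplicate entry, and the similarity score could help decide which pairs count as "highly similar".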