
Version 2.0: Add more sources, clean up CSV, add 1 txt file per CSV row.

Open igorbrigadir opened this issue 6 years ago • 0 comments

  • [x] There are some entries missing from the CSV and table.

  • [x] Each library / entry in the CSV should have its own text file, even if it is a duplicate; duplicates will be marked as such.

  • [ ] New stopword lists from more software packages need to be added. #3

  • [ ] Some entries need to be updated (Lucene, spaCy, and others).

  • [ ] A new date column will include the last known commit date or edit of the stoplist at the source.

  • [ ] Highly similar lists will have a list of the different words or an explanation of how they differ.

  • [ ] README needs links to papers that reference this repo.

  • [ ] Extra notes / cautions for using stopwords in general.

  • [ ] Add paper: https://www.semanticscholar.org/paper/Stop-Word-Lists-in-Free-Open-source-Software-Nothman-Qin/b421df900834f58b9a8a299095b71aae0fe12d85

  • [ ] Add links to software-specific docs, not just source, e.g. https://www.elastic.co/guide/en/elasticsearch/guide/current/stopwords.html
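
For the "highly similar lists" item, here is a minimal sketch of how the differing words could be computed. It assumes each stoplist is plain text with one word per line. The function names `load_stoplist` and `compare_stoplists` are hypothetical, not existing code in this repo, and the sample words are made up for illustration.

```python
def load_stoplist(text):
    """Parse one word per line, lowercased, skipping blank lines."""
    return {line.strip().lower() for line in text.splitlines() if line.strip()}


def compare_stoplists(a, b):
    """Return the words unique to each set and their Jaccard similarity."""
    only_a = sorted(a - b)
    only_b = sorted(b - a)
    union = a | b
    # Two empty lists are treated as identical (assumption for this sketch).
    jaccard = len(a & b) / len(union) if union else 1.0
    return only_a, only_b, jaccard


# Made-up sample data, not taken from any real library's stoplist.
list_a = load_stoplist("a\nan\nthe\nof\n")
list_b = load_stoplist("a\nthe\nof\nand\n")
only_a, only_b, jaccard = compare_stoplists(list_a, list_b)
```

The two sorted difference lists could go straight into the notes for a pair of entries, and the similarity score could help decide which pairs count as "highly similar". Any threshold for that would still need to be chosen.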

igorbrigadir, Aug 14 '19 15:08