stopwords
stopwords copied to clipboard
Improved threadsafeness for custom stopword lists.
If LoadStopWordsFromString() is called concurrently, map writes can collide and panic the application. Refactored to write into a new map and then copy the map into the appropriate language stoplist to reduce concurrency issues.
Not a perfect solution but this should minimize data races. To be completely safe, wrap the call to LoadStopwordsFromFile and LoadStopwordsFromString in a mutex.
Yea, to be completely safe calls like arabic = stopWordList
need to be protected with a mutex.
Yea, to be completely safe calls like
arabic = stopWordList
need to be protected with a mutex.
Prior to this branch, I could panic my test with 10 go routines. 10k now works but I see the race detector triggered intermittently.
Given that the repo hasn't been updated in years, I didn't want to put a ton of time into refactoring the entire package to allow an instance to be instantiated and reused to make this work 100%. Although, if the package is still used, it's probably worth doing.
Yep. I totally understand that tradeoff. We use this package in a project called Watchman but do the stopwords processing in an atomic and single goroutine (for other reasons than stopwords) which has avoided this bug.