stopwords icon indicating copy to clipboard operation
stopwords copied to clipboard

Improved threadsafeness for custom stopword lists.

Open funwithbots opened this issue 2 years ago • 4 comments

If LoadStopWordsFromString() is called concurrently, map writes can collide and panic the application. Refactored to write into a new map and then copy the map into the appropriate language stoplist to reduce concurrency issues.

funwithbots avatar Jun 20 '22 20:06 funwithbots

Not a perfect solution but this should minimize data races. To be completely safe, wrap the call to LoadStopwordsFromFile and LoadStopwordsFromString in a mutex.

funwithbots avatar Jun 21 '22 13:06 funwithbots

Yea, to be completely safe calls like arabic = stopWordList need to be protected with a mutex.

adamdecaf avatar Jun 21 '22 14:06 adamdecaf

Yea, to be completely safe calls like arabic = stopWordList need to be protected with a mutex.

Prior to this branch, I could panic my test with 10 go routines. 10k now works but I see the race detector triggered intermittently.

Given that the repo hasn't been updated in years, I didn't want to put a ton of time into refactoring the entire package to allow an instance to be instantiated and reused to make this work 100%. Although, if the package is still used, it's probably worth doing.

funwithbots avatar Jun 21 '22 15:06 funwithbots

Yep. I totally understand that tradeoff. We use this package in a project called Watchman but do the stopwords processing in an atomic and single goroutine (for other reasons than stopwords) which has avoided this bug.

adamdecaf avatar Jun 21 '22 18:06 adamdecaf