
stop list weighting is incorrect

Open icarocd opened this issue 6 years ago • 0 comments

In com.crtomirmajer.wmd4j.WordMovers, the weighting is wrong when stopwords are set. A simple way to reproduce it is to pass a set of stopwords that never occur in the input and check the computed distances: the distance value decreases even though neither tokenA nor tokenB is a stop word. stopwordWeight should be applied only when one or both tokens are stop words. A quick fix is to change line 69 to: stopwords.contains(tokenA) || stopwords.contains(tokenB) ? stopwordWeight : 1
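
To make the proposed condition concrete, here is a minimal standalone sketch of it. This is not the wmd4j source: the class StopwordWeightSketch and the helper pairWeight are hypothetical names made up for illustration, and only tokenA, tokenB, stopwords and stopwordWeight come from the description above.

```java
import java.util.Collections;
import java.util.Set;

// Hypothetical illustration class, not part of wmd4j.
class StopwordWeightSketch {

    // Returns the factor to apply to a token pair: stopwordWeight if at least
    // one of the two tokens is a stop word, otherwise 1 (no change).
    static double pairWeight(String tokenA, String tokenB,
                             Set<String> stopwords, double stopwordWeight) {
        boolean involvesStopword =
                stopwords.contains(tokenA) || stopwords.contains(tokenB);
        return involvesStopword ? stopwordWeight : 1.0;
    }

    public static void main(String[] args) {
        // A stop list that never matches, as in the reproduction scenario.
        Set<String> neverMatches = Collections.singleton("zzz-never-occurs");
        System.out.println(pairWeight("obama", "president", neverMatches, 0.5)); // prints 1.0

        // A stop list that matches one of the two tokens.
        Set<String> withThe = Collections.singleton("the");
        System.out.println(pairWeight("the", "president", withThe, 0.5)); // prints 0.5
    }
}
```

With a stop list that never matches, every pair keeps a factor of 1, so the distances should be identical to those computed without any stopwords set.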

icarocd, Aug 21 '17 20:08