wmd4j
stop list weighting is incorrect
In `com.crtomirmajer.wmd4j.WordMovers`, the distance weighting is wrong when stopwords are set. A simple way to reproduce it is to configure a set of stopwords that never occur in the documents and check the computed distances: the distance value still decreases, even though neither tokenA nor tokenB is a stopword. stopwordWeight should be applied only when one or both tokens are stopwords. A quick fix is to change line 69 to:

`stopwords.contains(tokenA) || stopwords.contains(tokenB) ? stopwordWeight : 1`
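A minimal, self-contained sketch of the proposed rule and of the sanity check described above. The class and the helpers `pairWeight` and `weightedDistance` are hypothetical and not part of wmd4j; it also assumes that the weight simply scales the word-to-word distance, which may not match exactly how `WordMovers` applies it. The `pairWeight` expression is the one proposed for line 69.

```java
import java.util.Set;

public class StopwordWeightSketch {

    // Hypothetical helper using the proposed line-69 expression:
    // stopwordWeight applies only if at least one token is a stopword, otherwise weight is 1.
    static double pairWeight(String tokenA, String tokenB,
                             Set<String> stopwords, double stopwordWeight) {
        return stopwords.contains(tokenA) || stopwords.contains(tokenB) ? stopwordWeight : 1;
    }

    // Hypothetical: assumes the pair weight multiplies the raw word-to-word distance.
    static double weightedDistance(double rawDistance, String tokenA, String tokenB,
                                   Set<String> stopwords, double stopwordWeight) {
        return rawDistance * pairWeight(tokenA, tokenB, stopwords, stopwordWeight);
    }

    public static void main(String[] args) {
        double raw = 0.75;
        double stopwordWeight = 0.5;

        // Scenario from the issue: a stopword set whose entries never occur in the text.
        Set<String> unusedStopwords = Set.of("never-occurs");
        // Neither token is a stopword, so the distance must stay unchanged.
        System.out.println(weightedDistance(raw, "obama", "president", unusedStopwords, stopwordWeight)); // 0.75

        // One token is a stopword, so the weight is applied.
        Set<String> stopwords = Set.of("the");
        System.out.println(weightedDistance(raw, "the", "president", stopwords, stopwordWeight)); // 0.375
    }
}
```

With the fix, a stopword set that never matches leaves every distance identical to the unweighted run, which is the quickest way to confirm the behaviour. (`Set.of` needs Java 9 or later.)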