wmd4j

wmd4j is a Java library for computing Word Mover's Distance (WMD) between two text documents. It provides the same functionality as Word2Vec.wmdistance in Gensim.

wmd4j depends on the deeplearning4j WordVectors interface for word vector manipulation and uses an optimized version of JFastEMD (Earth Mover's Distance transportation problem) underneath, which is about 1.8x faster.
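
For intuition about what the distance measures, here is a minimal, self-contained sketch. It is not wmd4j's implementation: instead of solving the full transportation problem with JFastEMD, it computes the simpler relaxed lower bound, in which every word sends all of its weight to the nearest word in the other document. The class name, the helper methods, and the 2-D toy vectors are all made up for illustration, and the sketch assumes every token has a vector.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch only; names and toy vectors are hypothetical, not part of wmd4j.
public class RelaxedWmdSketch {

    // Euclidean distance between two equal-length vectors.
    static double euclidean(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    // Normalized bag-of-words weights: word -> (count / number of tokens).
    static Map<String, Double> weights(String[] tokens) {
        Map<String, Double> w = new HashMap<>();
        for (String t : tokens) {
            w.merge(t, 1.0 / tokens.length, Double::sum);
        }
        return w;
    }

    // One-directional relaxation: each word in `from` moves all of its weight
    // to its nearest word in `to`. Assumes every token has a vector.
    static double oneWay(String[] from, String[] to, Map<String, double[]> vectors) {
        double cost = 0.0;
        for (Map.Entry<String, Double> e : weights(from).entrySet()) {
            double nearest = Double.POSITIVE_INFINITY;
            for (String t : to) {
                nearest = Math.min(nearest, euclidean(vectors.get(e.getKey()), vectors.get(t)));
            }
            cost += e.getValue() * nearest;
        }
        return cost;
    }

    // Relaxed WMD: the larger of the two one-way costs. This is a lower bound
    // on the exact Word Mover's Distance, not the exact value in general.
    static double relaxedDistance(String[] a, String[] b, Map<String, double[]> vectors) {
        return Math.max(oneWay(a, b, vectors), oneWay(b, a, vectors));
    }

    // Toy 2-D "embeddings", chosen so the distances are easy to check by hand.
    static Map<String, double[]> toyVectors() {
        Map<String, double[]> v = new HashMap<>();
        v.put("obama", new double[] {0.0, 0.0});
        v.put("president", new double[] {3.0, 4.0}); // distance 5 from "obama"
        v.put("media", new double[] {10.0, 0.0});
        v.put("press", new double[] {10.0, 1.0});    // distance 1 from "media"
        return v;
    }

    public static void main(String[] args) {
        String[] a = {"obama", "media"};
        String[] b = {"president", "press"};
        // Each word carries weight 0.5, so the cost is 0.5 * 5 + 0.5 * 1.
        System.out.println(relaxedDistance(a, b, toyVectors())); // prints 3.0
    }
}
```

In this toy case the relaxed bound happens to equal the exact distance, because the nearest-word assignment is also a valid one-to-one flow. With real documents the exact solver can return a larger value.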

Usage


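// Load pre-trained word2vec vectors in Google's format (second argument: false = text, true = binary)
WordVectors vectors = WordVectorSerializer.loadGoogleModel(new File(word2vecPath), false);
// Build the Word Mover's Distance calculator on top of those vectors
WordMovers wm = WordMovers.Builder().wordVectors(vectors).build();

// Smaller distances mean the two documents are closer in meaning
wm.distance("obama speaks to the media in illinois", "the president greets the press in chicago");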

Validation

wmd4j is validated against Gensim's wmdistance results on a custom word2vec model.