wmd
Lots of NaNs in the distance matrix for the example dataset
When I run the example script inside VMware with Ubuntu installed as the guest OS, I get a distance matrix with around 100K NaN entries. Could it be a problem with the EMD solver?
OK, I debugged a bit and figured out that it happens whenever a tweet consists entirely of stop words. Then the bag of words is empty, and the EMD solver cannot handle that, which is where the NaN entries come from.
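A possible workaround is to drop such tweets before building the distance matrix. The snippet below is only a rough sketch of that idea, not the example script's actual code: the stop-word list, the whitespace tokenizer, and the helper names `bag_of_words` and `split_empty` are all made up for illustration.

```python
# Hypothetical stop-word list; the example script's real list may differ.
STOP_WORDS = {"the", "a", "is", "it", "to", "and", "of", "i", "this"}


def bag_of_words(text, stop_words=STOP_WORDS):
    """Return a word -> count dict with stop words removed."""
    counts = {}
    # Naive whitespace tokenizer (an assumption, for illustration only).
    for token in text.lower().split():
        if token not in stop_words:
            counts[token] = counts.get(token, 0) + 1
    return counts


def split_empty(docs, stop_words=STOP_WORDS):
    """Separate documents whose bag of words is non-empty from those
    that consist entirely of stop words.

    Returns (kept, dropped), where kept is a list of (index, bag) pairs
    and dropped is a list of the original indices that became empty.
    """
    kept, dropped = [], []
    for i, doc in enumerate(docs):
        bow = bag_of_words(doc, stop_words)
        if bow:
            kept.append((i, bow))
        else:
            dropped.append(i)
    return kept, dropped


tweets = ["this is it", "great flight to Boston", "the a of"]
kept, dropped = split_empty(tweets)
print("kept indices:", [i for i, _ in kept])
print("dropped indices:", dropped)
```

Only the `kept` documents would then be passed to the EMD solver. Keeping the original indices makes it possible to map rows of the smaller distance matrix back to the tweets, and the `dropped` list shows how many inputs were affected.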