spark-knn-graphs

Make NNCTPH take in StringProfile or SparseIntegerVector?

Open • thiakx opened this issue 8 years ago • 1 comment

Hi. I was able to deploy LSHSuperBitNNDescentTextExample successfully on our Spark cluster. I really like the idea of pre-calculating the stringProfiles via ks.getProfile, and performance is good.

I am testing NNCTPHExample and trying to feed NNCTPH the pre-calculated stringProfiles. Unfortunately, it seems that the NNCTPH constructor and .setSimilarity only accept String. Could NNCTPH take a StringProfile or SparseIntegerVector instead? NNCTPHExample is a lot slower than LSHSuperBitNNDescentTextExample, and I suspect it recalculates the profiles at every comparison. I also replaced Jaro-Winkler with the cheaper Jaccard index, which improved performance slightly.
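To illustrate the idea behind the request, here is a minimal, dependency-free Java sketch. It computes each string's k-shingle profile once and then compares the cached profiles with the Jaccard index, instead of rebuilding profiles inside every similarity call. `ProfiledString`, `shingles` and `jaccard` are hypothetical names. They are not part of spark-knn-graphs or java-string-similarity, and how NNCTPH would actually accept such a type is an assumption.

```java
import java.util.HashSet;
import java.util.Set;

public class PrecomputedJaccard {

    /** Builds the set of distinct k-shingles (substrings of length k) of s. */
    static Set<String> shingles(String s, int k) {
        Set<String> out = new HashSet<>();
        for (int i = 0; i + k <= s.length(); i++) {
            out.add(s.substring(i, i + k));
        }
        return out;
    }

    /** Hypothetical value type: a string paired with its precomputed profile. */
    static final class ProfiledString {
        final String value;
        final Set<String> profile;

        ProfiledString(String value, int k) {
            this.value = value;
            // Computed once here, then reused by every comparison.
            this.profile = shingles(value, k);
        }
    }

    /** Jaccard index |A ∩ B| / |A ∪ B| computed on the cached profiles. */
    static double jaccard(ProfiledString a, ProfiledString b) {
        // Two empty profiles (both strings shorter than k) are treated as identical.
        if (a.profile.isEmpty() && b.profile.isEmpty()) {
            return 1.0;
        }
        int intersection = 0;
        for (String sh : a.profile) {
            if (b.profile.contains(sh)) {
                intersection++;
            }
        }
        int union = a.profile.size() + b.profile.size() - intersection;
        return (double) intersection / union;
    }

    public static void main(String[] args) {
        ProfiledString x = new ProfiledString("hello world", 3);
        ProfiledString y = new ProfiledString("hello there", 3);
        System.out.println(jaccard(x, y));
    }
}
```

In an NN-Descent style job, each string would be wrapped once when the RDD is built. The comparison then only performs set lookups, which is the saving the question suspects NNCTPHExample is missing.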

thiakx, Jun 21 '16 03:06