spark-knn-graphs
Make NNCTPH take in StringProfile or SparseIntegerVector?
Hi. I was able to deploy LSHSuperBitNNDescentTextExample successfully on our Spark cluster. I really like the idea of pre-calculating the string profiles via ks.getProfile, and performance is good.
I am now testing NNCTPHExample and trying to feed NNCTPH the pre-calculated string profiles. Unfortunately, it seems that the NNCTPH constructor and .setSimilarity only accept String. Could NNCTPH be made to accept StringProfile or SparseIntegerVector instead? It is a lot slower than LSHSuperBitNNDescentTextExample, and I suspect it has to recalculate the profiles at every comparison. I also replaced Jaro-Winkler with the cheaper Jaccard index, which improved performance slightly.
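
To illustrate the suspected cost, here is a minimal, library-independent sketch in plain Java. It does not use the actual NNCTPH, StringProfile or SparseIntegerVector classes; `profile`, `jaccard` and `jaccardFromStrings` are hypothetical stand-ins, and a profile is assumed to be simply the set of k-shingles of a string. The point is only that a similarity defined on String has to rebuild both profiles on every call, while one defined on the profile type can reuse profiles computed once.

```java
import java.util.HashSet;
import java.util.Set;

public class PrecomputedProfileSketch {

    // Hypothetical stand-in for a string profile: the set of k-shingles of s.
    static Set<String> profile(String s, int k) {
        Set<String> shingles = new HashSet<>();
        for (int i = 0; i + k <= s.length(); i++) {
            shingles.add(s.substring(i, i + k));
        }
        return shingles;
    }

    // Jaccard index computed directly on two precomputed profiles.
    static double jaccard(Set<String> a, Set<String> b) {
        if (a.isEmpty() && b.isEmpty()) {
            return 1.0; // convention: two empty profiles are identical
        }
        Set<String> smaller = a.size() <= b.size() ? a : b;
        Set<String> larger = (smaller == a) ? b : a;
        int intersection = 0;
        for (String shingle : smaller) {
            if (larger.contains(shingle)) {
                intersection++;
            }
        }
        int union = a.size() + b.size() - intersection;
        return (double) intersection / union;
    }

    // String-only similarity: both profiles are rebuilt on every comparison.
    static double jaccardFromStrings(String s1, String s2, int k) {
        return jaccard(profile(s1, k), profile(s2, k));
    }

    public static void main(String[] args) {
        int k = 2;
        String s1 = "abcd";
        String s2 = "bcde";

        // Slow path: shingling happens inside each comparison.
        System.out.println(jaccardFromStrings(s1, s2, k));

        // Fast path: shingle once per string, then compare profiles many times.
        Set<String> p1 = profile(s1, k);
        Set<String> p2 = profile(s2, k);
        System.out.println(jaccard(p1, p2));
    }
}
```

Both paths return the same value; the difference is that a graph builder performing many comparisons per node would repeat the shingling step each time in the String-only variant, which is the overhead I suspect here.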