spark-knn
k-Nearest Neighbors algorithm on Spark
It would be nice to get a release of this so that it can be used with Scala 2.12 and Spark 2.4.X.
As part of our CI/CD pipeline, we have a daily build that runs on relatively small amounts of data. As part of this, we discovered an interesting bug; as part of...
Is there a way to extract the K nearest neighbors from Training samples from the KNN model in the Scala version?
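To make the question concrete, here is a minimal brute-force sketch of what "extracting the k nearest training samples" means. It is plain Scala, not the spark-knn API: `BruteForceKnn`, `squaredDistance`, and `kNearest` are hypothetical names, and the linear scan stands in for whatever index the library builds internally.

```scala
object BruteForceKnn {
  // Squared Euclidean distance between two vectors of equal length.
  def squaredDistance(a: Array[Double], b: Array[Double]): Double = {
    require(a.length == b.length, "vectors must have the same dimension")
    var sum = 0.0
    var i = 0
    while (i < a.length) {
      val d = a(i) - b(i)
      sum += d * d
      i += 1
    }
    sum
  }

  // Returns (index into training, Euclidean distance) for the k closest
  // training samples, nearest first. Ties keep their original order.
  def kNearest(
      training: IndexedSeq[Array[Double]],
      query: Array[Double],
      k: Int): Seq[(Int, Double)] = {
    require(k > 0, "k must be positive")
    training.indices
      .map(i => (i, math.sqrt(squaredDistance(training(i), query))))
      .sortBy(_._2)
      .take(k)
  }
}
```

The returned indices can be used to look up the original training rows; a linear scan like this is only practical for small datasets or for checking the output of an indexed search.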
When fitting training data, on which parameters do the top tree size, leaf size, and subtree leaf size depend?
I followed the example as given:
    val training = MLUtils.loadLibSVMFile(sc, "data/mllib/sample_libsvm_data.txt").toDF()
    val knn = new KNNClassifier()
      .setTopTreeSize(training.count().toInt / 500)
      .setK(10)
First error: TopTreeSize is invalid 0 (since total count of training...
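The expression `training.count().toInt / 500` uses integer division, so it evaluates to 0 whenever the training set has fewer than 500 rows. One possible workaround is to clamp the value before passing it to `setTopTreeSize`. The helper below is a hypothetical sketch in plain Scala, not part of spark-knn; the divisor of 500 is taken from the snippet above, and the lower bound of 1 is an assumption about what the parameter accepts.

```scala
object TopTreeSizeSketch {
  // Hypothetical helper (not a spark-knn API): derive a top tree size from the
  // row count and never return 0, even for small datasets.
  def safeTopTreeSize(rowCount: Long, divisor: Int = 500): Int = {
    require(rowCount > 0, "training set must not be empty")
    require(divisor > 0, "divisor must be positive")
    val raw = rowCount / divisor // integer division: 0 when rowCount < divisor
    // Clamp to at least 1 and guard against overflow when narrowing to Int.
    math.min(math.max(1L, raw), Int.MaxValue.toLong).toInt
  }
}

// Assumed usage with the classifier from the snippet above:
//   new KNNClassifier()
//     .setTopTreeSize(TopTreeSizeSketch.safeTopTreeSize(training.count()))
//     .setK(10)
```

Because the divisor is at least 1, the result can never exceed the row count, which also avoids asking for more top-tree points than there are records.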
It appears that `spark-knn` needs to transform sparse vectors into their dense form. This creates a limitation when using `spark-knn` for very wide, sparse datasets such as document-term matrices used...
Just faced the issue and the reason was that the number of points (defaults to `1000`) was higher than the number of records in the training dataset. Perhaps obvious for...
I ran KNNClassifier on my local machine with 5,000 rows of data and got stack overflow errors. The version of spark-knn is v0.1.1. How can I avoid the stack overflow?
Provide a custom distance function instead of using the fixed Euclidean distance, e.g.
```scala
def distance[T](point1: T, point2: T): Double
```
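One way the proposed signature could be made pluggable is a small trait with interchangeable implementations. This is a sketch in plain Scala under stated assumptions: `Distance`, `euclidean`, `manhattan`, and `nearest` are hypothetical names and do not exist in spark-knn.

```scala
object DistanceSketch {
  // Hypothetical abstraction mirroring the proposed signature.
  trait Distance[T] {
    def distance(point1: T, point2: T): Double
  }

  // Euclidean (L2) distance over dense vectors.
  val euclidean: Distance[Array[Double]] = new Distance[Array[Double]] {
    def distance(p: Array[Double], q: Array[Double]): Double = {
      require(p.length == q.length, "vectors must have the same dimension")
      math.sqrt(p.zip(q).map { case (a, b) => (a - b) * (a - b) }.sum)
    }
  }

  // Manhattan (L1) distance over dense vectors.
  val manhattan: Distance[Array[Double]] = new Distance[Array[Double]] {
    def distance(p: Array[Double], q: Array[Double]): Double = {
      require(p.length == q.length, "vectors must have the same dimension")
      p.zip(q).map { case (a, b) => math.abs(a - b) }.sum
    }
  }

  // Picks the point closest to the query under whichever metric is supplied.
  def nearest[T](points: Seq[T], query: T)(d: Distance[T]): T = {
    require(points.nonEmpty, "points must not be empty")
    points.minBy(p => d.distance(p, query))
  }
}
```

Tree-based indexes generally rely on the triangle inequality to prune branches, so an arbitrary function that is not a true metric may not be safe to plug into such a search even if the interface allows it.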