spark-knn issues

Release version for Spark 2.4.X / Scala 2.12

It would be nice to get a release of this so that it can be used with Scala 2.12 and Spark 2.4.X.

Algorithm fails on small data

As part of our CI/CD pipeline, we have a daily build that runs on relatively small amounts of data. During these runs we discovered an interesting bug; as part of...

nsutcliffe

Searching K nearest neighbors for each test sample in training set

1

Is there a way to extract, from the KNN model in the Scala version, the K nearest neighbors among the training samples?

isablle31

KNN: setting top tree size, subtree size, and leaf size

When fitting training data, which parameters do the top tree size, leaf size, and subtree leaf size depend on?

akshaybhatt14495

knn.fit(training) throws an exception

10

I followed the example as given:

    val training = MLUtils.loadLibSVMFile(sc, "data/mllib/sample_libsvm_data.txt").toDF()
    val knn = new KNNClassifier()
      .setTopTreeSize(training.count().toInt / 500)
      .setK(10)

1st error: TopTreeSize is invalid 0 (since total count of training...

akshaybhatt14495
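
The `TopTreeSize is invalid 0` error in the report above is consistent with integer division: when the training set has fewer than 500 rows, `training.count().toInt / 500` truncates to 0. A minimal sketch of a guard follows. `TopTreeSizeSketch` and `safeTopTreeSize` are hypothetical names, not part of spark-knn, and the sketch assumes that any positive top tree size is accepted.

```scala
// Hypothetical helper, not part of spark-knn: derive a top tree size from the
// training row count without letting integer division collapse it to 0.
object TopTreeSizeSketch {
  def safeTopTreeSize(rowCount: Long, divisor: Int = 500): Int = {
    require(rowCount > 0, "training set must not be empty")
    require(divisor > 0, "divisor must be positive")
    // Integer division truncates: 100 / 500 == 0, the value the report shows as invalid.
    val raw = rowCount / divisor
    // Clamp to at least 1 and at most Int.MaxValue before narrowing to Int.
    math.max(1L, math.min(raw, Int.MaxValue.toLong)).toInt
  }
}
```

Under that assumption, the call in the report could become `.setTopTreeSize(TopTreeSizeSketch.safeTopTreeSize(training.count()))`.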

Sparse Vectors?

It appears that `spark-knn` needs to transform sparse vectors into their dense form. This creates a limitation when using `spark-knn` for very wide, sparse datasets such as document-term matrices used...

elbamos

Check if the number of points to sample for top-level tree is less than the number of records in training dataset

1

Just faced the issue and the reason was that the number of points (defaults to `1000`) was higher than the number of records in the training dataset. Perhaps obvious for...

jaceklaskowski
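
The check requested above could look roughly like the sketch below. It is written in plain Scala with no Spark dependency; `SampleSizeCheckSketch` and `checkSamplePoints` are hypothetical names, not spark-knn API. Treating a request equal to the record count as valid is an assumption, since the report only describes the case where the number of points was higher.

```scala
// Hypothetical validation, not part of spark-knn: reject a sample size that
// exceeds the number of training records, with a message that names both values.
object SampleSizeCheckSketch {
  def checkSamplePoints(requested: Int, numRecords: Long): Either[String, Int] =
    if (requested <= 0)
      Left(s"number of points to sample must be positive, got $requested")
    else if (requested.toLong > numRecords)
      Left(s"number of points to sample ($requested) exceeds the number of training records ($numRecords)")
    else
      Right(requested) // assumption: requested == numRecords is allowed
}
```

A caller could match on the result and fail fast with the `Left` message before fitting, instead of hitting a less obvious error later.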

How to avoid stackoverflow error when recursively building too much trees?

Support custom distance function

Spark 3.4.1 update
