spark-neighbors

Help needed in using the library

Open · Quantad opened this issue 9 years ago · 3 comments

Thank you for the very helpful library. I need some help setting the parameters for the Euclidean distance case. I have a set of very close 2-dimensional coordinates, but the nearest neighbors are not identified at all unless the distance is 0. I am guessing I probably need to set the parameters better. Could you please help me with this? The points are:

61,139
63,140
64,129
68,128
71,140
73,141
75,128
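
For anyone hitting this later, here is roughly how the Euclidean configuration would look for points like these. This is a minimal sketch assuming the builder API shown in the project README (an ANN constructor taking dimensions and measure, plus setTables, setSignatureLength, and setBucketWidth); the parameter values are starting guesses, not recommendations. With Euclidean LSH, points only become neighbor candidates when they land in the same hash bucket, so a bucket width that is small relative to the ~10-15 unit spread of these coordinates would explain finding matches only at distance 0.

```scala
import org.apache.spark.mllib.linalg.Vectors
import com.github.karlhigley.spark.neighbors.ANN

// The points from the comment above, as (id, SparseVector) pairs, which is the
// input shape spark-neighbors expects. `sc` is the usual SparkContext (e.g. spark-shell).
val coords = Seq((61.0, 139.0), (63.0, 140.0), (64.0, 129.0), (68.0, 128.0),
                 (71.0, 140.0), (73.0, 141.0), (75.0, 128.0))
val points = sc.parallelize(coords.zipWithIndex.map { case ((x, y), i) =>
  (i.toLong, Vectors.dense(x, y).toSparse)
})

// Euclidean LSH: nearby points only become candidates if they share a bucket,
// so widen the buckets and keep the signature short for data on this scale.
// These values are guesses to start tuning from.
val annModel =
  new ANN(dimensions = 2, measure = "euclidean")
    .setTables(4)
    .setSignatureLength(4)
    .setBucketWidth(25)
    .train(points)

val neighbors = annModel.neighbors(3)   // up to 3 approximate neighbors per point
```

Wider buckets and more tables trade runtime for recall, so it is worth sweeping those two knobs first.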

Quantad · Sep 20 '16 01:09

I seem to be running into the same issue. For a given data set, when I ask for 10 nearest neighbors, it sometimes returns fewer than 10. And for other data sets, calling the neighbors function returns an empty RDD.
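
Worth noting that LSH is approximate: a point only competes against candidates that share at least one hash bucket with it, so getting fewer than k results (or an empty result) is expected behavior when collisions are rare rather than necessarily a bug. A quick way to see how sparse the candidate sets are, assuming an annModel built as in the sketch earlier in this thread and that neighbors(k) returns an RDD of (id, Array[(neighborId, distance)]) pairs:

```scala
// Count how many neighbors actually come back per point; consistently fewer
// than k (or zero) points toward too few hash collisions for this data/config.
val returned = annModel.neighbors(10).mapValues(_.length)
returned.collect().foreach { case (id, n) => println(s"point $id -> $n neighbors") }
```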

hassanj-576 · Feb 27 '17 09:02

Same issue when I choose the Euclidean type. With Hamming I do get results, but the reported distance is zero in every case.
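
One possible explanation for the zero Hamming distances (an assumption about how the measure behaves, worth confirming against DistanceMeasure.scala): if Hamming distance is computed over the non-zero index sets of the sparse vectors rather than their values, then any two 2-dimensional points with both coordinates non-zero share the index set {0, 1} and come out at distance 0. A tiny standalone illustration of that reading:

```scala
// Illustration only: modeling Hamming distance as the symmetric difference of
// the non-zero index sets of two sparse vectors (an assumed reading of the
// library's behavior, not its actual code).
def hamming(a: Set[Int], b: Set[Int]): Int = ((a diff b) union (b diff a)).size

val p1 = Set(0, 1) // non-zero indices of the 2-D point (61, 139)
val p2 = Set(0, 1) // non-zero indices of the 2-D point (63, 140)
println(hamming(p1, p2)) // 0 -- every fully dense 2-D point looks identical
```

If that is what is happening, Hamming is simply the wrong measure for dense real-valued coordinates like these.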

akshaybhatt14495 · Jan 10 '18 09:01

And when running the algorithm on data points with identical coordinates, it throws an exception:

java.lang.NoSuchMethodError: org.apache.spark.mllib.linalg.Vector.toBreeze()Lbreeze/linalg/Vector;
    at org.apache.spark.mllib.linalg.LinalgShim$.toBreeze(LinalgShim.scala:32)
    at com.github.karlhigley.spark.neighbors.linalg.EuclideanDistance$.compute(DistanceMeasure.scala:47)
    at com.github.karlhigley.spark.neighbors.ANNModel$$anonfun$computeDistances$2$$anonfun$apply$6$$anonfun$apply$9.apply(ANNModel.scala:91)
    at com.github.karlhigley.spark.neighbors.ANNModel$$anonfun$computeDistances$2$$anonfun$apply$6$$anonfun$apply$9.apply(ANNModel.scala:89)
    at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
    at scala.collection.Iterator$$anon$12.next(Iterator.scala:444)
    at scala.collection.Iterator$$anon$12.next(Iterator.scala:444)
    at org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:193)
    at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:63)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
    at org.apache.spark.scheduler.Task.run(Task.scala:99)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:282)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
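
This particular NoSuchMethodError looks like a Spark binary mismatch rather than a problem with the duplicate coordinates themselves: the library's LinalgShim calls Vector.toBreeze, which is available in Spark 1.x but was renamed to asBreeze in Spark 2.0, so the call fails at the first distance computation when the model runs on a 2.x cluster. A build.sbt sketch pinning a compatible combination follows; the exact version numbers are assumptions to check against the project README and your cluster:

```scala
// build.sbt sketch: run spark-neighbors against a Spark 1.6.x runtime, where
// mllib's Vector.toBreeze still exists. Version numbers (including Scala) are
// assumptions to verify against the spark-neighbors README and your cluster.
scalaVersion := "2.11.8"

libraryDependencies ++= Seq(
  "org.apache.spark"      %% "spark-mllib"     % "1.6.3" % "provided",
  "com.github.karlhigley" %% "spark-neighbors" % "0.2.2"
)
```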

akshaybhatt14495 · Jan 10 '18 09:01