
Add check that data size is greater than topTreeSize

Open aaronquantexa opened this issue 3 years ago • 0 comments

CL:

  • Add a check that the topTreeSize parameter does not exceed the input data size. Throws an exception stating this when it does.

The motivation for this change was the issue raised at https://github.com/saurfang/spark-knn/issues/21, which reported that the resulting error was difficult to debug.

Before this change, the error would be:

requirement failed: Sampling fraction (1.002267573696145) must be on interval [0, 1]
java.lang.IllegalArgumentException: requirement failed: Sampling fraction (1.002267573696145) must be on interval [0, 1]

whereas now the check happens earlier, before any data transformations, and the error says:

org.apache.spark.SparkException: Invalid top tree size relative to size of data. Data to fit of size 441 was less than topTreeSize 442
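
A minimal sketch of what such a fail-fast check could look like, not the actual spark-knn code. The object and method names (`TopTreeSizeCheck`, `samplingFraction`, `validate`) are hypothetical, and the assumption that the sampling fraction is computed as topTreeSize / dataSize is inferred from the numbers in the messages above (442 / 441 is roughly 1.00227). A plain `IllegalArgumentException` stands in for `SparkException` so the sketch needs no Spark dependency.

```scala
// Hypothetical stand-in for the check described above; not the spark-knn implementation.
object TopTreeSizeCheck {

  // Assumption: the sampling fraction is topTreeSize / dataSize. This is inferred
  // from the old error message, where 442.0 / 441 gives a value just above 1.
  def samplingFraction(topTreeSize: Int, dataSize: Long): Double =
    topTreeSize.toDouble / dataSize

  // Fail fast, before any data transformation, when the data is smaller than topTreeSize.
  // IllegalArgumentException is used here only to avoid a Spark dependency;
  // the change itself reports an org.apache.spark.SparkException.
  def validate(topTreeSize: Int, dataSize: Long): Unit =
    if (dataSize < topTreeSize) {
      throw new IllegalArgumentException(
        "Invalid top tree size relative to size of data. " +
          s"Data to fit of size $dataSize was less than topTreeSize $topTreeSize")
    }
}
```

With these definitions, `TopTreeSizeCheck.validate(442, 441L)` throws immediately with a message naming both sizes, while `TopTreeSizeCheck.validate(441, 441L)` passes because a fraction of exactly 1 is still on the interval [0, 1].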

A previous commit included a test for this that matched the string in the error message, but I didn't see it as necessary.

aaronquantexa, Apr 21 '21 21:04