spark-knn
Add check that data size is greater than topTreeSize
CL:
- Add a check for whether the topTreeSize parameter is greater than the input data size, and throw an exception stating this if it is.
The motivation for the change is this issue: https://github.com/saurfang/spark-knn/issues/21, which reports that the original error is difficult to debug.
Before this change, the error would be:
requirement failed: Sampling fraction (1.002267573696145) must be on interval [0, 1]
java.lang.IllegalArgumentException: requirement failed: Sampling fraction (1.002267573696145) must be on interval [0, 1]
whereas now the check happens earlier, before any data transformations are done, and the error says:
org.apache.spark.SparkException: Invalid top tree size relative to size of data. Data to fit of size 441 was less than topTreeSize 442
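The two messages above are consistent with the sampling fraction being topTreeSize divided by the data size (442 / 441 ≈ 1.0023), which is why the old failure only surfaced at sampling time. A minimal sketch of the early check follows. It is not the code from this PR: `TopTreeSizeCheck`, `samplingFraction`, `validate`, and `TopTreeSizeException` are hypothetical names, the exception class is a stand-in for org.apache.spark.SparkException so the snippet runs without Spark on the classpath, and the fraction formula is an assumption inferred from the numbers in the messages.

```scala
// Hypothetical stand-in for org.apache.spark.SparkException,
// used here only so the sketch runs without a Spark dependency.
final class TopTreeSizeException(message: String) extends Exception(message)

object TopTreeSizeCheck {
  // Assumed formula: the fraction of the data sampled to build the top tree.
  // With dataSize = 441 and topTreeSize = 442 this exceeds 1.0, which is
  // what the old "Sampling fraction ... must be on interval [0, 1]" error reported.
  def samplingFraction(dataSize: Long, topTreeSize: Int): Double =
    topTreeSize.toDouble / dataSize

  // Fail fast, before any data transformation, when there are fewer
  // rows than topTreeSize. Equal sizes are allowed (fraction is exactly 1.0).
  def validate(dataSize: Long, topTreeSize: Int): Unit =
    if (dataSize < topTreeSize) {
      throw new TopTreeSizeException(
        "Invalid top tree size relative to size of data. " +
          s"Data to fit of size $dataSize was less than topTreeSize $topTreeSize")
    }
}
```

Calling `TopTreeSizeCheck.validate(441L, 442)` in this sketch throws with the same wording as the new message, while `TopTreeSizeCheck.validate(442L, 442)` returns normally.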
A previous commit included a test for this that matched the string in the error message, but I didn't see that as necessary.