spark-sklearn

Non-deterministic results because of aggregation steps

Open thunterdb opened this issue 9 years ago • 0 comments

Because the aggregation step is not deterministic, the data may be presented in a different order between runs of the same query. Some algorithms, such as DBSCAN, are sensitive to the order of their input and will then give different results from one run to the next.

The recommended solution is to sort all the data points in lexicographic order before fitting:

sorted_data = data[numpy.lexsort(data.T)]
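As a rough illustration of why this helps, the sketch below checks that two differently ordered copies of the same points end up identical after sorting. The `sort_rows` helper and the random test data are made up for this example and are not part of spark-sklearn. Note that `numpy.lexsort` treats its last key as the primary one, so with `data.T` the rows are ordered by the last column first; the order is still fully determined by the values, which is all that matters here.

```python
import numpy as np

def sort_rows(data):
    """Return the rows of a 2-D array in a canonical order.

    np.lexsort uses the LAST key as the primary key, so passing data.T
    sorts by the last column first, with earlier columns breaking ties.
    """
    return data[np.lexsort(data.T)]

# Hypothetical data: the same points arriving in two different orders,
# as could happen after a non-deterministic aggregation step.
rng = np.random.default_rng(0)
data = rng.integers(0, 5, size=(20, 3)).astype(float)
shuffled = data[rng.permutation(len(data))]

# After sorting, both copies have their rows in the same order.
same_order = np.array_equal(sort_rows(data), sort_rows(shuffled))
print(same_order)
```

The sorted array can then be passed to the estimator in place of the raw data, so that every run fits on the points in the same order.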

thunterdb, Sep 21 '16 20:09