spark-sklearn
spark-sklearn copied to clipboard
Non-deterministic results because of aggregation steps
Because the aggregation step is not deterministic, the data may be presented in different orders between runs of the same query. A number of algorithms (DBSCAN) will then give different results because.
The recommended solution is to sort all the data point by lexicographic order before fitting them:
sorted_data = data[numpy.lexsort(data.T)]