
[PCA] don't normalize input data before applying PCA or provide a parameter to turn off normalization

Open xwu99 opened this issue 3 years ago • 1 comments

By default, d4p (daal4py) normalizes the input data before applying PCA, in both batch and distributed mode. In sklearn and pyspark, PCA does not normalize the data by default. So d4p's default behavior differs from sklearn and pyspark, and the API has no option to change it.
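To illustrate why this matters, here is a minimal NumPy-only sketch. It assumes "normalize" means z-score standardization (zero mean, unit variance per feature). That reading is an assumption, not something confirmed in this issue. The helper `pca_explained_variance_ratio` and the synthetic data are hypothetical and are not part of d4p, sklearn, or pyspark.

```python
import numpy as np


def pca_explained_variance_ratio(X, standardize):
    """Explained variance ratio of PCA on X (hypothetical helper).

    standardize=False: center only (PCA on the covariance matrix).
    standardize=True:  center and scale each feature to unit variance
                       (assumed meaning of "normalize" in this issue).
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                  # centering happens in both cases
    if standardize:
        Xc = Xc / Xc.std(axis=0, ddof=1)     # z-score scaling
    cov = np.cov(Xc, rowvar=False)
    eigvals = np.linalg.eigvalsh(cov)[::-1]  # eigenvalues, largest first
    return eigvals / eigvals.sum()


# Synthetic data: three independent features on very different scales.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3)) * np.array([100.0, 1.0, 0.01])

ratio_centered = pca_explained_variance_ratio(X, standardize=False)
ratio_standardized = pca_explained_variance_ratio(X, standardize=True)

print("center only:", ratio_centered)
print("z-scored:   ", ratio_standardized)
```

Without scaling, the first component is dominated by the large-scale feature. With z-scoring, the three features contribute roughly equally. The two settings therefore give different components and explained variances on the same input, which is why results cannot be compared across libraries unless the normalization step can be switched off.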

pca-spmd-pyspark-sklearn.tar.gz

xwu99, Jul 30 '21 03:07

Also, in distributed mode d4p does not normalize the data the same way batch mode does (possibly a separate bug). All of these behaviors need to be examined and made consistent.

xwu99, Aug 23 '21 10:08