
[PCA] don't normalize input data before applying PCA or provide a parameter to turn off normalization

Open • xwu-intel opened this issue 4 years ago • 1 comment

By default, d4p normalizes the input data before applying PCA, in both batch and distributed mode. In sklearn and pyspark, PCA does not normalize the data by default. The d4p default therefore differs from sklearn and pyspark, and the API offers no option to change it.
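To show why the default matters, here is a minimal NumPy-only sketch. It does not call d4p, sklearn, or pyspark; the helper `pca_eigenvalues` and the synthetic data are made up for illustration, and "normalize" is assumed here to mean z-scoring each column (zero mean, unit variance). Without normalization, PCA works on the covariance matrix and the largest-scale feature dominates; with normalization, it works on the correlation matrix and every feature is weighted equally.

```python
import numpy as np


def pca_eigenvalues(X, normalize):
    """Eigenvalues (descending) of the covariance matrix of X,
    or of its correlation matrix when normalize=True.
    Hypothetical helper for illustration only."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                   # centering only
    if normalize:
        Xc = Xc / Xc.std(axis=0, ddof=1)      # z-score each column
    cov = Xc.T @ Xc / (X.shape[0] - 1)
    return np.linalg.eigvalsh(cov)[::-1]      # eigvalsh returns ascending order


# Synthetic data: three independent features on very different scales.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3)) * np.array([100.0, 1.0, 0.01])

ev_raw = pca_eigenvalues(X, normalize=False)
ev_norm = pca_eigenvalues(X, normalize=True)

# Explained-variance ratios differ sharply between the two conventions.
print("no normalization:", ev_raw / ev_raw.sum())
print("normalized:      ", ev_norm / ev_norm.sum())
```

With data like this, the first component explains nearly all of the variance in the unnormalized case and roughly a third in the normalized case, so results from the two conventions are not comparable.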

pca-spmd-pyspark-sklearn.tar.gz

xwu-intel, Jul 30 '21 03:07

Also, in distributed mode d4p does not normalize the data the way batch mode does (possibly a separate bug). So all of these behaviors need to be examined and made consistent.

xwu-intel, Aug 23 '21 10:08