scikit-learn-intelex [PCA] don't normalize input data before applying PCA or provide a parameter to turn off normalization

Open xwu99 opened this issue 3 years ago • 1 comment

By default, d4p (daal4py) normalizes the input data before applying PCA, in both batch and distributed mode. sklearn and pyspark do not normalize the data by default, so d4p's results differ from both, and the API has no option to turn the normalization off.
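To illustrate why this matters, here is a minimal NumPy-only sketch of how the explained variances change when the columns are z-score normalized before PCA. It does not call d4p, sklearn, or pyspark. The helper name `pca_explained_variance` and the synthetic data are made up for illustration, and treating "normalize" as per-column z-scoring is an assumption.

```python
import numpy as np


def pca_explained_variance(X, normalize):
    """Eigenvalues (descending) of the covariance matrix of X.

    Hypothetical helper for illustration only. The data is always centered;
    with normalize=True each column is also divided by its sample standard
    deviation, which is equivalent to running PCA on the correlation matrix.
    """
    Xc = X - X.mean(axis=0)
    if normalize:
        Xc = Xc / Xc.std(axis=0, ddof=1)
    cov = np.cov(Xc, rowvar=False)
    # eigvalsh returns ascending eigenvalues for a symmetric matrix.
    return np.linalg.eigvalsh(cov)[::-1]


# Synthetic data with very different column scales (assumed example data).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3)) * np.array([1.0, 10.0, 100.0])

raw = pca_explained_variance(X, normalize=False)
scaled = pca_explained_variance(X, normalize=True)

print("without normalization:", raw)
print("with normalization:   ", scaled)
```

Without normalization the largest-scale column dominates the first component. With normalization the eigenvalues sum to the number of features, so the two variants are not interchangeable.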

pca-spmd-pyspark-sklearn.tar.gz

Jul 30 '21 03:07 xwu99

Also, in distributed mode d4p does not normalize the data the way batch mode does (possibly a separate bug). All of these behaviors need to be examined and made consistent.

Aug 23 '21 10:08 xwu99
