scikit-learn-intelex
[PCA] don't normalize input data before applying PCA or provide a parameter to turn off normalization
By default, d4p normalizes the input data before applying PCA, in both batch and distributed mode. In sklearn and PySpark, PCA does not normalize the data by default. d4p's default therefore differs from sklearn and PySpark, and the API offers no option to change it.
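To show why the difference matters, here is a minimal sklearn-only sketch. It assumes that the normalization d4p applies is per-feature standardization (zero mean, unit variance), which makes PCA equivalent to an eigendecomposition of the correlation matrix. sklearn's `PCA`, by contrast, only centers the data, so it works on the covariance matrix. The sketch does not call d4p. The synthetic data with very different feature scales is purely illustrative.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Illustrative data: three independent features on very different scales
X = rng.normal(size=(200, 3)) * np.array([1.0, 100.0, 10.0])

# sklearn default: centers the data but does not scale it (covariance-based PCA)
pca_raw = PCA(n_components=3).fit(X)

# Assumed "normalize first" behavior: standardize each feature, then run PCA
# (equivalent to PCA on the correlation matrix)
X_std = StandardScaler().fit_transform(X)
pca_std = PCA(n_components=3).fit(X_std)

# Reference eigenvalues, largest first
cov_eig = np.sort(np.linalg.eigvalsh(np.cov(X, rowvar=False)))[::-1]
corr_eig = np.sort(np.linalg.eigvalsh(np.corrcoef(X, rowvar=False)))[::-1]

print("raw PCA explained variance ratio:         ", pca_raw.explained_variance_ratio_)
print("standardized PCA explained variance ratio:", pca_std.explained_variance_ratio_)
```

On unnormalized data, the first component is dominated by the large-scale feature. After standardization, the variance is spread roughly evenly across components. The two settings give materially different results on the same input, which is why a user porting code from sklearn or PySpark would expect a way to turn normalization off.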
Moreover, in distributed mode d4p does not normalize the data the same way it does in batch mode (possibly a separate bug). All of these behaviors need to be examined and updated so that they are consistent.