PCA Correlation Matrix
Describe the bug
The function plot_pca_correlation_graph returns the Circle of Correlations along with a Correlation Matrix. When I compare it with the correlation matrix obtained from a manual PCA on the first two dimensions, the signs of the results differ. This leads to an incorrect Circle of Correlations.
Steps/Code to Reproduce
from mlxtend.data import iris_data
from mlxtend.plotting import plot_pca_correlation_graph
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA

X, y = iris_data()
X_norm = X / X.std(axis=0)  # Normalizing the feature columns is recommended
feature_names = ['sepal length', 'sepal width', 'petal length', 'petal width']

# Manual PCA on the first two components
pca_result = PCA(n_components=2)
p_components = pca_result.fit_transform(X_norm)
df_p_components = pd.DataFrame(p_components)

# Loadings: eigenvectors scaled by the square root of the explained variance
loadings = pca_result.components_.T * np.sqrt(pca_result.explained_variance_)
loading_matrix = pd.DataFrame(loadings, index=feature_names)
loading_matrix
Expected Results
The expected results should show Sepal Length, Petal Length and Petal Width positively correlated with PCA1, whereas Sepal Width is negatively correlated. However, the output from correlation_matrix shows the complete opposite.
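One way to cross-check the expected signs without relying on either implementation is to correlate each feature directly with the projected scores. The following is only a minimal sketch: it uses NumPy alone, the helper name feature_pc_correlations is hypothetical, and the data is a synthetic stand-in rather than the iris data from the report.

```python
import numpy as np


def feature_pc_correlations(X, n_components=2):
    """Hypothetical helper: correlate every feature with every PC score."""
    Xc = X - X.mean(axis=0)                      # center the columns
    cov = np.cov(Xc, rowvar=False)               # sample covariance (ddof=1)
    eigvals, eigvecs = np.linalg.eigh(cov)       # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    scores = Xc @ eigvecs                        # projected data (PC scores)

    n_features = X.shape[1]
    full = np.corrcoef(np.hstack([Xc, scores]), rowvar=False)
    corr = full[:n_features, n_features:]        # features x components block
    loadings = eigvecs * np.sqrt(eigvals)        # same formula as the manual PCA
    return corr, loadings


# Synthetic stand-in data (an assumption, NOT iris): four features driven by
# one latent factor, with the second feature moving in the opposite direction.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 1))
noise = rng.normal(size=(200, 4))
X = latent * np.array([1.0, -0.8, 1.2, 0.9]) + 0.3 * noise
X_norm = X / X.std(axis=0)

corr, loadings = feature_pc_correlations(X_norm)
signs_agree = np.array_equal(np.sign(corr), np.sign(loadings))
print(signs_agree)
```

In this sketch the loadings and the directly computed correlations share the same signs, because both come from the same eigenvectors. Their magnitudes differ only by the constant factor sqrt((n - 1) / n), since X.std(axis=0) uses ddof=0 while the covariance uses ddof=1. Running the same comparison against the matrix returned by plot_pca_correlation_graph should show whether its signs match the scores it actually plots.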
Actual Results
Versions
MLxtend 0.17.3
Windows-10-10.0.16299-SP0
Python 3.7.6
Scikit-learn 0.22.1
Numpy 1.18.1
SciPy 1.4.1
Thanks for sharing. At first glance, I think you are right: the axis labels seem to be flipped. Maybe @Gabriel-Azevedo-Ferreira, who implemented this function, could chime in.