
PCA Correlation Matrix

Open • Anuj-Saboo opened this issue 5 years ago • 1 comment

Describe the bug

The function plot_pca_correlation_graph returns the Circle of Correlations along with a correlation matrix. When I compare that matrix with the one I get from a manual PCA on the first two dimensions, the signs of the results differ. This leads to an incorrect Circle of Correlations.

Steps/Code to Reproduce

from mlxtend.data import iris_data
from mlxtend.plotting import plot_pca_correlation_graph
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA

X, y = iris_data()

X_norm = X / X.std(axis=0)  # Normalizing the feature columns is recommended

feature_names = ['sepal length', 'sepal width', 'petal length', 'petal width']

# Manual PCA on the first two components
pca_result = PCA(n_components=2)
p_components = pca_result.fit_transform(X_norm)
df_p_components = pd.DataFrame(p_components)

# Loadings: principal axes scaled by the square root of the explained variance
loadings = pca_result.components_.T * np.sqrt(pca_result.explained_variance_)
loading_matrix = pd.DataFrame(loadings, index=feature_names)
loading_matrix
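For context, loadings of this form (principal axes scaled by the square root of the explained variance) can be read as feature-to-component correlations when the data are standardized. The following is a minimal sketch of that identity, not mlxtend's implementation: it uses NumPy only, a small synthetic dataset instead of iris, and a hypothetical helper named pca_loadings. It assumes the same n - 1 denominator throughout. In the snippet above, X.std(axis=0) defaults to ddof=0 while scikit-learn's explained_variance_ uses n - 1, so those loadings differ from exact correlations by a factor of sqrt(n / (n - 1)), which does not affect their signs.

```python
import numpy as np


def pca_loadings(X, n_components=2):
    """Hypothetical helper: return (loadings, scores, Z) for standardized data."""
    # Standardize with the n - 1 denominator so every step uses the same convention.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    cov = np.cov(Z, rowvar=False)            # equals the correlation matrix of X
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigh returns eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    scores = Z @ eigvecs                     # projections onto the principal axes
    loadings = eigvecs * np.sqrt(eigvals)    # axes scaled by sqrt(explained variance)
    return loadings, scores, Z


# Synthetic data (an assumption for illustration): two anti-correlated features plus noise.
rng = np.random.default_rng(0)
base = rng.normal(size=(200, 1))
X = np.hstack([
    base + 0.3 * rng.normal(size=(200, 1)),
    -base + 0.3 * rng.normal(size=(200, 1)),
    rng.normal(size=(200, 1)),
])

loadings, scores, Z = pca_loadings(X)

# Correlation of every feature with every component, computed directly.
corr = np.array([
    [np.corrcoef(Z[:, j], scores[:, k])[0, 1] for k in range(scores.shape[1])]
    for j in range(Z.shape[1])
])

print(np.allclose(loadings, corr))
```

If the two matrices agree for a manual PCA, any remaining difference from another implementation is down to scaling conventions or the sign chosen for each component.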


Expected Results

The results should show sepal length, petal length, and petal width as positively correlated with PCA1, and sepal width as negatively correlated. However, the output from correlation_matrix shows the complete opposite.
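One thing that may be worth ruling out when comparing signs: the sign of a principal component is not determined by the data, so two correct PCA implementations can return axes that point in opposite directions. The sketch below illustrates this with NumPy and synthetic data (an assumption, not the iris set). The helpers feature_component_corr and align_signs are hypothetical names introduced here, and the sign convention in align_signs is just one possible choice.

```python
import numpy as np


def feature_component_corr(Z, scores):
    """Correlation of each feature (rows) with each component score (columns)."""
    return np.array([
        [np.corrcoef(Z[:, j], scores[:, k])[0, 1] for k in range(scores.shape[1])]
        for j in range(Z.shape[1])
    ])


def align_signs(matrix):
    """Hypothetical convention: flip each column so its largest-magnitude entry is positive."""
    idx = np.argmax(np.abs(matrix), axis=0)
    signs = np.sign(matrix[idx, np.arange(matrix.shape[1])])
    return matrix * signs


# Synthetic data: features 0 and 1 are strongly anti-correlated, feature 2 is noise.
rng = np.random.default_rng(42)
latent = rng.normal(size=(100, 1))
X = np.hstack([
    latent + 0.2 * rng.normal(size=(100, 1)),
    -latent + 0.2 * rng.normal(size=(100, 1)),
    rng.normal(size=(100, 1)),
])
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

# PCA via SVD: the rows of Vt are the principal axes.
U, S, Vt = np.linalg.svd(Z, full_matrices=False)
components = Vt[:2]
scores = Z @ components.T
corr = feature_component_corr(Z, scores)

# Flip the first axis: an equally valid PCA solution.
flipped = components.copy()
flipped[0] *= -1
scores_flipped = Z @ flipped.T
corr_flipped = feature_component_corr(Z, scores_flipped)

# The rank-2 reconstruction is identical for both solutions.
recon = scores @ components
recon_flipped = scores_flipped @ flipped

print(np.allclose(corr_flipped[:, 0], -corr[:, 0]))   # first column negated
print(np.allclose(recon, recon_flipped))              # same subspace
print(np.allclose(align_signs(corr), align_signs(corr_flipped)))
```

A flip of this kind negates every correlation on that axis at once, so the relative signs between features on the same component stay the same. Aligning both matrices to a common convention before comparing them separates this benign ambiguity from an actual labeling error.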

Actual Results

Versions

MLxtend 0.17.3
Windows-10-10.0.16299-SP0
Python 3.7.6
Scikit-learn 0.22.1
Numpy 1.18.1
SciPy 1.4.1

Anuj-Saboo, Aug 12 '20 20:08

Thanks for sharing. At first glance, I think you are right: the axis labels seem to be flipped. Maybe @Gabriel-Azevedo-Ferreira, who implemented this function, could chime in.

rasbt, Aug 13 '20 15:08