
Loadings Different (negative) vs SPSS/R Psych Lib results

Db-pckr opened this issue 4 years ago

Based on a correlation matrix, the loadings calculated with factor_analyzer differ from those computed in SPSS: they appear to be multiplied by -1. Communalities are more or less equal.

Example matrix below (12x12):

1.00 | 0.53 | 0.26 | 0.14 | 0.18 | 0.24 | 0.24 | 0.22 | 0.20 | 0.21 | 0.21 | 0.36
0.53 | 1.00 | 0.33 | 0.34 | 0.39 | 0.51 | 0.50 | 0.42 | 0.27 | 0.43 | 0.35 | 0.52
0.26 | 0.33 | 1.00 | 0.22 | 0.28 | 0.24 | 0.27 | 0.28 | 0.09 | 0.16 | 0.03 | 0.18
0.14 | 0.34 | 0.22 | 1.00 | 0.56 | 0.47 | 0.49 | 0.34 | 0.28 | 0.37 | 0.27 | 0.29
0.18 | 0.39 | 0.28 | 0.56 | 1.00 | 0.55 | 0.59 | 0.49 | 0.25 | 0.43 | 0.30 | 0.40
0.24 | 0.51 | 0.24 | 0.47 | 0.55 | 1.00 | 0.80 | 0.55 | 0.30 | 0.51 | 0.49 | 0.55
0.24 | 0.50 | 0.27 | 0.49 | 0.59 | 0.80 | 1.00 | 0.56 | 0.31 | 0.58 | 0.50 | 0.56
0.22 | 0.42 | 0.28 | 0.34 | 0.49 | 0.55 | 0.56 | 1.00 | 0.27 | 0.37 | 0.32 | 0.42
0.20 | 0.27 | 0.09 | 0.28 | 0.25 | 0.30 | 0.31 | 0.27 | 1.00 | 0.55 | 0.28 | 0.29
0.21 | 0.43 | 0.16 | 0.37 | 0.43 | 0.51 | 0.58 | 0.37 | 0.55 | 1.00 | 0.52 | 0.51
0.21 | 0.35 | 0.03 | 0.27 | 0.30 | 0.49 | 0.50 | 0.32 | 0.28 | 0.52 | 1.00 | 0.55
0.36 | 0.52 | 0.18 | 0.29 | 0.40 | 0.55 | 0.56 | 0.42 | 0.29 | 0.51 | 0.55 | 1.00

Code:

fa = FactorAnalyzer(method='minres', n_factors=1, rotation=None,
                    is_corr_matrix=True, bounds=(0.005, 1))
fa.fit(fa_df)
print(fa.loadings_)

Result:

[[-0.3883726 ]
 [-0.66186571]
 [-0.32924641]
 [-0.55939523]
 [-0.66587561]
 [-0.81117504]
 [-0.8457027 ]
 [-0.6317671 ]
 [-0.44712954]
 [-0.69460134]
 [-0.58560195]
 [-0.69916123]]

SPSS produces almost exactly the same result, except each value is multiplied by -1 (i.e., the loadings come out positive), while get_communalities() returns mostly the same values (the small differences seem to be just because SPSS rounds values on display).

Any idea what I'm missing, or what the issue is?

Thanks

Db-pckr · Aug 23 '21

Update: Tested against R's psych library; same issue. I realized that the problem seems to occur when the correlation matrix is all positive. If even one value in it is negative, the results match. Maybe this helps with debugging...

Db-pckr · Sep 01 '21

Thanks for the follow-up! I will look into this very soon.

jbiggsets · Sep 01 '21

Update: Tested against R's psych library; same issue. I realized that the problem seems to occur when the correlation matrix is all positive. If even one value in it is negative, the results match. Maybe this helps with debugging...

I can't seem to reproduce this issue. For example, using the data provided above (read into a string named data), here are my results:

import pandas as pd
from factor_analyzer import FactorAnalyzer

# `data` is assumed to hold the 12x12 correlation matrix above
# as a pipe-separated string, one row per line
df = pd.DataFrame([[float(val) for val in row.split(' | ')]
                   for row in data.strip().split('\n')])

fa = FactorAnalyzer(method='minres',
                    n_factors=1,
                    rotation=None,
                    bounds=(0.005, 1),
                    is_corr_matrix=True).fit(df)
print(fa.loadings_)
[[0.3879858 ]
 [0.66334567]
 [0.32897377]
 [0.55966426]
 [0.66396016]
 [0.81430826]
 [0.8469053 ]
 [0.63367546]
 [0.44783303]
 [0.69420312]
 [0.58345214]
 [0.6963522 ]]

This matches R's psych library. Let me know if I'm missing something!

jbiggsets · Oct 07 '21

I'm using pandas 1.2.4, numpy 1.20.2, and Python 3.8.10.

If you're getting the correct (positive) results, I would honestly guess it's because of an older numpy version and how it is used internally in factor_analyzer. Thanks!

Db-pckr · Oct 07 '21

I'm encountering the same issue (negative factor loadings).

I'm using pandas 1.4.3, numpy 1.21.5, and Python 3.9.12.

Which package versions do you suggest for avoiding this, please?

Thanks!

celip38 · Aug 31 '22

@celip38 please share your data, if possible, so we can try to reproduce the issue.

desilinguist · Sep 01 '22

@desilinguist You can use the data I presented above to test this problem.

Db-pckr · Sep 02 '22

Thanks @Db-pckr. I can replicate this on my end too with the latest numpy library.

I poked around a bit and found that numpy.linalg.eigh(), which we use for the eigenvalue decomposition, was returning an all-negative first eigenvector for this correlation matrix, whereas the more general – but less efficient – numpy.linalg.eig() returns an all-positive first eigenvector, viz.

With eigh():

array([[-0.17816009],
       [-0.30460323],
       [-0.15106223],
       [-0.25699352],
       [-0.3048854 ],
       [-0.37392409],
       [-0.3888924 ],
       [-0.2909789 ],
       [-0.20564148],
       [-0.31877273],
       [-0.26791673],
       [-0.31975958]])

and, with eig():

array([[0.17816009],
       [0.30460323],
       [0.15106223],
       [0.25699352],
       [0.3048854 ],
       [0.37392409],
       [0.3888924 ],
       [0.2909789 ],
       [0.20564148],
       [0.31877273],
       [0.26791673],
       [0.31975958]])

However, neither is incorrect because, as we know, if $v$ is an eigenvector, then so is $\alpha v$ for any scalar $\alpha \neq 0$. It follows that the signs of factor loadings are essentially arbitrary as well: flipping them merely flips the (already arbitrary) interpretation of the latent factor.
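
A quick numpy sketch of the sign indeterminacy, using a small all-positive correlation matrix for brevity (the 12x12 matrix above behaves the same way):

import numpy as np

# a small all-positive correlation matrix, standing in for the 12x12 one above
R = np.array([[1.0, 0.5, 0.4],
              [0.5, 1.0, 0.6],
              [0.4, 0.6, 1.0]])

evals, evecs = np.linalg.eigh(R)      # eigh() returns eigenvalues in ascending order
lam, v = evals[-1], evecs[:, -1]      # largest eigenvalue and its eigenvector

# both v and -v satisfy R @ v = lam * v, so the sign that comes back
# from any particular routine is an implementation detail
assert np.allclose(R @ v, lam * v)
assert np.allclose(R @ (-v), lam * (-v))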

So, while we could replace eigh() with eig() to force the results to match what SPSS and R do, I am not convinced that we need to do that since this is not really a bug.
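
In the meantime, anyone who wants SPSS/R-style signs can flip the loadings after fitting. A minimal post-hoc sketch, assuming the common convention of making each factor's loadings sum to a positive value (I'm not claiming this is exactly the rule SPSS applies), and assuming fa is the fitted FactorAnalyzer from the snippet above:

import numpy as np

# assumes `fa` is an already-fitted FactorAnalyzer
loadings = fa.loadings_.copy()

# flip the sign of any factor whose loadings sum to a negative value
signs = np.sign(loadings.sum(axis=0))
signs[signs == 0] = 1.0               # leave perfectly balanced factors alone
loadings *= signs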

@jbiggsets any thoughts?

desilinguist · Sep 03 '22

Yes, I would be inclined not to change this, since it doesn't really strike me as a bug and we use eigh pretty consistently throughout. Maybe we can mention it in the documentation?

jbiggsets · Sep 03 '22

Adding to the documentation sounds like a good idea. I'll do that!

desilinguist · Sep 03 '22

Thanks a lot!

celip38 · Sep 05 '22