
Loadings Different (negative) vs SPSS/R Psych Lib results

Db-pckr opened this issue 4 years ago

Based on a correlation matrix, the loadings calculated with factor_analyzer differ from those computed in SPSS: they appear to be multiplied by -1. Communalities are more or less equal.

Example matrix below (12x12):

1.00 | 0.53 | 0.26 | 0.14 | 0.18 | 0.24 | 0.24 | 0.22 | 0.20 | 0.21 | 0.21 | 0.36
0.53 | 1.00 | 0.33 | 0.34 | 0.39 | 0.51 | 0.50 | 0.42 | 0.27 | 0.43 | 0.35 | 0.52
0.26 | 0.33 | 1.00 | 0.22 | 0.28 | 0.24 | 0.27 | 0.28 | 0.09 | 0.16 | 0.03 | 0.18
0.14 | 0.34 | 0.22 | 1.00 | 0.56 | 0.47 | 0.49 | 0.34 | 0.28 | 0.37 | 0.27 | 0.29
0.18 | 0.39 | 0.28 | 0.56 | 1.00 | 0.55 | 0.59 | 0.49 | 0.25 | 0.43 | 0.30 | 0.40
0.24 | 0.51 | 0.24 | 0.47 | 0.55 | 1.00 | 0.80 | 0.55 | 0.30 | 0.51 | 0.49 | 0.55
0.24 | 0.50 | 0.27 | 0.49 | 0.59 | 0.80 | 1.00 | 0.56 | 0.31 | 0.58 | 0.50 | 0.56
0.22 | 0.42 | 0.28 | 0.34 | 0.49 | 0.55 | 0.56 | 1.00 | 0.27 | 0.37 | 0.32 | 0.42
0.20 | 0.27 | 0.09 | 0.28 | 0.25 | 0.30 | 0.31 | 0.27 | 1.00 | 0.55 | 0.28 | 0.29
0.21 | 0.43 | 0.16 | 0.37 | 0.43 | 0.51 | 0.58 | 0.37 | 0.55 | 1.00 | 0.52 | 0.51
0.21 | 0.35 | 0.03 | 0.27 | 0.30 | 0.49 | 0.50 | 0.32 | 0.28 | 0.52 | 1.00 | 0.55
0.36 | 0.52 | 0.18 | 0.29 | 0.40 | 0.55 | 0.56 | 0.42 | 0.29 | 0.51 | 0.55 | 1.00

Code:

fa = FactorAnalyzer(method='minres', n_factors=1, rotation=None,
                    is_corr_matrix=True, bounds=(0.005, 1))
fa.fit(fa_df)
print(fa.loadings_)

Result:

[[-0.3883726 ]
 [-0.66186571]
 [-0.32924641]
 [-0.55939523]
 [-0.66587561]
 [-0.81117504]
 [-0.8457027 ]
 [-0.6317671 ]
 [-0.44712954]
 [-0.69460134]
 [-0.58560195]
 [-0.69916123]]

SPSS produces almost exactly the same result, except each value is multiplied by -1 (i.e., the loadings come out positive), while get_communalities() returns mostly the same values (the small differences seem to be just because SPSS rounds values on display).

Any idea what I'm missing, or what the issue is?

Thanks

Db-pckr · Aug 23 '21

Update: Tested against R's psych library; same issue. I realized that the problem seems to occur when the correlation matrix is all positive. If even one value in it is negative, the results match. Maybe this helps with debugging...

Db-pckr · Sep 01 '21

Thanks for the follow-up! I will look into this very soon.

jbiggsets · Sep 01 '21

Update: Tested against R's psych library; same issue. I realized that the problem seems to occur when the correlation matrix is all positive. If even one value in it is negative, the results match. Maybe this helps with debugging...

I can't seem to reproduce this issue. For example, using the data provided above (read into a string named data), here are my results:

import pandas as pd
from factor_analyzer import FactorAnalyzer

# `data` is assumed to hold the 12x12 correlation matrix above
# as a pipe-separated string, one row per line
df = pd.DataFrame([[float(val) for val in row.split(' | ')]
                   for row in data.strip().split('\n')])

fa = FactorAnalyzer(method='minres',
                    n_factors=1,
                    rotation=None,
                    bounds=(0.005, 1),
                    is_corr_matrix=True).fit(df)
print(fa.loadings_)
[[0.3879858 ]
 [0.66334567]
 [0.32897377]
 [0.55966426]
 [0.66396016]
 [0.81430826]
 [0.8469053 ]
 [0.63367546]
 [0.44783303]
 [0.69420312]
 [0.58345214]
 [0.6963522 ]]

This matches R's psych library. Let me know if I'm missing something!

jbiggsets · Oct 07 '21

I'm using pandas 1.2.4, numpy 1.20.2, and Python 3.8.10.

If you're getting the correct (positive) results, I would honestly guess it's because of an older numpy version and how it is used internally in factor_analyzer. Thanks!

Db-pckr · Oct 07 '21

I'm encountering the same issue (negative factor loadings).

I'm using pandas 1.4.3, numpy 1.21.5, and Python 3.9.12.

Which package versions do you suggest for avoiding this, please?

Thanks!

celip38 · Aug 31 '22

@celip38 please share your data, if possible, so we can try to reproduce the issue.

desilinguist · Sep 01 '22

@desilinguist You can use the data I presented above to test this problem.

Db-pckr · Sep 02 '22

Thanks @Db-pckr. I can replicate this on my end too with the latest numpy library.

I poked around a bit and found that numpy.linalg.eigh(), which we use for the eigenvalue decomposition, was returning an all-negative first eigenvector for this correlation matrix, whereas the more general – but less efficient – numpy.linalg.eig() returns an all-positive first eigenvector, viz.

With eigh():

array([[-0.17816009],
       [-0.30460323],
       [-0.15106223],
       [-0.25699352],
       [-0.3048854 ],
       [-0.37392409],
       [-0.3888924 ],
       [-0.2909789 ],
       [-0.20564148],
       [-0.31877273],
       [-0.26791673],
       [-0.31975958]])

and, with eig():

array([[0.17816009],
       [0.30460323],
       [0.15106223],
       [0.25699352],
       [0.3048854 ],
       [0.37392409],
       [0.3888924 ],
       [0.2909789 ],
       [0.20564148],
       [0.31877273],
       [0.26791673],
       [0.31975958]])

However, neither is incorrect because, as we know, if $v$ is an eigenvector, then so is $\alpha v$ for any scalar $\alpha \neq 0$. It follows that the signs of factor loadings are essentially arbitrary as well: flipping them merely flips the (already arbitrary) interpretation of the latent factor.
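
A quick numpy sketch of the sign indeterminacy, using a small all-positive correlation matrix for brevity (the 12x12 matrix above behaves the same way):

import numpy as np

# a small all-positive correlation matrix, standing in for the 12x12 one above
R = np.array([[1.0, 0.5, 0.4],
              [0.5, 1.0, 0.6],
              [0.4, 0.6, 1.0]])

evals, evecs = np.linalg.eigh(R)      # eigh() returns eigenvalues in ascending order
lam, v = evals[-1], evecs[:, -1]      # largest eigenvalue and its eigenvector

# both v and -v satisfy R @ v = lam * v, so the sign that comes back
# from any particular routine is an implementation detail
assert np.allclose(R @ v, lam * v)
assert np.allclose(R @ (-v), lam * (-v))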

So, while we could replace eigh() with eig() to force the results to match what SPSS and R do, I am not convinced that we need to do that since this is not really a bug.
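
In the meantime, anyone who wants SPSS/R-style signs can flip the loadings after fitting. A minimal post-hoc sketch, assuming the common convention of making each factor's loadings sum to a positive value (I'm not claiming this is exactly the rule SPSS applies), and assuming fa is the fitted FactorAnalyzer from the snippet above:

import numpy as np

# assumes `fa` is an already-fitted FactorAnalyzer
loadings = fa.loadings_.copy()

# flip the sign of any factor whose loadings sum to a negative value
signs = np.sign(loadings.sum(axis=0))
signs[signs == 0] = 1.0               # leave perfectly balanced factors alone
loadings *= signs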

@jbiggsets any thoughts?

desilinguist · Sep 03 '22

Yes, I would be inclined not to change this, since it doesn't really strike me as a bug and we use eigh pretty consistently throughout. Maybe we can mention it in the documentation?

jbiggsets · Sep 03 '22

Adding to the documentation sounds like a good idea. I'll do that!

desilinguist · Sep 03 '22

Thanks a lot!

celip38 · Sep 05 '22