rm_corr error LinAlgError: SVD did not converge
Hello,
I am trying to run rm_corr on multiple column pairs of a dataframe (gene expression data). The function works well on many pairs, and its output matches that of the R rmcorr function, but one pair of columns throws the error "LinAlgError: SVD did not converge". The same pair runs without trouble in the R rmcorr implementation, so I am curious what the difference between the two implementations is, and whether it is possible to get this to converge in Python. I would prefer to keep using your implementation because it seems to be much faster. (I am in the middle of benchmarking how both implementations scale, so if you have any data on that, I would very much appreciate it!)
I am using Pingouin v0.5.5. I have attached a minimal dataframe that reproduces the error, along with the code below:
import pandas as pd
import pingouin as pg
# Load the attached minimal dataframe
dataframe = pd.read_csv("df_pingouin_fail.csv")
pg.rm_corr(data=dataframe, x="Gene1", y="Gene2", subject="Subject")
Dataframe: df_pingouin_fail.csv
Thanks so much for your help! Best, Sophie
Hi Sophie,
Thanks for opening the issue. The Pingouin implementation is based on an ANCOVA model that is implemented with statsmodels.
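To compare the two implementations, it can help to compute the repeated-measures r without any SVD at all. The sketch below assumes the usual rmcorr definition for the ANCOVA model y ~ C(subject) + x, namely r = sign(beta) * sqrt(SS_x / (SS_x + SS_error)). That quantity is algebraically equal to the Pearson correlation of x and y after each subject's own mean has been subtracted. The function name rm_corr_centered is hypothetical and this is not Pingouin's code. It also assumes there are no missing values.

```python
import numpy as np


def rm_corr_centered(x, y, subject):
    """Repeated-measures correlation via within-subject centering (sketch).

    Assumes r = sign(beta) * sqrt(SS_x / (SS_x + SS_error)) for the model
    y ~ C(subject) + x, which equals the Pearson correlation of the
    subject-centered x and y. No SVD or pseudo-inverse is involved.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Integer code 0..k-1 for each subject label
    labels, codes = np.unique(np.asarray(subject), return_inverse=True)
    codes = codes.ravel()
    k = labels.size
    counts = np.bincount(codes, minlength=k)
    # Subtract each subject's mean from that subject's observations
    x_c = x - (np.bincount(codes, weights=x, minlength=k) / counts)[codes]
    y_c = y - (np.bincount(codes, weights=y, minlength=k) / counts)[codes]
    r = np.sum(x_c * y_c) / np.sqrt(np.sum(x_c**2) * np.sum(y_c**2))
    # N observations, minus k subject intercepts, minus 1 common slope
    dof = x.size - k - 1
    return r, dof
```

For example, rm_corr_centered(dataframe["Gene1"], dataframe["Gene2"], dataframe["Subject"]) can be compared against the r and dof returned by pg.rm_corr on column pairs where the latter succeeds.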
I am not able to reproduce the error on my machine.
I am using Python 3.9 with statsmodels 0.14. What versions of Python, pingouin, and statsmodels are you using?
Second, I noticed that your dataset includes many subjects with only 1 or 2 observations. Do you still get the error if you remove these participants from the dataset?
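One way to run that check is to filter the dataframe before calling rm_corr. The helper below is a hypothetical sketch. Its threshold of 3 observations per subject is an assumption chosen to drop exactly the subjects with 1 or 2 rows.

```python
import pandas as pd


def drop_sparse_subjects(df, subject="Subject", min_obs=3):
    """Return a copy of df keeping only subjects with at least min_obs rows.

    Hypothetical helper; min_obs=3 removes subjects with 1 or 2 observations.
    """
    # For every row, the number of rows that share its subject label
    n_obs = df[subject].map(df[subject].value_counts())
    return df.loc[n_obs >= min_obs].reset_index(drop=True)
```

The filtered data can then be passed straight in, e.g. pg.rm_corr(data=drop_sparse_subjects(dataframe), x="Gene1", y="Gene2", subject="Subject").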
Thanks, Raphael
Hi Raphael,
Thanks so much for your reply. I am using the following versions (on Linux):
statsmodels 0.14.0, Python 3.9.0, pingouin 0.5.5
Which version of pingouin are you using when you don't reproduce the error?
Thanks so much! Sophie
Hi,
I am using pingouin 0.5.5 (on Mac), pandas 2.2.2, numpy 1.26.4, statsmodels 0.14.0
Hi Raphael,
I was wondering whether numpy's LAPACK dependency could be causing the issue. Would you mind running numpy.show_config() and sharing the output?
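When comparing machines like this, it may help to collect the package versions in one place alongside numpy.show_config(). The helper below is a hypothetical sketch that uses only the standard library (importlib.metadata and platform). The package list is an assumption based on the packages mentioned in this thread.

```python
import platform
from importlib import metadata


def environment_report(packages=("numpy", "scipy", "pandas", "statsmodels", "pingouin")):
    """Return a dict of Python/OS info and installed package versions (sketch)."""
    report = {"python": platform.python_version(), "platform": platform.platform()}
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "not installed"
    return report


# Usage: print the report, then numpy's BLAS/LAPACK build information
# for key, value in environment_report().items():
#     print(f"{key}: {value}")
# import numpy; numpy.show_config()
```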
Many thanks,
Eric
Sure thing @Eric-Kobayashi:
Build Dependencies:
  blas:
    detection method: pkgconfig
    found: true
    include directory: /usr/local/include
    lib directory: /usr/local/lib
    name: openblas64
    openblas configuration: USE_64BITINT=1 DYNAMIC_ARCH=1 DYNAMIC_OLDER= NO_CBLAS=
      NO_LAPACK= NO_LAPACKE= NO_AFFINITY=1 USE_OPENMP= SANDYBRIDGE MAX_THREADS=3
    pc file directory: /usr/local/lib/pkgconfig
    version: 0.3.23.dev
  lapack:
    detection method: internal
    found: true
    include directory: unknown
    lib directory: unknown
    name: dep4548835888
    openblas configuration: unknown
    pc file directory: unknown
    version: 1.26.4
Compilers:
  c:
    args: -fno-strict-aliasing
    commands: clang
    linker: ld64
    linker args: -fno-strict-aliasing
    name: clang
    version: 14.0.0
  c++:
    commands: clang++
    linker: ld64
    name: clang
    version: 14.0.0
  cython:
    commands: cython
    linker: cython
    name: cython
    version: 3.0.8
Machine Information:
  build:
    cpu: x86_64
    endian: little
    family: x86_64
    system: darwin
  host:
    cpu: x86_64
    endian: little
    family: x86_64
    system: darwin
Python Information:
  path: /private/var/folders/kx/gw6dssyn19d9qjs9mvh4hkz80000gn/T/cibw-run-5dj358b1/cp39-macosx_x86_64/build/venv/bin/python
  version: '3.9'
SIMD Extensions:
  baseline:
  - SSE
  - SSE2
  - SSE3
  found:
  - SSSE3
  - SSE41
  - POPCNT
  - SSE42
  - AVX
  - F16C
  - FMA3
  - AVX2
  not found:
  - AVX512F
  - AVX512CD
  - AVX512_KNL
  - AVX512_SKX
  - AVX512_CLX
  - AVX512_CNL
  - AVX512_ICL
Hi Raphael,
Thanks for providing this information. It turns out not to be a version issue; it might instead be a deeper problem with the numpy.linalg.pinv function.
I was able to run rm_corr successfully after shuffling the rows of the dataframe. Repeating the reshuffling many times, I found that about 8.12% of runs fail to converge. Would you mind running the same test to see whether you can replicate the issue?
pg.rm_corr(data=dataframe.sample(frac=1), x="Gene1", y="Gene2", subject="Subject")
Hmm, very strange behavior indeed. I can replicate the error: 5 out of 100 runs of the function on reshuffled data failed (a 5% failure rate).
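The reshuffling experiment from the last two messages can be scripted so that both machines run exactly the same procedure. The helper below is a hypothetical sketch. It shuffles the row order with numpy.random.Generator.permutation instead of DataFrame.sample (both give a random permutation of the rows) and counts how many runs raise numpy.linalg.LinAlgError. The function under test is passed in as a callable, so pingouin is only needed in the usage line.

```python
import numpy as np
import pandas as pd


def shuffle_failure_rate(func, df, n_runs=100, seed=0):
    """Fraction of row-order shuffles of df for which func raises LinAlgError.

    Hypothetical helper: func is any callable that takes a dataframe.
    """
    rng = np.random.default_rng(seed)
    failures = 0
    for _ in range(n_runs):
        # Same rows and values, only in a random order
        shuffled = df.iloc[rng.permutation(len(df))]
        try:
            func(shuffled)
        except np.linalg.LinAlgError:
            failures += 1
    return failures / n_runs


# Usage (requires pingouin and the attached dataframe):
# import pingouin as pg
# rate = shuffle_failure_rate(
#     lambda d: pg.rm_corr(data=d, x="Gene1", y="Gene2", subject="Subject"),
#     dataframe,
# )
```

Fixing the seed makes the sequence of shuffles reproducible, so a failing permutation can be rerun and inspected directly.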