High-resolution analysis ValueError
Heya,
When I try to run cryodrgn analyze on my higher-resolution dataset (256), I get the error below.
2024-10-02 12:10:55 Perfoming principal component analysis...
Traceback (most recent call last):
File "/s/ems/s/anaconda/v4.8.4/envs/cryodrgn_v3.3/bin/cryodrgn", line 33, in
File "/s/ems/s/anaconda/v4.8.4/envs/cryodrgn_v3.3/lib/python3.7/site-packages/cryodrgn-0.3.3-py3.7.egg/cryodrgn/analysis.py", line 34, in run_pca
pca.fit(z)
File "/s/ems/s/anaconda/v4.8.4/envs/cryodrgn_v3.3/lib/python3.7/site-packages/sklearn/decomposition/_pca.py", line 382, in fit
self._fit(X)
File "/s/ems/s/anaconda/v4.8.4/envs/cryodrgn_v3.3/lib/python3.7/site-packages/sklearn/decomposition/_pca.py", line 431, in _fit
X, dtype=[np.float64, np.float32], ensure_2d=True, copy=self.copy
File "/s/ems/s/anaconda/v4.8.4/envs/cryodrgn_v3.3/lib/python3.7/site-packages/sklearn/base.py", line 561, in _validate_data
X = check_array(X, **check_params)
File "/s/ems/s/anaconda/v4.8.4/envs/cryodrgn_v3.3/lib/python3.7/site-packages/sklearn/utils/validation.py", line 792, in check_array
_assert_all_finite(array, allow_nan=force_all_finite == "allow-nan")
File "/s/ems/s/anaconda/v4.8.4/envs/cryodrgn_v3.3/lib/python3.7/site-packages/sklearn/utils/validation.py", line 116, in _assert_all_finite
type_err, msg_dtype if msg_dtype is not None else X.dtype
ValueError: Input contains NaN, infinity or a value too large for dtype('float32').
The command used was cryodrgn analyze 02_256_8D_1024 49 --Apix 1.08
I have over 500,000 particles, so I'm wondering if that might be the issue, or whether this is related to the AssertionError people have been mentioning. Any suggestions on the cause and a fix would be appreciated.
Thanks, Alana
Hi Alana, can you double-check the version of cryoDRGN you have installed (using the command cryodrgn --version) and also the version of Python you are using? It looks like you may have an older version of each (v0.3.3 and Python 3.7).
Another thing to double-check is whether there are indeed degenerate values in your latent-space matrix, which you can inspect with something like the following:
import numpy as np
from cryodrgn.utils import load_pkl
# load the epoch-49 latent embeddings
z = load_pkl("02_256_8D_1024/z.49.pkl")
# count NaN and infinite entries
print(np.isnan(z).sum(), np.isinf(z).sum())
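If either count is nonzero, the embeddings are degenerate and PCA (and therefore cryodrgn analyze) will fail with exactly the ValueError you saw.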
Generally the number of particles shouldn't be a problem in and of itself if the reconstruction already ran to completion. However, the model may have had trouble arriving at a coherent representation of the heterogeneity landscape in your data, leading to missing/null values in its output.
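If only a small fraction of particles end up with degenerate embeddings, one workaround (a sketch, not an official recipe) is to build an index of the well-behaved particles and pass it to a retraining run via the --ind flag; the filename kept_ind.pkl below is just an example:
import numpy as np
from cryodrgn.utils import load_pkl, save_pkl
z = load_pkl("02_256_8D_1024/z.49.pkl")
# keep only particles whose embeddings are fully finite
keep = np.where(np.isfinite(z).all(axis=1))[0]
print(f"keeping {len(keep)} of {len(z)} particles")
# save a pickled index array, e.g. for cryodrgn train_vae ... --ind kept_ind.pkl
save_pkl(keep, "02_256_8D_1024/kept_ind.pkl")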
Best, Michal
Hi Michal,
Thank you for the information. We updated cryoDRGN to the latest version, and the error has now changed slightly:
File "/s/ems/s/anaconda/v4.8.4/envs/cryodrgn_v3.4.1/bin/cryodrgn", line 8, in
I'm not sure if this is something I can fix. Would it be worth re-running the training with the newer version of cryoDRGN?
Sorry to hear you are still having issues. You can try the latest version of cryoDRGN, just released today, to see if it resolves the issue, and also double-check the outputs the model is producing, as I suggested above!
Hi Alana, it looks like there are probably some NaNs in the latent embeddings z.pkl. In the meantime, you can run cryodrgn analyze on an earlier epoch; one way to find a clean epoch is sketched below. We have recently updated the code to stop training as soon as NaNs appear. We've noticed these training instabilities can crop up when there are a lot of junk particles in the dataset.
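For what it's worth, here is a quick sketch (assuming the standard z.<epoch>.pkl files in your workdir) to find the most recent epoch whose embeddings are still finite:
import re
import numpy as np
from glob import glob
from cryodrgn.utils import load_pkl
# saved embeddings, sorted by epoch number
zfiles = sorted(glob("02_256_8D_1024/z.*.pkl"),
                key=lambda f: int(re.search(r"z\.(\d+)\.pkl", f).group(1)))
# walk backwards to the most recent epoch whose embeddings are all finite
for f in reversed(zfiles):
    if np.isfinite(load_pkl(f)).all():
        print("last clean epoch file:", f)
        break
You can then rerun, for example, cryodrgn analyze 02_256_8D_1024 <clean epoch> --Apix 1.08 with that epoch number substituted in.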