dPCA

ValueError when filling unbalanced trialX parameter with NaNs as indicated.

Open pietromarchesi opened this issue 8 years ago • 5 comments

My `trialX` is slightly unbalanced, so I followed the instructions: "If different combinations of features have different number of trials, then set n_samples to the maximum number of trials and fill unoccupied data points with NaN." However, this results in `ValueError: array must not contain infs or NaNs`.
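For readers following along, here is a minimal sketch of the padding those instructions describe. It is not taken from the dPCA package: the helper name `pad_trials`, the dictionary input, and the axis order (trials, neurons, stimuli, decisions, time) are assumptions made for illustration.

```python
import numpy as np

def pad_trials(trials_per_condition, n_neurons, n_time):
    """Stack per-condition trials into one NaN-padded array (hypothetical helper).

    trials_per_condition maps (stimulus, decision) index pairs to arrays of
    shape (n_trials_for_that_condition, n_neurons, n_time).
    Assumed output layout: (max_trials, n_neurons, n_stimuli, n_decisions, n_time).
    """
    n_stimuli = 1 + max(s for s, _ in trials_per_condition)
    n_decisions = 1 + max(d for _, d in trials_per_condition)
    max_trials = max(a.shape[0] for a in trials_per_condition.values())

    # Start from an all-NaN array so unoccupied trial slots stay NaN.
    padded = np.full((max_trials, n_neurons, n_stimuli, n_decisions, n_time), np.nan)
    for (s, d), trials in trials_per_condition.items():
        padded[: trials.shape[0], :, s, d, :] = trials
    return padded

# Toy data: 2 neurons, 4 time points, unequal trial counts per condition.
example = {
    (0, 0): np.ones((3, 2, 4)),
    (0, 1): np.ones((2, 2, 4)),
    (1, 0): np.ones((3, 2, 4)),
    (1, 1): np.ones((1, 2, 4)),
}
example_trialX = pad_trials(example, n_neurons=2, n_time=4)
# Trial average that ignores the NaN padding.
example_mean = np.nanmean(example_trialX, axis=0)
```

The trial-averaged array can then be computed with `np.nanmean` over the trial axis, which skips the padded entries.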

Full traceback:

```
  File "/home/pietro/pythonprojects/starecase/DemixedPCA/my_dPCA.py", line 113, in <module>
    significance_masks = dpca.significance_analysis(trial_average_data,single_trial_data,axis='t',n_shuffles=10,n_splits=10,n_consecutive=10)
  File "/home/pietro/Envs/basic3/lib/python3.5/site-packages/dPCA/dPCA.py", line 864, in significance_analysis
    true_score = compute_mean_score(X,trialX,n_splits)
  File "/home/pietro/Envs/basic3/lib/python3.5/site-packages/dPCA/dPCA.py", line 821, in compute_mean_score
    trainZ = self.fit_transform(trainX)
  File "/home/pietro/Envs/basic3/lib/python3.5/site-packages/dPCA/dPCA.py", line 168, in fit_transform
    self._fit(X,trialX=trialX)
  File "/home/pietro/Envs/basic3/lib/python3.5/site-packages/dPCA/dPCA.py", line 570, in _fit
    self.P, self.D = self._randomized_dpca(regX,regmXs,pinvX=pregX)
  File "/home/pietro/Envs/basic3/lib/python3.5/site-packages/dPCA/dPCA.py", line 472, in _randomized_dpca
    U,s,V = randomized_svd(np.dot(C,rX),n_components=self.n_components,n_iter=self.n_iter,random_state=np.random.randint(10e5))
  File "/home/pietro/Envs/basic3/lib/python3.5/site-packages/sklearn/utils/extmath.py", line 364, in randomized_svd
    power_iteration_normalizer, random_state)
  File "/home/pietro/Envs/basic3/lib/python3.5/site-packages/sklearn/utils/extmath.py", line 266, in randomized_range_finder
    Q, _ = linalg.qr(safe_sparse_dot(A, Q), mode='economic')
  File "/home/pietro/Envs/basic3/lib/python3.5/site-packages/scipy/linalg/decomp_qr.py", line 126, in qr
    a1 = numpy.asarray_chkfinite(a)
  File "/home/pietro/Envs/basic3/lib/python3.5/site-packages/numpy/lib/function_base.py", line 1215, in asarray_chkfinite
    "array must not contain infs or NaNs")

ValueError: array must not contain infs or NaNs
```
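The last frames of the traceback show where the exception is raised: `numpy.asarray_chkfinite` rejects an array that still contains NaNs. The sketch below reproduces that final step with NumPy only. It does not run the dPCA code path, and the small matrices are made up for illustration. It shows how a single NaN survives a matrix product and then trips the finiteness check.

```python
import numpy as np

# Hypothetical stand-ins for the matrices multiplied before the QR step.
left = np.ones((3, 4))
left[0, 1] = np.nan          # one missing value in the input
right = np.ones((4, 2))

# Any row that touches the NaN becomes NaN in the product.
product = left @ right

try:
    np.asarray_chkfinite(product)
    message = None
except ValueError as err:
    message = str(err)

print(message)
```

Only the first row of `product` is affected here, but one non-finite entry is enough for the check to fail.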

pietromarchesi, Sep 05 '17 09:09

Could you post a minimum working example?

wielandbrendel, Sep 05 '17 09:09

I wrote a quick gist that reproduces the error, even though there is more than one trial for each stimulus-decision combination and the rest of the data is filled with NaNs as indicated (I hope that is what you meant by 'minimum working example').

pietromarchesi, Sep 05 '17 10:09

I looked into this, but the problem is not straightforward to fix (the whole cross-validation procedure is affected). I am pretty short on time right now, so I won't be able to provide a comprehensive solution within the next three weeks. As a quick fix, I'd suggest using the sklearn Imputer class to replace the missing values with the feature mean or median.

wielandbrendel, Sep 05 '17 16:09

Thanks a lot for taking a look! Unfortunately, I don't think the Imputer class handles tensors, so I did it manually in this gist. When you say 'replace the missing values with feature mean or median', do you mean the mean or median taken across all trials, stimuli, and decisions, or the global mean, taken across time points as well?
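As an illustration of what a manual imputation on a tensor can look like, here is a minimal NumPy sketch. It is not the code from the gist, and it does not settle which mean was intended. It assumes trials lie on the first axis and fills each missing entry with the mean over the observed trials for the same neuron, condition, and time point. The function name `impute_with_trial_mean` is hypothetical.

```python
import numpy as np

def impute_with_trial_mean(trialX):
    """Return a copy of trialX with NaNs replaced by the per-entry mean over trials.

    Assumes axis 0 indexes trials. A condition with no observed trial at all
    would stay NaN (np.nanmean returns NaN for an all-NaN slice).
    """
    trial_mean = np.nanmean(trialX, axis=0, keepdims=True)
    # np.where broadcasts trial_mean along the trial axis.
    return np.where(np.isnan(trialX), trial_mean, trialX)

# Toy data: 5 trial slots, 3 neurons, 2 stimuli, 2 decisions, 4 time points.
rng = np.random.default_rng(0)
toy_trialX = rng.normal(size=(5, 3, 2, 2, 4))
toy_trialX[3:, :, 0, 1, :] = np.nan   # this condition has only 3 real trials

filled = impute_with_trial_mean(toy_trialX)
```

Filling with the trial mean leaves the trial-averaged data unchanged, but it lowers the across-trial variance of the padded conditions, which is worth keeping in mind when interpreting results computed from the filled array.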

Furthermore, I assume this problem is not present in the Matlab version of the toolbox, which was used for the paper, is that right?

pietromarchesi, Sep 07 '17 11:09

Matlab version should be fine.

dkobak, Sep 07 '17 11:09