
Typos and running Jupyter notebooks

sejiro opened this issue 6 years ago • 19 comments

Hello, looking at the jupyter notebook code I had errors with both notebooks when calling the following line:

vlm.set_clusters(vlm.ca["ClusterName"], cluter_colors_dict=colors_dict)

Changing "cluter_colors_dict" to "cluster_colors_dict" solved this issue.

However, running through the rest of the jupyter notebooks I am unable to proceed after calling: vlm.perform_PCA()

The problem seems to be that the kernel goes dead and attempts to restart. I am not really sure what I haven't set up correctly, but help would be appreciated.

[screenshot: Jupyter kernel died and is restarting after calling vlm.perform_PCA()]

sejiro avatar Jan 23 '18 23:01 sejiro

This is very weird; I tried to understand the problem but I don't have enough information. Do you think you could give me some extra info? For example, what is the shape of the attribute S_norm before that call, and are there any NaNs?
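
Checking both is something like this (just a sketch, assuming vlm is your loaded VelocytoLoom object and S_norm is a dense numpy array):

import numpy as np

print(vlm.S_norm.shape)            # shape of the normalized spliced matrix
print(np.isnan(vlm.S_norm).any())  # True if any NaNs are present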

gioelelm avatar Jan 24 '18 22:01 gioelelm

I am running the notebook through Anaconda-Navigator's web-based Jupyter Notebook application. I figured that since it was a Jupyter notebook it would play nicely and I should be able to walk through the example code. Perhaps I just need to run it more locally through the command line and/or another IDE... this will also make investigating values and attributes easier.

At any rate, below is the printed matrix and shape:

[screenshot: printed S_norm matrix and its shape]

Does this help? How can I better assess the NaNs, and what other information can I provide?

sejiro avatar Jan 25 '18 00:01 sejiro

Everything looks all right. Are you running on a powerful enough machine? Could this be a memory issue? Could you monitor that while running? Can you check which versions of velocyto and loompy you are using?

I really don't understand where the problem might be... perform_PCA is only calling PCA from scikit-learn.

It is basically doing just the following:

self.pca = PCA()
self.pcs = self.pca.fit_transform(self.S_norm.T)

That's why the fact it crashes is so puzzling to me.

gioelelm avatar Jan 25 '18 00:01 gioelelm

I have 24 GB of RAM on this Mac; would you recommend more? velocyto is version 0.13.1, loompy is 1.10.

sejiro avatar Jan 25 '18 00:01 sejiro

And running the code I wrote above crashes the notebook as well?

gioelelm avatar Jan 25 '18 01:01 gioelelm

I am out of suggestions; try to run the code of the notebook in a script instead. Let's hope it throws an error instead of crashing... so I can understand where the problem is.

gioelelm avatar Jan 25 '18 01:01 gioelelm

I'm testing the above code right now. I am also thinking it is just something weird with the Jupyter web app method of running this.

sejiro avatar Jan 25 '18 01:01 sejiro

It says PCA is not defined, so I guess scikit-learn isn't imported right then? I've got the latest release: 0.19.1.

sejiro avatar Jan 25 '18 01:01 sejiro

I didn't mean for you to run the code literally as is. I assumed you would add the required from sklearn.decomposition import PCA. Sorry for not being clearer.
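
To be explicit, the full test is something like this (a minimal sketch, assuming vlm is your loaded loom object):

from sklearn.decomposition import PCA

# Same call that perform_PCA() makes internally
pca = PCA()
pcs = pca.fit_transform(vlm.S_norm.T)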

gioelelm avatar Jan 25 '18 01:01 gioelelm

That's my bad; I've not programmed for a little while. I tried that and got the same kernel crash error. I am going to try the code locally tomorrow through a script and not the online NotebookApp. Thanks for your help.

sejiro avatar Jan 25 '18 01:01 sejiro

OK, so running in the IPython console actually gives useful error messages. I am going to check the other issues for any similar problems, but see the following:

[screenshot: IPython console error output]

sejiro avatar Jan 25 '18 16:01 sejiro

OK, now try my code above again, but give X as input instead of self.S_norm.T, where X is:

X = np.random.normal(size=self.S_norm.T.shape)
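
Put together, the isolation test would be roughly this (assuming numpy and scikit-learn are both importable and vlm is the loaded loom object):

import numpy as np
from sklearn.decomposition import PCA

# Random data with the same shape as the real input, to rule out the data itself
X = np.random.normal(size=vlm.S_norm.T.shape)
PCA().fit_transform(X)  # if this still crashes, the problem is not in velocyto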

gioelelm avatar Jan 25 '18 16:01 gioelelm

Could it be there are NaNs? Is this the correct call?

[screenshot: PCA fit_transform call on the random data X]

sejiro avatar Jan 25 '18 16:01 sejiro

Could it be there are NaNs?

No

Is this the correct call?

Yes

I think this last test proved that the problem is your Python installation. The reason is that there is basically not a single line of my code running in the snippet above, and what is failing is the call to sklearn's PCA.

Something is broken in your installation. Please reinstall a conda environment from scratch, starting from miniconda, and then follow the installation guide in the docs.

gioelelm avatar Jan 25 '18 16:01 gioelelm

Alright, I'll try reinstalling. Thanks

sejiro avatar Jan 25 '18 16:01 sejiro

I reinstalled and followed the installation instructions in the docs. Running the code in the dentate gyrus notebook and the adjusted call we discussed leads to a segmentation fault: 11 error, both from IPython and from the command line as a script. So reinstalling Python doesn't seem to fix the problem.

I can try running this on our computing cluster, which is a more powerful and perhaps more stable architecture. Do you have any other ideas for solving this issue?

sejiro avatar Jan 28 '18 20:01 sejiro

I don't have other suggestions right now. I can only promise that, in the next few weeks, I will test the notebook and the installation again from a couple of different environments in an attempt to replicate your problem, but as long as this is an isolated problem that I cannot trace back to a putative cause, I cannot make it my number 1 priority.

gioelelm avatar Jan 28 '18 21:01 gioelelm

Running the script on the cluster returns no segmentation fault: 11 error. This confirms that the issue is something related to the installation/architecture. While not truly resolved, you may consider this issue closed.

sejiro avatar Jan 30 '18 21:01 sejiro

Hello, I have been having the same issue of "Segmentation fault: 11" while running perform_PCA(). I have used velocyto before on my laptop (MacBook Pro, 2018, 16 GB memory) without this error occurring. However, after having to update and reinstall certain packages, in addition to the Mojave update, I have begun to get this error. I can also reproduce it using fit_transform(X) from the sklearn library, both for my dataset and for random data.

Reading around, these issues seem to be similar to this post: https://github.com/scikit-learn/scikit-learn/issues/8236

In brief, the error was suggested to come from incompatibilities between Scipy and XGBoost: https://github.com/scikit-learn/scikit-learn/issues/8236#issuecomment-395141179

The alternative they suggested is to use the numpy implementation of SVD. I can reproduce the error on my laptop using the scipy implementation, scipy.linalg.svd(X), which I am guessing is also used in velocyto. I am able to resolve it by using the numpy implementation, np.linalg.svd(X). Hence, the segfaults might not be due to a memory issue but rather an incompatibility issue.

Is there a possible way to incorporate the numpy implementation in velocyto?
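
For example, something along these lines (just a sketch of what I mean, not velocyto's actual code, and assuming S_norm is a dense numpy array):

import numpy as np

# Hypothetical replacement for vlm.perform_PCA() that uses numpy's SVD
X = vlm.S_norm.T                                   # cells x genes, as perform_PCA uses
Xc = X - X.mean(axis=0)                            # center the columns, as PCA does
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)  # numpy SVD instead of scipy/sklearn
vlm.pcs = U * s                                    # principal component scores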

anirudhpatir avatar Jul 27 '19 21:07 anirudhpatir