scanpy STARsolo Matrix with Velocyto --> Anndata function

The scanpy.read_10x_mtx function works well for reading in the STARsolo output matrices, which follow the CellRanger output format.

However, it would be nice to have a dedicated function, or a modification of read_10x_mtx (e.g. a boolean for STARsolo velocyto), that automates reading the velocyto matrices STARsolo outputs and places them in the appropriate layers. A boolean switch for filtered versus raw matrices would be a good addition as well.

May 27 '21 19:05 JBreunig

Example code: https://github.com/alexdobin/STAR/issues/774#issuecomment-850477636

May 28 '21 15:05 JBreunig

Thanks for the suggestion.

@WeilerP, do you think this would be more appropriate in scvelo? (Side note: I have thought that a tutorial on going from BAMs through scvelo would be quite useful.)

Jul 05 '21 09:07 ivirshup

> @WeilerP, do you think this would be more appropriate in scvelo? (Side note: I have thought that a tutorial on going from BAMs through scvelo would be quite useful.)

Hm, not sure if the functionality would match the expectation. In scvelo, we'd store only unspliced and spliced counts (spliced both in adata.X and adata.layers). Based on the proposed code snippet in alexdobin/STAR#774 (comment), the expected output would be to read all of the available information?

Jul 05 '21 10:07 WeilerP

It doesn't look to me like it's reading in much other than those. Just the obs and var that you'd get from any cell ranger experiment and an ambiguous layer.

What additional things are you concerned about?

Jul 05 '21 10:07 ivirshup

Yes, agreed, ambiguous isn't really a problem (could be read in optionally via a keyword argument). I was more concerned about adata.X being the original count matrix - unless I am misunderstanding the code snippet ...

Jul 05 '21 11:07 WeilerP

> I was more concerned about adata.X

Ah, yeah I see what you mean. But isn't that fine for scvelo? I guess it wasn't obvious to me from the documentation that X wasn't supposed to be "total" counts.

Also it looks like the output formats may not be documented or finalized. It might be worth reaching out to the devs to see what's up before anything is implemented.

Jul 06 '21 03:07 ivirshup

Just from an end-user perspective, I hadn't realized that spliced is put into adata.X, and it seems that the other STARsolo users in the original thread hadn't either. For my edification, would it cause any problems for me to modify the code to put spliced into adata.X, or am I missing something else?

Jul 06 '21 15:07 JBreunig

@ivirshup, @JBreunig, using the absolute counts isn't a problem per se. It's simply that the scvelo paper used the spliced counts in adata.X, and the highly variable genes, PCA, neighbor graph, and UMAP embedding are all computed from that matrix. @JBreunig, it shouldn't be a problem to put spliced into adata.X, as the dimensions of the spliced and total count matrices are the same.

Jul 12 '21 13:07 WeilerP
