scanpy
scanpy copied to clipboard
STARsolo Matrix with Velocyto --> Anndata function
The scanpy.read_10X_mtx works well for reading in the STARsolo output matrices, which are based on the CellRanger Outputs.
However, it would be nice to have a function or modification of the read_10X_mtx function (e.g. a boolean for STARsolo velocyto) to automate inputting the velocyto matrices that STARsolo outputs and placing them in the appropriate layers. A boolean switch for filtered versus raw matrices would be a good addition as well.
Example code: https://github.com/alexdobin/STAR/issues/774#issuecomment-850477636
Thanks for the suggestion.
@WeilerP, do you think this would be more appropriate in scvelo
? (Side note, I have thought that tutorial
of going from BAMs through scvelo
would be quite useful)
@WeilerP, do you think this would be more appropriate in
scvelo
? (Side note, I have thought that tutorial of going from BAMs throughscvelo
would be quite useful)
Hm, not sure if the functionality would match the expectation. In scvelo
, we'd store only unspliced and spliced counts (spliced both in adata.X
and adata.layers
). Based on the proposed code snippet in alexdobin/STAR#774 (comment), the expected output would be to read all of the available information?
It doesn't look to me like it's reading in much other than those. Just the obs
and var
that you'd get from any cell ranger experiment and an ambiguous
layer.
What additional things are you concerned about?
Yes, agreed, ambiguous
isn't really a problem (could be read in optionally via a keyword argument). I was more concerned about adata.X
being the original count matrix - unless I am misunderstanding the code snippet ...
I was more concerned about adata.X
Ah, yeah I see what you mean. But isn't that fine for scvelo
? I guess it wasn't obvious to me from the documentation that X
wasn't supposed to be "total" counts.
Also it looks like the output formats may not be documented or finalized. It might be worth reaching out to the devs to see what's up before anything is implemented.
Just from an end user perspective, I hadn't recognized spliced is put into adata.X and it seems that the other STARsolo users in the original thread didn't either. For my edification, would it cause any problems for me to modify the code to put spiced into adata.X or am I missing something else?
@ivirshup, @JBreunig using the absolute counts isn't a problem per se. It's simply that the scvelo paper used the spliced counts in adata.X
based on which the highly variable genes are selected and PCA, neighbor graph and UMAP embedding are calculated.
@JBreunig, shouldn't be a problem to put spliced into adata.X
as the dimensions of spliced and total counts are the same.