dynamo-release icon indicating copy to clipboard operation
dynamo-release copied to clipboard

Correct spliced RNA velocity

Open lh12565 opened this issue 2 years ago • 13 comments

Hi, my data was obtained the conventional scRNA-seq technique (10X) with no labeling data. I found the nonsensical velocity flow from macrophage to monocyte using dynamo. I found your article called "Mapping transcriptomic vector fields of single cells" using "correct spliced RNA velocity" to correct the RNA velocity (via distribution of velocity confidence? Fig S3D-G). I hope you can provide the code of "correct spliced RNA velocity". Although he will affect downstream analysis, at least correct correction on my data. In addition, Can your vector field method correct the direction of RNA velocity without filtering genes? Thanks!

lh12565 avatar Mar 07 '22 02:03 lh12565

dear @lh12565 , thanks for your dynamo! regarding correcting spliced RNA velocity, please refer to #257 #218.

regarding the second question, the vector field method itself cannot correct velocity if the overall flow is problem. However, it can identify outliers of cells that have velocity go against most cells and correct for those (see figure S5).

Please let me know if you have any additional questions.

Thanks!

Xiaojieqiu avatar Mar 07 '22 15:03 Xiaojieqiu

Hi @Xiaojieqiu , when I used dyn.tl.gene_wise_confidence, I found only 2000 genes had been evaluated. But my anndata still have 30000~ genes, I guess it's already filtered at dyn.pp.recipe_monocle step to calculate RNA velocity. I don't know where the filtered matrix or genes is stored, because this will affect where I put the filtered high confidence genes by dyn.tl.gene_wise_confidence and run dyn.tl.dynamics again. I guess the storage location is 'use_for_dynamics' in adata.var. I wonder if I can recalculate the RNA velocity with the following code:

index1=adata.uns['gene_wise_confidence'].prog_confidence>0.8
index2=adata.uns['gene_wise_confidence'].mature_confidence>0.8
genes=adata.uns['gene_wise_confidence'][(index1|index2)].gene.values  #519 genes
tmp=[]
for i in adata.var_names:
    if i in genes:
        tmp.append(True)
    else:
        tmp.append(False)

tmp=np.array(tmp)

adata.var['use_for_dynamics_orig']=adata.var['use_for_dynamics'].copy()
adata.var['use_for_dynamics']=tmp

dyn.tl.dynamics(adata, model='stochastic', cores=3)

However, there is an error as below:

|-----> dynamics_del_2nd_moments_key is None. Using default value from DynamoAdataConfig: dynamics_del_2nd_moments_key=False
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/NAS/lh/software/miniconda3/envs/dynamonew/lib/python3.8/site-packages/dynamo/tools/dynamics.py", line 304, in dynamics
    tkey = adata.uns["pp"]["tkey"]
KeyError: 'tkey'

lh12565 avatar Mar 08 '22 05:03 lh12565

Hi @lh12565 you are using cscRNA-seq datasets (conventional single cell RNA-seq), right? Then you may need to make sue you run the dyn.pp.recipe_monocle(adata) first.

Also you don't need to run tl.dynamics again at all because it has already calculated. You can use the results from dyn.tl.gene_wise_confidence for downstream analyses. You can run tl.cell_wise_velocity(adata, transition_genes = adata.var.index[adata.var['use_for_dynamics']], enforce=True) to rerun velocity projection with those genes for downstream analyses.

Xiaojieqiu avatar Mar 08 '22 17:03 Xiaojieqiu

Hi, @Xiaojieqiu yes, I use cscRNA-seq datasets. I have run dyn.pp.recipe_monocle(adata) before run dyn.tl.gene_wise_confidence. I didn't find cell_wise_velocity, did you mean tl.cell_velocities? When I run code as below, it seems to be good:

index1=adata.uns['gene_wise_confidence'].prog_confidence>0.8
index2=adata.uns['gene_wise_confidence'].mature_confidence>0.8
genes=adata.uns['gene_wise_confidence'][(index1|index2)].gene.values  #519 genes
dyn.tl.cell_velocities(adata, method='pearson', other_kernels_dict={'transform': 'sqrt'},transition_genes = genes, enforce=True)
dyn.pl.streamline_plot(adata, color=['celltype'], basis='umap', show_legend='on data', show_arrowed_spines=True)

But I still have some questions for you: (1) The gene confidence in adata.uns['gene_wise_confidence'] has two different columns "prog_confidence" and "mature_confidence". Did I choose for their intersection or union, or one of them, or other methods to filter genes? How is this threshold determined (e.g. 0.8) ? See my density diagram below (prog_confidence and mature_confidence): gene_confidence_pro_0 8 gene_confidence_mature_0 8

(2) If I don't know the direction of the differentiation, for example, do not know who are progenitor cells and who are terminally cells. How do I make sure its credibility of velocity flow, and how to calculate gene confidence? (3) If I run the following code, does the downstream analysis (such as dyn.vf.acceleration(adata, basis='pca') , dyn.pl.topography ...) always use these filtered 519 genes to analyze?

dyn.tl.cell_velocities(adata, basis='pca',transition_genes = genes, enforce=True);
dyn.vf.VectorField(adata,
                   basis='pca',
                   M=100)

Thanks!

lh12565 avatar Mar 09 '22 02:03 lh12565

Hi @lh12565 thanks for your questions:

(1) The gene confidence in adata.uns['gene_wise_confidence'] has two different columns "prog_confidence" and "mature_confidence". Did I choose for their intersection or union, or one of them, or other methods to filter genes? How is this threshold determined (e.g. 0.8) ? See my density diagram below (prog_confidence and mature_confidence):

You may use the intersection of both to ensure genes behave correctly near both the progenitor and terminal cell states on the phase plot. But since you already find your result looks good, so it may be fine already with your union choice. The 0.8 is arbitrary and you can choose any threshold that is reasonable. small values will give you more genes but may less ideal velocity flow that matches the prior knowledge and vice versa.

You may read our supplementary figure S3 and the corresponding method section to figure out the intuition behind this correction method.

  1. If I don't know the direction of the differentiation, for example, do not know who are progenitor cells and who are terminally cells. How do I make sure its credibility of velocity flow, and how to calculate gene confidence?

As I mentioned above, this approach is designed for the purpose of correcting your RNA velocity flow once you find your velocity flow is problematic. It is really not something you should use often. The problematic velocity flow results from the inproper assumptions made in the RNA velocity estimation, see our discussion in figure 3 and the corresponding method sections. In fact, you may find the genes with low confidence interestingly becaue those are the genes that go against the RNA velocity assumptions (genes with non-constant transcription rate, etc.). and you may check these genes for biological insight discovery.

If you don't know the progenitor and terminal cell states by design, ideally the plain RNA velocity will correctly reveal the direction. You may also look for stemness genes and marker genes to figure out the direction.

I would suggest you or your lab to consider using the metabolic labeling enabled single cell RNA-seq, as we reported in the paper. this type of data will most of the time give you correct velocity flow with dynamo's model framework.

(3) If I run the following code, does the downstream analysis (such as dyn.vf.acceleration(adata, basis='pca') , dyn.pl.topography ...) always use these filtered 519 genes to analyze?

Yes, if you use these genes as the transition genes, it will always use those genes for downstream RNA velocity vector field analyses.

Let me know if you have any other additional questions

Xiaojieqiu avatar Mar 16 '22 17:03 Xiaojieqiu

Thanks a lot. I have one more question. In my research, some studies said one cancer subtype (sub1) can switch to another (sub2). And some studies said the sub2 can be reversed to complement the sub1. The velocity flow directions of a few samples point from sub1 to sub2, most are the opposite. My research mainly focuses on the former (sub1->sub2) and find out a drive gene. So that's why I reset the progenitor cell and the terminal cell using gene_wise_confidence. As you said, I can use the metabolic labeling enabled single cell RNA-seq (tscRNA), and if the result is sub2->sub1, can I still find the driver gene (sub1->sub2)? Thanks!

lh12565 avatar Mar 17 '22 04:03 lh12565

if you find the transition is sub2->sub1 from the labeling data, then it means the transition from sub1->sub2 maybe is rare.

However, you can always use the least action path method to predict the top genes. also see figure 6 and the corresponding info from our cell paper

btw I edit your question to avoid copying previous content which was annoying

Xiaojieqiu avatar Mar 17 '22 04:03 Xiaojieqiu

if you find the transition is sub2->sub1 from the labeling data, then it means the transition from sub1->sub2 maybe is rare.

However, you can always use the least action path method to predict the top genes. also see figure 6 and the corresponding info from our cell paper

btw I edit your question to avoid copying previous content which was annoying

Hi, @Xiaojieqiu the number of cells in sub2 is indeed much lower than the number of cells in sub1. Is this method (least action path method) only applicable to tscRNA? I only have the cscRNA data now. If it can be used for cscRNA, should I use the pre-filtered data or the filtered data (filtered by gene_wise_confidence), because their attractors are mostly the opposite. Thanks.

lh12565 avatar Mar 17 '22 06:03 lh12565

all methods, including the LAP method, in dynamo work for both cscRNA-seq and tscRNA-seq. it is just that the later gives you more accurate RNA velocity flow and absolute velocity. the velocity part is upstream of LAP, you should check it based on your understand of the system, the LAP can be used to identify genes that are important for conversion between any cell state

Xiaojieqiu avatar Mar 17 '22 13:03 Xiaojieqiu

Is the meaning of attractors of lap_tutorial same as the meaning of attractors of Zebrafish pigmentation's tutorial? Both represent terminal cells? If yes, HSC should have no attractor, (HSC seems to emitting points?), but you seems to choose [8.45201833, 9.37697661] as the attractor. If no, can cells be randomly selected in the HSC? I'm also a little confused about fixed points, unstable points and stable points. In #306, you said 1 of ftype represent unstable points which shows in blue (said in Zebrafish pigmentation's tutorial), and in lap_tutorial, I found ftype of [8.45201833, 9.37697661] is 1, but shows in red. Is the rank genes is not limited to TF in lap_tutorial? Thanks!

lh12565 avatar Mar 18 '22 03:03 lh12565

the cell type to color map is this: c=("black" if cur_ftype == -1 else "blue" if cur_ftype == 0 else "red"), see this line of code https://github.com/aristoteleo/dynamo-release/blob/de6c6e737fb10177ff97fbb8179a1b0139c836d2/dynamo/plot/topography.py#L390

so type is -1 is an stable fixed point (attractor) and will be black, if it is 0, it will be blue, and 1 (and others) will be red.

I think we need to clarify that only if the type is -1, it corresponds to an attractor (stable fixed point). Otherwise they are unstable fixed points. So you are right, HSC is an unstable fixed point but not an attractor.

Let me know if this is clear to you.

@dummyindex can you please make sure the Lap tutorial and zebrafish tutorial consistent?

Xiaojieqiu avatar Mar 18 '22 05:03 Xiaojieqiu

the cell type to color map is this: c=("black" if cur_ftype == -1 else "blue" if cur_ftype == 0 else "red"), see this line of code

https://github.com/aristoteleo/dynamo-release/blob/de6c6e737fb10177ff97fbb8179a1b0139c836d2/dynamo/plot/topography.py#L390

so type is -1 is an stable fixed point (attractor) and will be black, if it is 0, it will be blue, and 1 (and others) will be red.

I think we need to clarify that only if the type is -1, it corresponds to an attractor (stable fixed point). Otherwise they are unstable fixed points. So you are right, HSC is an unstable fixed point but not an attractor.

Let me know if this is clear to you.

@dummyindex can you please make sure the Lap tutorial and zebrafish tutorial consistent?

So I think their corresponding relationship is as follows:

red:	unstable fixed point	emitting points	ftype: 1
blue:	unstable fixed point	saddle points	ftype: 0
black:	stable fixed point	attractors	ftype: -1

right? Since HSC has no attractor, can I randomly select points or need to select closest cells of unstable points as initial cells? You said I can use the least action path method to predict the top genes before. How do I get these top genes? Is it stored in transition_graph["sub1->sub2"]['ranking']? After I filter the genes and only keep TFs, I find the rank of transition_graph["sub1->sub2"]['ranking'] is still different from the rank of dyn.pl.kinetic_heatmap. I don't know what's the difference between these two top genes? btw, I want to save the transition_graph, but it gave an error as below:

pickle.dump(transition_graph,open('transition_graph.p','wb'))
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Input In [342], in <cell line: 1>()
----> 1 pickle.dump(transition_graph,open('transition_graph.p','wb'))

AttributeError: Can't pickle local object 'vecfld_from_adata.<locals>.<lambda>'

lh12565 avatar Mar 18 '22 08:03 lh12565

the cell type to color map is this: c=("black" if cur_ftype == -1 else "blue" if cur_ftype == 0 else "red"), see this line of code

https://github.com/aristoteleo/dynamo-release/blob/de6c6e737fb10177ff97fbb8179a1b0139c836d2/dynamo/plot/topography.py#L390

so type is -1 is an stable fixed point (attractor) and will be black, if it is 0, it will be blue, and 1 (and others) will be red.

I think we need to clarify that only if the type is -1, it corresponds to an attractor (stable fixed point). Otherwise they are unstable fixed points. So you are right, HSC is an unstable fixed point but not an attractor.

Let me know if this is clear to you.

@dummyindex can you please make sure the Lap tutorial and zebrafish tutorial consistent?

The variable name "attractors" indeed causes confusion.

dummyindex avatar Mar 19 '22 05:03 dummyindex

This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 14 days

github-actions[bot] avatar Sep 01 '22 00:09 github-actions[bot]