MultiVelo icon indicating copy to clipboard operation
MultiVelo copied to clipboard

how to create DTW ?

Open SinghalDivya opened this issue 1 year ago • 7 comments

Hi,

could you please explain how can I carry run DTW time delay analysis across modalities? as mentioned in the paper, I ran RunChromVAR but what's the input file used in example code? https://github.com/welch-lab/MultiVelo/blob/main/Examples/dtw_example.R

Thank you

SinghalDivya avatar Aug 23 '23 12:08 SinghalDivya

Hi,

could you please suggest me how can I create figure 6E-G from the paper https://www.nature.com/articles/s41587-022-01476-y/figures/6.

Thanks

SinghalDivya avatar Aug 28 '23 13:08 SinghalDivya

The input of DTW script is a file with three columns: latent time, motif accessibility, and TF gene expression. The accessibilities and expressions of genes of interest can be extracted from ATAC and RNA anndata objects respectively. The latent time is in the adata_rna.obs['latent_time'] field after computing latent time.

The paper's method section has more detailed descriptions of how Figure 6e-g were generated. For Fig6f, basically the DTW analyses were done for all TFs individually and summarized to a dataframe, then the figure was generated with ggplot like this

ggplot(data = diff_df[1:bins,], aes(x=t)) + 
  geom_line(aes(y = X0., color = 'X0.'), size=1, alpha=0.4) + 
  geom_line(aes(y = X25., color = 'X25.'), size=1.5, alpha=0.8) + 
  geom_line(aes(y = median, color = 'median'), size=2) + 
  geom_line(aes(y = X75., color = 'X75.'), size=1.5, alpha=0.8) + 
  geom_line(aes(y = X100., color = 'X100.'), size=1, alpha=0.4) + 
  geom_line(aes(y = zeros), color='black', size=1, linetype = "dashed") + 
  geom_ribbon(aes(ymin = X0., ymax = X25.), fill = '#5DB3FF', alpha=0.2) +
  geom_ribbon(aes(ymin = X25., ymax = median), fill = '#74E3AA', alpha=0.5) +
  geom_ribbon(aes(ymin = median, ymax = X75.), fill = '#C4D857', alpha=0.5) +
  geom_ribbon(aes(ymin = X75., ymax = X100.), fill = '#FF964D', alpha=0.2) +
  theme_bw() + xlab('Latent time') + ylab('Δt') +
  theme(axis.text = element_text(size=10), axis.title = element_text(size=12), legend.position = c(0.87, 0.87), 
        legend.background = element_rect(fill=NA), legend.text = element_text(size=10)) + 
  scale_color_manual("", 
                     breaks = c("X100.", "X75.", "median", "X25.", "X0."),
                     labels = c("q100", "q75", "q50", "q25", "q0"),
                     values = c("#ff5e42", "#ffcd58", "#89e255", "#5ee4ff", "#5c82ff"))

Fig6g was analyzed the same way but for SNPs. After saving all time lags to df, the plot was generated with ggplot as

ggplot(df, aes(x = snp_time, y = max_lags, color=log_count)) + geom_point(size=1, alpha=0.9) + 
  xlab('Max SNP accessibility time') + ylab('Δt') + 
  scale_color_viridis_c() + theme_bw() + theme(axis.text = element_text(size=10), axis.title = element_text(size=12))

danielee0707 avatar Aug 28 '23 19:08 danielee0707

Hi,

Thank you so much for your response. Could you please also give me an example on how can I extract accessibilities and expressions of genes of interest can be extracted from ATAC and RNA anndata objects. Is there any sample input file for DTW analysis?

Thanks again!

On Aug 28, 2023, at 3:54 PM, Chen Li @.***> wrote:

The input of DTW is a file with three columns: latent time, motif accessibility, and TF gene expression. The accessibilities and expressions of genes of interest can be extracted from ATAC and RNA anndata objects respectively. The latent time is in the adata_rna.obs['latent_time'] field after computing latent time.

The paper's method section has more detailed descriptions of how Figure 6e-g were generated. For Fig6f, basically the DTW analyses were done for all TFs and summarized to a dataframe, then the figure was generated with ggplot like this

ggplot(data = diff_df[1:bins,], aes(x=t)) + geom_line(aes(y = X0., color = 'X0.'), size=1, alpha=0.4) + geom_line(aes(y = X25., color = 'X25.'), size=1.5, alpha=0.8) + geom_line(aes(y = median, color = 'median'), size=2) + geom_line(aes(y = X75., color = 'X75.'), size=1.5, alpha=0.8) + geom_line(aes(y = X100., color = 'X100.'), size=1, alpha=0.4) + geom_line(aes(y = zeros), color='black', size=1, linetype = "dashed") + geom_ribbon(aes(ymin = X0., ymax = X25.), fill = '#5DB3FF', alpha=0.2) + geom_ribbon(aes(ymin = X25., ymax = median), fill = '#74E3AA', alpha=0.5) + geom_ribbon(aes(ymin = median, ymax = X75.), fill = '#C4D857', alpha=0.5) + geom_ribbon(aes(ymin = X75., ymax = X100.), fill = '#FF964D', alpha=0.2) + theme_bw() + xlab('Latent time') + ylab('Δt') + theme(axis.text = element_text(size=10), axis.title = element_text(size=12), legend.position = c(0.87, 0.87), legend.background = element_rect(fill=NA), legend.text = element_text(size=10)) + scale_color_manual("", breaks = c("X100.", "X75.", "median", "X25.", "X0."), labels = c("q100", "q75", "q50", "q25", "q0"), values = c("#ff5e42", "#ffcd58", "#89e255", "#5ee4ff", "#5c82ff"))

Fig6g was analyzed the same way but for SNPs. After saving all time lags to df, the plot was generated with ggplot with

ggplot(df, aes(x = snp_time, y = max_lags, color=log_count)) + geom_point(size=1, alpha=0.9) + xlab('Max SNP accessibility time') + ylab('Δt') + scale_color_viridis_c() + theme_bw() + theme(axis.text = element_text(size=10), axis.title = element_text(size=12))

— Reply to this email directly, view it on GitHubhttps://github.com/welch-lab/MultiVelo/issues/32#issuecomment-1696309595, or unsubscribehttps://github.com/notifications/unsubscribe-auth/AGLFXDCR4XPXYKWBZXXJNGDXXTZQJANCNFSM6AAAAAA33NY6PY. You are receiving this because you authored the thread.Message ID: @.***>

SinghalDivya avatar Aug 31 '23 18:08 SinghalDivya

For each defined target and motif pair you can use a script similar to this to save the accessibility and expression. Depending on your specific goal (e.g. chromatin priming), the order of accessibility and expression may be flipped.

motif_activity = sc.read_csv('motifs/chromvar_z.csv')  # motif accessibility computed using chromvar
motif_activity = motif_activity.transpose()
motif_activity = motif_activity[adata_result.obs_names,:]
mv.knn_smooth_chrom(motif_activity, nn_idx, nn_dist)

for i,gene in enumerate(targets):
    t = np.reshape(np.array(adata_result.obs['latent_time']), (-1,1))
    m = np.array(motif_activity[:,motif_names[i]].layers['Mc'])
    r = np.array(adata_result[:,gene].X)
    res = np.hstack((t,m,r))
    np.savetxt('motifs/dtw/'+gene+'_res.txt', res, fmt='%f')

danielee0707 avatar Sep 03 '23 17:09 danielee0707

Thank you for guiding me through the process. as suggested, I ran through the code. although I stumble upon an error:

motif_activity = sc.read_csv('./Motifname_chromvar_score_2520.csv') # motif accessibility computed using chromvar motif_activity = motif_activity.transpose() motif_activity = motif_activity[adata_result.obs_names,:] mv.knn_smooth_chrom(motif_activity, nn_idx, nn_dist)

####### error ####### Screenshot 2023-09-09 at 1 38 47 AM

####### code used

motif_activity AnnData object with n_obs × n_vars = 4060 × 746 layers: 'Mc' obsp: 'connectivities' ####### calculated for motif (m) Screenshot 2023-09-09 at 1 33 17 AM

calculated for r and t
Screenshot 2023-09-09 at 1 34 29 AM

########### expression data is getting stored as sparse matrix which is 2 dimension, rest values of m and t are one dimension, if I am not wrong. could you suggest me way around this error?

Thanks again!

SinghalDivya avatar Sep 09 '23 05:09 SinghalDivya

I think the issue here is due to adata.X being a sparse matrix. You can try r = np.array(adata_result[:,gene].X.A) instead and see if it works out.

danielee0707 avatar Sep 11 '23 03:09 danielee0707

Perfect, thank you it worked!

SinghalDivya avatar Sep 11 '23 04:09 SinghalDivya