leafcutter PSI and covariates

Hello, I was wondering if there is a way to get additional output when adjusting for covariates, so that it is possible to:

output the PSI values at the sample level after covariate fitting? Right now it only seems possible to output adjusted PSI values per group or otherwise raw PSI values per sample; and
use it for the PCA plots in leafviz. Also, it looks like that when using covariates, the PSI values in leafviz (i.e. psi given per group for a significantly DS cluster) aren't the fitted PSI values.

Thanks!

Sep 06 '17 19:09 ddpinto

Hi Dalila, Great idea! For 2. I'll implement covariate PCAs for LeafViz.

Sep 06 '17 20:09 jackhump

This is true. What's your use case? Other downstream analysis like PCA or clustering? I'm assuming you want something like residuals (after removing technical covariates) rather than fitted values (since these would just be the group level PSIs). The Dirichlet-Multinomial model doesn't currently do this, but I can make a version that does if that would be helpful. The simpler and maybe more pragmatic solution would be to make a phenotype matrix like @goldenflaw's 'prepare_phenotype_table.py' (or the code below!) and then regress out technical confounders from that.
@jackhump I've realized the PCA is currently on the raw counts whereas it should really be on the ratios (otherwise the PCA is confounded with total expression variation). I would suggest this for getting the ratios:

ratios = counts %>% 
  mutate(clu = str_split_fixed(rownames(counts), ":", 4)[,4]) %>%
  group_by(clu) %>% 
  mutate_all( funs( ./sum(.) ) ) %>% 
  ungroup() %>%
  as.data.frame() %>% 
  set_rownames(rownames(counts)) %>% 
  select(-clu)
ratios = ratios[rowMeans(is.na(ratios)) <= 0.4,,drop=F ]
row_means = rowMeans(ratios, na.rm = T)
row_means_outer = outer(row_means, rep(1,ncol(ratios)))
ratios[is.na(ratios)] = row_means_outer[is.na(ratios)]

We'll need to add the 'stringr' dependency for 'str_split_fixed'.

Sep 07 '17 07:09 davidaknowles

thanks for that David. I think stringr should already be a listed dependency. The function set_rownames comes from magrittr so you'll need to add that if it isn't already.

Sep 07 '17 10:09 jackhump

that's now done. I've deployed the latest changes to https://leafcutter.shinyapps.io/leafviz/

Sep 07 '17 14:09 jackhump

Great, thanks. I've added magrittr and stringr as new dependencies for the LeafCutter package.

Sep 07 '17 19:09 davidaknowles

Hello, Just following up here as well (some of it went earlier by email but not sure if you got those).

Yes getting the residuals after regressing out technical & biological covariates. I'm doing case/control analyses and my covariate model was derived from the gene expression counts and it includes biological and technical covariates (e.g. sex, RIN, tissue, 10 sequencing PCs plus 7 SVs).

The idea would be to use the ratios (PSI values) at sample level adjusted for covariates in order to:

examine the effect of regressing out covariables using PCA (i.e. PCA before and after regressing for covariates);
show the adjusted PSI values per sample in the significantly DS intron cluster plots in leafviz;
produce heatmaps of adjusted PSI values for DS clusters in different groups, or simply plot the distribution of PSI values for differentially spliced intron clusters of a gene.

Thanks!

Sep 19 '17 15:09 ddpinto

I see. So this is definitely possible with the existing DM-GLM but will require a little into exactly what the calculation should be. I'll prototype something and maybe you (Dalila) can see if it looks reasonable on your data.

Sep 22 '17 21:09 davidaknowles

I've made a branch psi with PSI calculation (with or without adjusting for confounders) under the DM-GLM. This requires an extra optimization step for each cluster and is separate from the differential splicing code. You can install this branch using

devtools::install_github("davidaknowles/leafcutter/leafcutter",ref="psi")

There is a new script scripts/leafcutter_quantify_psi.R which takes the counts file and (optionally) a file with confounders to be removed. I've tested this on 10 v 10 samples from GTEx, pretending tissue is a confounder. Without removing tissue: With tissue removed:

Sep 23 '17 05:09 davidaknowles

Great! I'll test this out on my data. Btw I've tested the smart init after regularization and it works for me as well.

Sep 24 '17 23:09 ddpinto

Cool, good to know. Let me know if you have any problems!

On Sep 24, 2017 4:09 PM, "Dalila Pinto" [email protected] wrote:

Great! I'll test this out on my data. Btw I've tested the the smart init after regularization and it works for me as well.

— You are receiving this because you were assigned. Reply to this email directly, view it on GitHub https://github.com/davidaknowles/leafcutter/issues/34#issuecomment-331746680, or mute the thread https://github.com/notifications/unsubscribe-auth/AET_BuNFYuCJjDGYuxoHzjQpYH41UNEMks5sluEQgaJpZM4PO3wb .

Sep 25 '17 00:09 davidaknowles

Any update on whether this works for you?

Oct 01 '17 17:10 davidaknowles

Everything now runs to completion, although it is hard to fully test since it is hard to know what is correct/incorrect. To make an heatmap would you suggest to include only the most differentially spliced intron per intron cluster?

When possible, could you perhaps move the PSI branch to main?

Oct 10 '17 13:10 ddpinto

True. The hope would be that in PCA space differences between biological groups would look more pronounced after removing confounding variation. Are you able to look at that with your data?

Yes, that's what we did when looking at the GTEx samples.

I'll merge into main, just wanted someone else to have tried it out first.

On Tue, Oct 10, 2017 at 6:07 AM, Dalila Pinto [email protected] wrote:

Everything now runs to completion, although it is hard to fully test since it is hard to know what is correct/incorrect. To make an heatmap would you suggest to include only the most differentially spliced intron per intron cluster?

When possible, could you perhaps move the PSI branch to main?

— You are receiving this because you were assigned. Reply to this email directly, view it on GitHub https://github.com/davidaknowles/leafcutter/issues/34#issuecomment-335466419, or mute the thread https://github.com/notifications/unsubscribe-auth/AET_BkIepvlHoaHSzF8umWeLCnqKkqleks5sq2wrgaJpZM4PO3wb .

-- Dr. David A. Knowles, Postdoctoral Scholar, Pritchard and Plevritis Labs, Stanford University. E-mail: [email protected] Web: https://davidaknowles.github.io/

Oct 10 '17 18:10 davidaknowles

Yes I'm on it, and will get back as soon as possible.

Oct 12 '17 18:10 ddpinto

Hello, I've had a chance to look at the correction and found it works well for removing confounders. The plot below shows the effect of removing site effects: correction_pca

Another thing I played around with are the cluster plots in leafviz. By default it loads the uncorrected data, which means that the PSI values that are shown don't reflect the data after accounting for confounders. I did some crude hacking of the 'prepare_results' script to feed it the corrected ratio data rather than perind_numers.counts, and this shows that the post-correction data better reflect actual differences: correction_psi-cluster-plot

Running the correction as a separate script works well, though it would be nice to have the option in leafviz to visualize the PCA and clusters before and after covariate correction. Not sure how much of a hassle that would be to implement though.

Thanks!

Nov 01 '17 16:11 ddpinto

is the "leafcutter_quantify_psi.R" script available anywhere?

Nov 15 '21 15:11 herber4

script is here: https://github.com/davidaknowles/leafcutter/blob/psi_2019/scripts/leafcutter_quantify_psi.R

Mar 29 '24 16:03 jackhump

Thank you!

Mar 29 '24 16:03 seyoun209