mocha
eLRR: Clarification on LRR adjustments
Hi there,
I was curious if you can provide details on the type of LRR adjustments that are made to produce the eLRR plots?
Cheers,
Ryan
If you look in the code for mocha_plot.R you will find this:
for (gt in c('AA', 'AB', 'BB')) {
  idx <- df$gts == gt
  df$BAF[idx] <- df$BAF[idx] - df[idx, paste0(gt, '_BAF1')] * df$LRR[idx] - df[idx, paste0(gt, '_BAF0')]
  df$LRR[idx] <- df$LRR[idx] - df[idx, paste0(gt, '_LRR0')]
}
The MoChA output VCF will include the following nine variables:
AA_LRR0
AA_BAF0
AA_BAF1
AB_LRR0
AB_BAF0
AB_BAF1
BB_LRR0
BB_BAF0
BB_BAF1
These explain how to adjust LRR and BAF for each genotype using the following formulas:
BAF = BAF - BAF1 * LRR - BAF0
LRR = LRR - LRR0
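As a minimal sketch of these two formulas, the toy data frame below uses invented values; only the column names (gts, BAF, LRR and the nine per-genotype variables) follow the snippet from mocha_plot.R. Note that the BAF line runs first, so it uses the LRR value before LRR0 has been subtracted.

```r
# Toy input: all numbers are invented for illustration.
df <- data.frame(
  gts = c("AA", "AB", "BB"),
  BAF = c(0.02, 0.52, 0.97),
  LRR = c(0.10, -0.05, 0.20),
  AA_LRR0 = 0.05, AA_BAF0 = 0.01, AA_BAF1 = 0.10,
  AB_LRR0 = -0.02, AB_BAF0 = 0.50, AB_BAF1 = 0.20,
  BB_LRR0 = 0.10, BB_BAF0 = 0.98, BB_BAF1 = -0.10
)

# Hypothetical helper wrapping the loop quoted above.
adjust_by_genotype <- function(df) {
  for (gt in c("AA", "AB", "BB")) {
    idx <- df$gts == gt
    # BAF is adjusted first, using LRR before its own offset is removed
    df$BAF[idx] <- df$BAF[idx] - df[idx, paste0(gt, "_BAF1")] * df$LRR[idx] - df[idx, paste0(gt, "_BAF0")]
    df$LRR[idx] <- df$LRR[idx] - df[idx, paste0(gt, "_LRR0")]
  }
  df
}

adj <- adjust_by_genotype(df)
print(adj[, c("gts", "BAF", "LRR")])
```

With these toy numbers the AA row ends up with BAF 0.00 and LRR 0.05, the AB row with BAF 0.03 and LRR -0.03, and the BB row with BAF 0.01 and LRR 0.10.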
After that, LRR is adjusted for GC content using the coefficients stored in the .stats.tsv file, through the following code:
df_stats <- read.table(args$stats, sep = '\t', header = TRUE)
lrr_gc_order <- sum(grepl('^lrr_gc_[0-9]', names(df_stats))) - 1
df <- merge(df, df_stats[, c('sample_id', paste0('lrr_gc_', 0:lrr_gc_order))])
for (i in 0:lrr_gc_order) {
  df$LRR <- df$LRR - as.numeric(df$gc)^i * df[, paste0('lrr_gc_', i)]
}
This means that LRR will be further adjusted as follows:
LRR = LRR - LRR_GC_0 - GC * LRR_GC_1 - GC*GC * LRR_GC_2 - ...
where the sum has one term per coefficient, that is, one more term than the degree of the polynomial used for the GC correction.
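To make the expansion concrete, here is a small sketch for a degree-2 polynomial. The column names gc and lrr_gc_0 to lrr_gc_2 mirror the snippet above; the coefficients and marker values are invented for illustration.

```r
# One sample, three markers; all values are invented for illustration.
df <- data.frame(
  LRR = c(0.30, 0.10, -0.20),
  gc = c(0.40, 0.50, 0.60),
  lrr_gc_0 = 0.10, lrr_gc_1 = -0.50, lrr_gc_2 = 1.00
)
lrr_gc_order <- 2  # polynomial degree, so three coefficients (0, 1, 2)

# Accumulate the LRR expected from GC content alone, one polynomial term at a time
expected_gc_effect <- rep(0, nrow(df))
for (i in 0:lrr_gc_order) {
  expected_gc_effect <- expected_gc_effect + df$gc^i * df[, paste0("lrr_gc_", i)]
}

# Subtracting the accumulated polynomial is equivalent to subtracting each term in turn
df$LRR_adj <- df$LRR - expected_gc_effect
print(df[, c("gc", "LRR", "LRR_adj")])
```

For the marker with GC 0.50, the polynomial evaluates to 0.10 - 0.25 + 0.25 = 0.10, so its adjusted LRR is 0.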
Thanks for the clarifications! Would there be a case in which the eLRR picks up a signal yet the unadjusted LRR looks fine?
Yes, in theory that is possible, and it could be desirable if it comes from a real signal.
I am attempting to troubleshoot this given call (14q11):
This exact call is present in many unrelated samples, so we suspect a false positive, especially since it is not apparent in the unadjusted LRR (or BAF). It is isolated to a single batch of samples. Dropping the missingness threshold does not affect the call.
We are hoping to understand these types of calls more deeply. Any guidance on how to further troubleshoot would be appreciated. Thanks.
Just following up on this. Any guidance on how to further troubleshoot?
Cheers,
Ryan
Visually, something does seem to be going on across multiple consecutive markers, so it is hard to argue that MoChA is doing anything wrong. What explains it might not be a CNV, but I am not an expert on LRR. There does not seem to be any BAF signal in this call. You could try to filter out germline duplications based solely on LRR, but without further testing and investigation I have no further advice; you would have to work out what exactly is going on in these regions. Do you think the GC correction is at fault here? Do the affected markers have outlier GC values? Are you using the newest version of MoChA?
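One possible way to approach the GC question is to compare the GC values of markers inside the call with those of the remaining markers. The sketch below assumes a hypothetical data frame markers with columns pos and gc, and hypothetical call boundaries call_start and call_end; none of these names come from MoChA, and all values are simulated.

```r
set.seed(1)
# Hypothetical marker table for one chromosome; positions and GC values are simulated.
markers <- data.frame(
  pos = seq(1e6, 5e6, length.out = 400),
  gc = runif(400, 0.35, 0.50)
)
call_start <- 2.0e6  # hypothetical call boundaries
call_end <- 2.2e6
in_call <- markers$pos >= call_start & markers$pos <= call_end

# Simulate a GC-rich region so that the check has something to find
markers$gc[in_call] <- runif(sum(in_call), 0.60, 0.70)

# Flag markers outside the 1st to 99th percentile range of GC among markers outside the call
bounds <- quantile(markers$gc[!in_call], c(0.01, 0.99))
outlier <- markers$gc < bounds[[1]] | markers$gc > bounds[[2]]

frac_outlier_in_call <- mean(outlier[in_call])
frac_outlier_elsewhere <- mean(outlier[!in_call])
print(c(in_call = frac_outlier_in_call, elsewhere = frac_outlier_elsewhere))
```

A much higher outlier fraction inside the call than elsewhere would suggest that the GC polynomial is being extrapolated for those markers; similar fractions would point away from the GC correction as the cause.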