
eLRR: Clarification on LRR adjustments

Open ryan-ed-bailey opened this issue 1 year ago • 6 comments

Hi there,

I was curious if you can provide details on the type of LRR adjustments that are made to produce the eLRR plots?

Cheers,

Ryan

ryan-ed-bailey avatar Feb 26 '24 20:02 ryan-ed-bailey

If you look in the code for mocha_plot.R you will find this:

for (gt in c('AA', 'AB', 'BB')) {
  idx <- df$gts == gt
  df$BAF[idx] <- df$BAF[idx] - df[idx, paste0(gt, '_BAF1')] * df$LRR[idx] - df[idx, paste0(gt, '_BAF0')]
  df$LRR[idx] <- df$LRR[idx] - df[idx, paste0(gt, '_LRR0')]
}

The MoChA output VCF will include the following nine variables:

AA_LRR0
AA_BAF0
AA_BAF1
AB_LRR0
AB_BAF0
AB_BAF1
BB_LRR0
BB_BAF0
BB_BAF1

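These describe how LRR and BAF are adjusted for each genotype. Within each genotype class, BAF is adjusted first (using the not-yet-adjusted LRR) and then LRR, using the following formulas: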

BAF = BAF - BAF1 * LRR - BAF0
LRR = LRR - LRR0
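
As a rough illustration of the two formulas, here is a minimal sketch that runs the same loop on a toy data frame with one marker per genotype. The column names follow the snippet above, but every number is invented for the example and is not representative of real MoChA output.

```r
# Toy data: three markers, one per genotype.
# All values below are invented for illustration only.
df <- data.frame(
  gts = c('AA', 'AB', 'BB'),
  BAF = c(0.02, 0.52, 0.97),
  LRR = c(0.10, -0.20, 0.05),
  AA_LRR0 = 0.01,  AA_BAF0 = 0.01,  AA_BAF1 = 0.05,
  AB_LRR0 = -0.02, AB_BAF0 = 0.01,  AB_BAF1 = 0.10,
  BB_LRR0 = 0.03,  BB_BAF0 = -0.01, BB_BAF1 = -0.05
)
raw <- df[, c('BAF', 'LRR')]  # keep the unadjusted values for comparison

for (gt in c('AA', 'AB', 'BB')) {
  idx <- df$gts == gt
  # BAF is corrected first, so it uses the LRR value before the LRR0 shift
  df$BAF[idx] <- df$BAF[idx] - df[idx, paste0(gt, '_BAF1')] * df$LRR[idx] - df[idx, paste0(gt, '_BAF0')]
  df$LRR[idx] <- df$LRR[idx] - df[idx, paste0(gt, '_LRR0')]
}

# Side-by-side view of raw and adjusted values
print(cbind(raw_BAF = raw$BAF, adj_BAF = df$BAF, raw_LRR = raw$LRR, adj_LRR = df$LRR))
```

Swapping the two assignments inside the loop would change the BAF result, because the BAF1 term would then be multiplied by the already shifted LRR.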

After that, LRR is further adjusted for GC content, using per-sample coefficients read from the .stats.tsv file through the following code:

df_stats <- read.table(args$stats, sep = '\t', header = TRUE)
lrr_gc_order <- sum(grepl('^lrr_gc_[0-9]', names(df_stats))) - 1
df <- merge(df, df_stats[, c('sample_id', paste0('lrr_gc_', 0:lrr_gc_order))])
for (i in 0:lrr_gc_order) {
  df$LRR <- df$LRR - as.numeric(df$gc)^i * df[, paste0('lrr_gc_', i)]
}

This means that LRR will be further adjusted as follows:

LRR = LRR - LRR_GC_0 - GC * LRR_GC_1 - GC*GC * LRR_GC_2 - ...

Where there is one term per lrr_gc_ coefficient, that is, one more term than the degree of the polynomial used for the GC correction.
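
To make the polynomial concrete, here is a small sketch that applies the same loop to a handful of markers and checks it against the formula written out term by term. The helper name gc_adjust_lrr and all numbers are hypothetical and only serve the example; they are not part of MoChA.

```r
# Hypothetical helper (not part of MoChA): subtract the GC polynomial from LRR.
# `coef` holds lrr_gc_0, lrr_gc_1, ... for one sample, in that order.
gc_adjust_lrr <- function(lrr, gc, coef) {
  for (i in seq_along(coef) - 1) {
    lrr <- lrr - gc^i * coef[i + 1]
  }
  lrr
}

# Invented numbers: a degree-2 polynomial, hence three coefficients
coef <- c(0.10, -0.50, 0.80)
gc   <- c(0.30, 0.50, 0.70)
lrr  <- c(0.05, 0.05, 0.05)

adj <- gc_adjust_lrr(lrr, gc, coef)

# Same result written out term by term, as in the formula above
expected <- lrr - coef[1] - gc * coef[2] - gc * gc * coef[3]
stopifnot(isTRUE(all.equal(adj, expected)))
print(adj)
```

Because the three markers start from the same LRR but have different GC values, any difference between them after the adjustment comes entirely from the GC polynomial.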

freeseek avatar Feb 26 '24 20:02 freeseek

Thanks for the clarifications! Would there be a case in which the eLRR picks up a signal yet the unadjusted LRR looks fine?

ryan-ed-bailey avatar Feb 26 '24 21:02 ryan-ed-bailey

Yeah, of course in theory that is possible, and it could even be desirable if it comes from a real signal.

freeseek avatar Feb 26 '24 22:02 freeseek

I am attempting to troubleshoot this given call (14q11):

image

This exact call is present in many unrelated samples and thus we're suspicious of a false positive, especially given that it is not apparent in the unadjusted LRR (and BAF). It is isolated to one single batch of samples. Dropping the missingness threshold does not affect the call.

We are hoping to understand these types of calls more deeply. Any guidance on how to further troubleshoot would be appreciated. Thanks.

ryan-ed-bailey avatar Feb 26 '24 22:02 ryan-ed-bailey

Just following up on this. Any guidance on how to further troubleshoot?

Cheers,

Ryan

ryan-ed-bailey avatar Apr 02 '24 23:04 ryan-ed-bailey

Visually, something does seem to be going on across multiple consecutive markers, so it is hard to argue that MoChA is doing anything wrong. Whatever explains it might not be a CNV, but I am not an expert on LRR. There does not seem to be any BAF signal in this call. You could try to filter out germline duplications based solely on LRR, but without further testing and investigation I don't have more advice, as you would have to work out what exactly is going on in these regions. Do you think the GC correction is at fault here? Do the affected markers have outlier GC values? Are you using the newest version of MoChA?
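
One possible way to look into the GC question is to compare the GC values of the markers inside the call with those of the remaining markers. The sketch below does this on simulated data only: the markers table, the pos column, the call coordinates, and the 1%/99% cutoffs are all assumptions made for the example (only the gc column name comes from the plotting code), so it would need to be adapted to a real marker table.

```r
# Simulated marker table; `pos` is an assumed column name, `gc` as in the code above.
set.seed(1)
markers <- data.frame(
  pos = seq(1e6, by = 1e4, length.out = 200),
  gc  = c(rnorm(180, mean = 0.42, sd = 0.04),  # simulated background markers
          rnorm(20,  mean = 0.60, sd = 0.04))  # simulated GC-rich stretch
)

# Placeholder coordinates, not the real 14q11 call
call_start <- 2.8e6
call_end   <- 3.0e6
in_call <- markers$pos >= call_start & markers$pos <= call_end

# Compare the GC distribution inside and outside the call
print(summary(markers$gc[in_call]))
print(summary(markers$gc[!in_call]))

# Flag markers whose GC lies outside the central 98% of all values in the table
lim <- quantile(markers$gc, c(0.01, 0.99))
is_outlier <- markers$gc < lim[1] | markers$gc > lim[2]
frac_outlier <- c(inside  = mean(is_outlier[in_call]),
                  outside = mean(is_outlier[!in_call]))
print(frac_outlier)
```

If the markers inside the call sit at the extreme of the GC range, the fitted polynomial is being evaluated where it is least constrained, which would make the GC correction a more plausible suspect.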

freeseek avatar Apr 03 '24 06:04 freeseek