mocha eLRR: Clarification on LRR adjustments

Hi there,

I was wondering whether you could provide details on the LRR adjustments that are made to produce the eLRR plots.

Cheers,

Ryan

Feb 26 '24 20:02 ryan-ed-bailey

If you look at the code in mocha_plot.R, you will find this:

for (gt in c('AA', 'AB', 'BB')) {
  idx <- df$gts == gt
  df$BAF[idx] <- df$BAF[idx] - df[idx, paste0(gt, '_BAF1')] * df$LRR[idx] - df[idx, paste0(gt, '_BAF0')]
  df$LRR[idx] <- df$LRR[idx] - df[idx, paste0(gt, '_LRR0')]
}

The MoChA output VCF will include the following nine variables:

AA_LRR0
AA_BAF0
AA_BAF1
AB_LRR0
AB_BAF0
AB_BAF1
BB_LRR0
BB_BAF0
BB_BAF1

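These specify, for each genotype GT (AA, AB, or BB), how to adjust LRR and BAF using the following formulas:

BAF = BAF - GT_BAF1 * LRR - GT_BAF0
LRR = LRR - GT_LRR0

As in the loop above, the BAF adjustment uses the LRR value before LRR itself is adjusted.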
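For readers working outside R, the same per-genotype adjustment can be sketched in Python. This is a transliteration of the loop from mocha_plot.R, not code shipped with MoChA. The marker layout (dicts with hypothetical keys gt, BAF, and LRR) is an assumption made for illustration, while the coefficient names follow the nine VCF variables listed above.

```python
def adjust_markers(markers, coefs):
    """Apply per-genotype BAF and LRR adjustments (sketch mirroring mocha_plot.R).

    markers: list of dicts with hypothetical keys 'gt', 'BAF', 'LRR'.
    coefs: dict mapping names such as 'AA_LRR0' or 'AB_BAF1' to floats.
    Markers whose genotype is not AA, AB, or BB are returned unchanged,
    as the R loop only touches those three genotypes.
    """
    adjusted = []
    for marker in markers:
        m = dict(marker)  # copy so the caller's data is not modified
        gt = m["gt"]
        if gt in ("AA", "AB", "BB"):
            # BAF is corrected with the LRR value *before* LRR is adjusted
            m["BAF"] = m["BAF"] - coefs[gt + "_BAF1"] * m["LRR"] - coefs[gt + "_BAF0"]
            # LRR gets a simple genotype-specific offset
            m["LRR"] = m["LRR"] - coefs[gt + "_LRR0"]
        adjusted.append(m)
    return adjusted
```

For example, with AB_BAF1 = 0.1, AB_BAF0 = 0.02 and AB_LRR0 = -0.05, an AB marker with BAF 0.55 and LRR 0.2 becomes BAF 0.51 and LRR 0.25.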

After that, LRR is further adjusted for GC content, using coefficients extracted from the .stats.tsv file with the following code:

df_stats <- read.table(args$stats, sep = '\t', header = TRUE)
lrr_gc_order <- sum(grepl('^lrr_gc_[0-9]', names(df_stats))) - 1
df <- merge(df, df_stats[, c('sample_id', paste0('lrr_gc_', 0:lrr_gc_order))])
for (i in 0:lrr_gc_order) {
  df$LRR <- df$LRR - as.numeric(df$gc)^i * df[, paste0('lrr_gc_', i)]
}
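The first two lines of that snippet infer the polynomial order by counting header columns that match ^lrr_gc_[0-9], then pull one coefficient per power for each sample. A minimal Python sketch of that lookup follows, assuming only the header layout implied by the R code (a sample_id column plus lrr_gc_0, lrr_gc_1, ... columns); the function name read_gc_coefficients is hypothetical.

```python
import csv
import io
import re


def read_gc_coefficients(stats_text):
    """Return (order, {sample_id: [lrr_gc_0, ..., lrr_gc_order]}).

    stats_text: contents of a tab-separated stats file whose header contains
    'sample_id' and columns named lrr_gc_0, lrr_gc_1, ... (assumed layout).
    """
    reader = csv.DictReader(io.StringIO(stats_text), delimiter="\t")
    # Same rule as the R code: count columns starting with 'lrr_gc_' plus a digit
    n_terms = sum(1 for name in reader.fieldnames if re.match(r"lrr_gc_[0-9]", name))
    order = n_terms - 1
    coefs = {}
    for row in reader:
        coefs[row["sample_id"]] = [float(row[f"lrr_gc_{i}"]) for i in range(order + 1)]
    return order, coefs
```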

This means that LRR will be further adjusted as follows:

LRR = LRR - LRR_GC_0 - GC * LRR_GC_1 - GC*GC * LRR_GC_2 - ...

where the terms continue up to the degree of the polynomial used for the GC correction.
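Applied to a single marker, this correction is just a polynomial in GC subtracted from LRR. The Python sketch below mirrors the R loop term by term; the function name gc_correct is hypothetical, and coefs is assumed to be the list [lrr_gc_0, ..., lrr_gc_order] for one sample.

```python
def gc_correct(lrr, gc, coefs):
    """Return LRR minus the GC polynomial: lrr - sum_i coefs[i] * gc**i.

    coefs: [lrr_gc_0, lrr_gc_1, ..., lrr_gc_order] for one sample.
    gc**0 is 1 (also for gc == 0), matching R, where 0^0 is 1.
    """
    return lrr - sum(c * gc ** i for i, c in enumerate(coefs))
```

For example, with coefficients [0.01, -0.02, 0.04] and GC = 0.5 the polynomial evaluates to 0.01 - 0.01 + 0.01 = 0.01, so an LRR of 0.3 becomes 0.29.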

Feb 26 '24 20:02 freeseek

Thanks for the clarifications! Would there be a case in which the eLRR picks up a signal yet the unadjusted LRR looks fine?

Feb 26 '24 21:02 ryan-ed-bailey

Yes, in theory that is possible, and it could even be desirable if it reflects a real signal.

Feb 26 '24 22:02 freeseek

I am attempting to troubleshoot this call (14q11):

This exact call is present in many unrelated samples, so we suspect a false positive, especially since it is not apparent in the unadjusted LRR (or BAF). It is confined to a single batch of samples. Lowering the missingness threshold does not affect the call.

We hope to understand these types of calls better. Any guidance on how to troubleshoot further would be appreciated. Thanks.

Feb 26 '24 22:02 ryan-ed-bailey

Just following up on this. Any guidance on how to troubleshoot further?

Cheers,

Ryan

Apr 02 '24 23:04 ryan-ed-bailey

Visually, something does seem to be going on across multiple consecutive markers, so it is hard to argue that MoChA is doing anything wrong. Whatever explains the signal might not be a CNV, but I am not an expert on LRR. There does not seem to be any BAF signal in this call. You could try to filter out germline duplications based solely on LRR, but without further testing and investigation I have no further advice, as you would need to understand what exactly is going on in these regions. Do you think the GC correction is at fault here? Do the affected markers have outlier GC values? Are you using the newest version of MoChA?
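One generic way to check whether the affected markers have outlier GC values is to apply Tukey's IQR fences to the GC values of markers in and around the call. This is a sketch of a standard outlier screen, not part of MoChA; the input dict (marker ID to GC value) and the function name flag_gc_outliers are hypothetical, and k = 1.5 is only the conventional default multiplier.

```python
import statistics


def flag_gc_outliers(gc_by_marker, k=1.5):
    """Return sorted marker IDs whose GC value lies outside Tukey's fences.

    gc_by_marker: dict mapping a marker ID to its GC value (needs >= 2 markers).
    k: fence multiplier applied to the interquartile range.
    """
    values = list(gc_by_marker.values())
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartile cut points
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return sorted(m for m, gc in gc_by_marker.items() if gc < low or gc > high)
```

Comparing the flagged markers with those driving the eLRR signal would show whether the GC correction is a plausible culprit.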

Apr 03 '24 06:04 freeseek
