ExomeDepth
Deletions with more reads than expected
Hi, I just noticed an issue where regions with more reads than expected are being called as deletions (reads.ratio >> 1). In the example below there's also an issue with overlapping regions being called as both deletions and duplications. Does anyone have any ideas about what might be happening here?
start.p,end.p,type,nexons,start,end,chromosome,id,BF,reads.expected,reads.observed,reads.ratio
124085,124093,"deletion",9,55142239,55156999,"chr7","chrchr7:55142239-55156999",26.5,4474,17460,3.9
124085,124113,"duplication",20,55142239,55201432,"chr7","chrchr7:55142239-55201432",57.8,12671,45280,3.57
124085,124126,"deletion",13,55142239,55366179,"chr7","chrchr7:55142239-55366179",115,20719,81052,3.91
124085,124134,"duplication",8,55142239,55431346,"chr7","chrchr7:55142239-55431346",85.4,23742,90328,3.8
124085,124136,"deletion",2,55142239,55433799,"chr7","chrchr7:55142239-55433799",132,25303,97612,3.86
124085,124187,"duplication",51,55142239,55795890,"chr7","chrchr7:55142239-55795890",178,45707,146585,3.21
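A quick sanity check on the calls above, sketched in base R. reads.ratio is reads.observed / reads.expected, so a deletion should have a ratio below 1 and a duplication a ratio above 1. The column names follow ExomeDepth's CallCNVs output as I understand it (an assumption), and the consistency rule is a simple heuristic for spotting suspicious calls, not something ExomeDepth itself does.

```r
# The six calls posted above; column names assumed to match CallCNVs() output.
calls <- read.csv(text = c(
  "start.p,end.p,type,nexons,start,end,chromosome,id,BF,reads.expected,reads.observed,reads.ratio",
  '124085,124093,"deletion",9,55142239,55156999,"chr7","chrchr7:55142239-55156999",26.5,4474,17460,3.9',
  '124085,124113,"duplication",20,55142239,55201432,"chr7","chrchr7:55142239-55201432",57.8,12671,45280,3.57',
  '124085,124126,"deletion",13,55142239,55366179,"chr7","chrchr7:55142239-55366179",115,20719,81052,3.91',
  '124085,124134,"duplication",8,55142239,55431346,"chr7","chrchr7:55142239-55431346",85.4,23742,90328,3.8',
  '124085,124136,"deletion",2,55142239,55433799,"chr7","chrchr7:55142239-55433799",132,25303,97612,3.86',
  '124085,124187,"duplication",51,55142239,55795890,"chr7","chrchr7:55142239-55795890",178,45707,146585,3.21'
), stringsAsFactors = FALSE)

# Recompute the ratio from the raw counts to confirm how the column is defined.
calls$ratio.check <- round(calls$reads.observed / calls$reads.expected, 2)

# Heuristic: flag calls whose direction disagrees with observed/expected.
calls$inconsistent <- (calls$type == "deletion" & calls$reads.ratio > 1) |
                      (calls$type == "duplication" & calls$reads.ratio < 1)

print(calls[, c("type", "nexons", "reads.ratio", "ratio.check", "inconsistent")])
```

All three deletion calls are flagged, since their ratios are close to 3.9.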
Best regards -Harald
That does not seem right! Do you perhaps have a plot showing the distribution of read counts over the chromosomal region? Something seems to be confusing the HMM.
Here's the read coverage in the region (max is 5230). I'm using 50 samples as the reference set: 49 of them have about 1/10 of the read depth and one has 1.5x the number of reads. The total number of reads is about the same for all samples.
I'm using version 1.1.10 with the same settings as in the tutorial.
I have observed this too, and from what I could determine it appeared to occur in regions with known paralogy.
However, this also happened when testing a sample with a known duplication followed directly by a known deletion. ExomeDepth called the duplication correctly, and it also made a deletion call, but that call started at the same position as the duplication and continued across the known deletion. The resulting deletion call had a BF of -642 and a ratio of 1.22.
If I subtract the expected and observed reads of the correctly called duplication from those of the deletion call, which should leave just the expected and observed values for the additional intervals, the ratio is 0.52, which supports the deletion call I was expecting.
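The subtraction described above can be written as a small helper. The function name and the counts in the usage line are hypothetical (the actual expected/observed values were not posted); the sketch only shows the arithmetic: remove the duplication's reads from the larger call's totals and take observed over expected for what remains.

```r
# Hypothetical helper: observed/expected ratio over the intervals left after
# removing a sub-call (e.g. a correctly called duplication) from a larger call.
residual_ratio <- function(exp_total, obs_total, exp_sub, obs_sub) {
  exp_rest <- exp_total - exp_sub
  obs_rest <- obs_total - obs_sub
  if (exp_rest <= 0) stop("sub-call must not account for all expected reads")
  obs_rest / exp_rest
}

# Hypothetical counts: the combined call has ratio 2500 / 2000 = 1.25,
# yet the part outside the duplication is at half the expected depth.
residual_ratio(exp_total = 2000, obs_total = 2500, exp_sub = 1000, obs_sub = 2000)
# 0.5
```

This is why a combined call spanning a duplication and a deletion can show a ratio above 1 while still being labelled a deletion.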
This can be solved by changing line 294 of class_definition.R from
my.calls$calls$start.p <- my.calls$calls$start.p -1
to
my.calls$calls$start.p <- my.calls$calls$end.p - my.calls$calls$nexons
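A base-R sketch of why deriving start.p from end.p and nexons helps with the calls posted at the top of the thread. It works on the output table, not on class_definition.R, and assumes start.p and end.p there are inclusive exon indices, so a call's first exon is end.p - nexons + 1. The first posted call is consistent with that convention: 124093 - 9 + 1 = 124085, its reported start.p.

```r
# end.p and nexons of the six calls posted at the top of the thread.
end.p  <- c(124093, 124113, 124126, 124134, 124136, 124187)
nexons <- c(9, 20, 13, 8, 2, 51)

# Assumption: inclusive exon indices, so the first exon is end.p - nexons + 1.
start.fixed <- end.p - nexons + 1
print(start.fixed)
# [1] 124085 124094 124114 124127 124135 124137

# Each recomputed call should begin on the exon right after the previous
# call ends, i.e. the calls tile the region instead of overlapping.
contiguous <- all(start.fixed[-1] == end.p[-length(end.p)] + 1)
print(contiguous)
# [1] TRUE
```

Under this reading, the six overlapping calls become six adjacent segments that alternate between deletion and duplication.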