facets
facets copied to clipboard
Weird regions of homozygous deletion
I would like to understand a behaviour that seems to occur quite often with facets.
Here I show an example taken from a WES paired samples analysis.
rcmat = readSnpMatrixDT(snpfile)
# Pre-processing
xx = preProcSample(rcmat, cval = 10, ndepth = 40, het.thresh=0.15, gbuild ='hg38', snp.nbhd = 250, deltaCN = 0, hetscale = FALSE)
# First round (high cval) to infer diploid level
oo1 = procSample(xx, cval = 300, min.nhet = 10, dipLogR = NULL)
fit1 = emcncf(oo1, trace=TRUE, unif=FALSE, min.nhet=10, maxiter=20, eps=1e-3)
# Second round (less stringent cval) and forcing diploid level from previous round for final model
oo2 = procSample(xx, cval = 150, min.nhet = 10, dipLogR = oo1$dipLogR)
fit2 = emcncf(oo2, trace=TRUE, unif=FALSE, min.nhet=10, maxiter=20, eps=1e-3)
# Plotting
purity=round(100*fit2$purity,3)
ploidy=round(fit2$ploidy,2)
text_title=paste("Fit2 - Purity=", purity, "%; Ploidy=", ploidy, sep="")
plotSample(x=oo2, emfit=fit2, sname=text_title)
I have attached the final profile and here is an excerpt of fit2$cncf
object (chr1 to chr8):
chrom seg num.mark nhet cnlr.median mafR segclust cnlr.median.clust mafR.clust start end cf.em tcn.em lcn.em
1 1 1 1274 98 0.10467567 0.0092296398 26 0.10006393 1.411796e-02 69610 10796635 0.3694622 3 1
2 1 2 7921 698 -0.15563691 0.0055080560 18 -0.13703726 2.790738e-03 10946611 159841149 1.0000000 2 1
3 1 3 5 2 0.35704606 -0.0462150825 33 0.38770348 NA 159854415 159858823 0.1277823 6 NA
4 1 4 3705 397 -0.19240227 -0.0015385511 15 -0.17879937 1.210780e-02 159872589 234373840 1.0000000 2 1
5 1 5 207 9 0.04765994 0.0010413871 25 0.03524168 NA 234393700 239678259 NA 2 NA
6 1 6 365 59 -0.69593747 1.8134798995 5 -0.69713223 1.813480e+00 239719265 248917704 0.7462907 1 0
7 2 7 8648 758 -0.12849425 0.0011245450 18 -0.13703726 2.790738e-03 41602 241253466 1.0000000 2 1
8 3 8 3939 345 -0.51095429 0.2017021050 8 -0.48732228 2.053418e-01 197118 85979245 0.4428092 1 0
9 3 9 3157 314 -0.12614004 -0.0014364067 18 -0.13703726 2.790738e-03 86937990 195774291 1.0000000 2 1
10 3 10 27 11 -0.71312736 0.0269026620 6 -0.69713223 2.690266e-02 195779039 195790387 0.3156750 0 0
11 3 11 172 17 0.04626998 -0.0248606166 24 0.03524168 4.804142e-05 195790682 198120832 0.1245663 4 2
12 4 12 548 55 -0.13386044 0.0089008243 18 -0.13703726 2.790738e-03 86054 3768314 1.0000000 2 1
13 4 13 3154 302 -0.34303241 0.0107392386 11 -0.34303241 1.073924e-02 3889882 167011252 0.1321878 0 0
14 4 14 380 34 -0.17127855 -0.0115668242 15 -0.17879937 1.210780e-02 167234027 189984136 1.0000000 2 1
15 5 15 3079 249 -0.10319288 0.0132801021 20 -0.09278793 2.126449e-03 140601 140705696 1.0000000 2 1
16 5 16 83 20 0.92763720 -0.0135349006 42 0.91157160 -1.353490e-02 140786241 140882583 0.5337619 6 3
17 5 17 264 49 0.59982938 0.0041763946 36 0.63009416 4.176395e-03 140883565 141491858 0.5430281 4 2
18 5 18 1998 175 -0.15175424 -0.0075482104 18 -0.13703726 2.790738e-03 141494781 181263936 1.0000000 2 1
19 6 19 6417 665 -0.08913815 -0.0072917310 20 -0.09278793 2.126449e-03 203548 170582241 1.0000000 2 1
20 7 20 734 77 0.49076343 0.1193743586 34 0.50596861 1.193744e-01 193673 6880279 0.2801692 6 2
21 7 21 5307 496 0.15304819 0.0700708916 27 0.16950833 8.648980e-02 6923825 149850370 0.2339899 4 1
22 7 22 523 60 0.38827018 0.0106956280 32 0.38770348 1.069563e-02 149859313 159232440 0.4310592 4 2
23 8 23 145 23 0.10989925 0.0117815646 26 0.10006393 1.411796e-02 213461 3219289 0.3694622 3 1
24 8 24 3482 351 -0.18225403 0.0018421828 15 -0.17879937 1.210780e-02 3230165 143158897 1.0000000 2 1
25 8 25 692 32 0.18356769 -0.0318314862 28 0.16950833 2.936435e-04 143213308 145056198 0.2427879 4 2
The issue here is with chr4 status of homozygous deletion.
chr3 and chr8 display segments of heterozygous deletion (tcn, lcn)=(1,0) with different values of logR
and logOR
. If I understand this well, the difference is explained by the fact that the cellular fraction of both events is different, chr8 het del being more clonal than chr3 het del.
I guess that the interpretation here for chr4 is based on the same logic, meaning that chr4 homo del only happens in a very low proportion of tumour cells. But it seems to me that this region might also be merged to the normal (2,1) status even though there exists a non neglectable deviation from mean diploid logR level. What do you think ?
Extreme cases of this behaviour is when you have very noisy (wavy logR) data. In these cases you will observe over-segmentation of the data and applying a too low cval will end up in many (erroneous) short segments of homozygous deletion.
Considering all this, I wonder if you can elaborate a bit on the role of deltaCN
parameter in preprocessing step and on how to use it. Using deltaCN=1
has given me quite good (pretty) results but I am not exactly sure about what I am doing here.
Thanks.
# packages versions (install from bioconda)
data.table_1.12.2 facets_0.5.14 pctGCdata_0.2.0