CNV calls in Chr X and Y

Open mike8115 opened this issue 2 years ago • 3 comments

Hi Christoffer,

Thanks for making such a wonderful tool.

How does SuperFreq normalize the sex chromosomes when making CNV calls? When looking at the CNA plots, I'm seeing that males have a complete loss of Chr X and half of Chr Y. I do have more females in my panel of normals, but I don't know if that would affect the underlying logic.

Best, Michael

mike8115 • Dec 14 '21 01:12

Hey Michael!

SuperFreq calls absolute copy numbers, so without copy number alterations in the sample, it should call 1 copy of X and 1 copy of Y for a male, and 2 copies of X and 0 copies of Y for a female. Older males sometimes lose Y in part of the cells, so you sometimes see copy number calls between 1 and 0 copies in those cases.

This is done by first determining the sex of each reference normal (from the ratio of read depth between X and Y, which is very clear in 99% of cases) and then adjusting the counts to diploid (multiplying the male X and Y counts by 2 and discarding the female Y). The sex calls are recorded in the log files (in Rdirectory, runtimetracking.log).
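The diploid adjustment described above can be sketched roughly as follows. This is only an illustration of the stated rule (double male X and Y, discard female Y); the function name and the raw-count inputs are hypothetical and are not superFreq's actual code or API.

```python
# Sketch only: normalise_to_diploid and its arguments are hypothetical names,
# not part of superFreq. It encodes the rule described in the reply above.

def normalise_to_diploid(x_count, y_count, sex):
    """Rescale the sex-chromosome read counts of one reference normal to diploid."""
    if sex == "male":
        # A male carries one copy each of X and Y: double both counts.
        return 2 * x_count, 2 * y_count
    if sex == "female":
        # A female already carries two copies of X; the Y counts are discarded.
        return x_count, None
    raise ValueError(f"unknown sex: {sex!r}")


print(normalise_to_diploid(500, 100, "male"))    # (1000, 200)
print(normalise_to_diploid(1000, 3, "female"))   # (1000, None)
```

Under this rule a female normal that is wrongly called male has its X counts doubled as well, so it contributes twice the expected depth to the reference.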

Feel free to post the CNV plots if you want, and I'll see if I can make sense of it.

ChristofferFlensburg • Dec 15 '21 02:12

Hey Christoffer!

Really sorry for the (super) late reply! This somehow got buried after the winter holidays.

The samples are all paediatric, so the biologists are telling me that they aren't expecting many CNAs. Looking at the logs, I noticed that some of the normals were mislabeled as male. X3094e5dd.7a9e.4a00.8bbb.b853c59d0245 and X8bd2dfc0.1b41.4c3f.b7e7.a7544ef5e0e9 have really low scores compared to the other male normals. Would this affect anything?

SAMPLE                                  SCORE         SEX
BS_THDBV4G0                             -0.06937308   female
BS_B9QP40ER                             -0.08404726   female
X088e4840.fffd.4db0.b09a.ee5468d8d44f   0.4020246     male
X1a3e1ca8.7dda.45a8.8ddc.0ad296e55f0c   -0.005031661  female
X3094e5dd.7a9e.4a00.8bbb.b853c59d0245   0.02399775    male
X4e579239.6629.431b.87bf.fd5b1c7f97cf   0.3709801     male
X82ac9e41.5df6.43b5.8550.62bfe9f7efbd   0.4233964     male
X8bd2dfc0.1b41.4c3f.b7e7.a7544ef5e0e9   0.02729819    male
X96fe9edc.6464.4800.a0d9.3dd934912b6c   0.3986039     male
a0bd77da.97ae.4f37.b7e7.93ea22555f7a    -0.02119169   female
d6c364f2.fdcf.47bb.b61a.ae3fb8770567    0.3946622     male
e43935eb.65c2.460e.971f.59ad9edee403    -0.01606412   female
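One quick way to spot doubtful calls in a table like this is to flag "male" samples whose score sits far below the bulk of the male scores. The sketch below uses the values from the table; the cutoff (half the median male score) is an arbitrary assumption for illustration, not how superFreq decides sex.

```python
from statistics import median

# (sample, score, called sex), copied from the table above.
calls = [
    ("BS_THDBV4G0", -0.06937308, "female"),
    ("BS_B9QP40ER", -0.08404726, "female"),
    ("X088e4840.fffd.4db0.b09a.ee5468d8d44f", 0.4020246, "male"),
    ("X1a3e1ca8.7dda.45a8.8ddc.0ad296e55f0c", -0.005031661, "female"),
    ("X3094e5dd.7a9e.4a00.8bbb.b853c59d0245", 0.02399775, "male"),
    ("X4e579239.6629.431b.87bf.fd5b1c7f97cf", 0.3709801, "male"),
    ("X82ac9e41.5df6.43b5.8550.62bfe9f7efbd", 0.4233964, "male"),
    ("X8bd2dfc0.1b41.4c3f.b7e7.a7544ef5e0e9", 0.02729819, "male"),
    ("X96fe9edc.6464.4800.a0d9.3dd934912b6c", 0.3986039, "male"),
    ("a0bd77da.97ae.4f37.b7e7.93ea22555f7a", -0.02119169, "female"),
    ("d6c364f2.fdcf.47bb.b61a.ae3fb8770567", 0.3946622, "male"),
    ("e43935eb.65c2.460e.971f.59ad9edee403", -0.01606412, "female"),
]


def flag_doubtful_males(calls, fraction=0.5):
    """Return samples called male whose score is below fraction * median male score.

    The fraction is a hypothetical, hand-picked cutoff for this illustration.
    """
    male_scores = [score for _, score, sex in calls if sex == "male"]
    cutoff = fraction * median(male_scores)
    return [name for name, score, sex in calls if sex == "male" and score < cutoff]


for name in flag_doubtful_males(calls):
    print(name)
```

With these values only the two samples mentioned above are flagged.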

Here's the CNA summary plot!

Screenshot 2022-02-15 155313

mike8115 • Feb 15 '22 20:02

Hey!

SuperFreq re-normalises the reference normals to diploid, and the studied samples are then compared against them. So when superFreq mislabels female samples as male, it incorrectly doubles the chrX counts used as reference, which means that the copy numbers you see will be lower than expected for chrX. It seems that two of your 10 samples are mislabeled, so it's comparing to a total of 13 copies of chrX across the 10 reference normals, while it's actually 15 copies. So the chrX copy numbers will be a little lower than expected, about 13% lower. That will somewhat affect the overall ploidy normalisation of the test samples, but I think that should be quite a minor effect, a percent or two at most, and I would trust the copy numbers in the autosomal chromosomes if there are no other signs of problems.
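The size of the effect can be estimated with a back-of-the-envelope calculation. The sketch below is an assumption-laden illustration, not superFreq's internals: it takes the 12 rows of the score table posted earlier in the thread, assumes the two low-scoring "male" calls are really female, and assumes the reference level is a plain mean of the per-sample diploid-normalised chrX depth.

```python
# Assumptions (not taken from superFreq's code):
#  - 12 reference normals, as in the score table earlier in the thread
#  - the two low-scoring "male" calls are really female
#  - the reference is a simple mean over normals after diploid adjustment

TRUE_X_COPIES = {"female": 2, "male": 1}   # actual chrX copies per cell
SCALE_BY_CALL = {"female": 1, "male": 2}   # factor applied after the sex call

# (true sex, called sex) -> number of normals in the panel
panel = {
    ("female", "female"): 5,
    ("male", "male"): 5,
    ("female", "male"): 2,   # the suspected mislabels
}

n_normals = sum(panel.values())
reference = sum(
    count * TRUE_X_COPIES[true_sex] * SCALE_BY_CALL[called_sex]
    for (true_sex, called_sex), count in panel.items()
) / n_normals

expected = 2.0                       # what a correctly normalised panel would give
shortfall = 1 - expected / reference

print(f"reference chrX level: {reference:.3f} (expected {expected})")
print(f"chrX copy numbers come out about {shortfall:.1%} too low")
```

Under these assumptions the shortfall is about 14%, in the same range as the rough figure quoted above.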

From the heatmap it looks like the CNAs are fine, and only present in some samples, so this plot doesn't raise any flags for me. It seems you're hitting TP53 with loss or CNN LOH in all patients but one, which makes me trust the calls further. I bet they also have high-VAF point mutations! There's lots more QC data in the by-sample plots, and you can look at the CNA plots, which also show the by-gene data overlaid on the segments, and at more plots in the diagnostics sub-directories.

Maybe the biologists will be excited to hear that there seems to be a fair bit of copy number alteration, and they'll probably want to know which genes are affected. Apart from TP53, you can go into the plots directory, then the data subdirectory, then open the bySegment csv in Excel and look at the last column, which lists the COSMIC census genes in the segment. You can also look at the other cohort plots to maybe get some ideas of which genes may be recurrently hit.
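If Excel is inconvenient, the same lookup can be scripted. The sketch below only relies on what is said above (a csv whose last column lists the genes in each segment); the function name is hypothetical, and the tiny example table is made up purely to show the mechanics, so real files will have different columns and values.

```python
import csv
import io


def genes_per_segment(handle):
    """Return the last column's name and (row number, value) for non-empty entries."""
    reader = csv.reader(handle)
    header = next(reader)
    hits = []
    for row_number, row in enumerate(reader, start=1):
        if row and row[-1].strip():
            hits.append((row_number, row[-1].strip()))
    return header[-1], hits


# Made-up stand-in for a bySegment csv; not real superFreq output.
example = io.StringIO("chr,start,end,call,genes\n17,1,100,A,TP53\n1,1,50,AB,\n")
column, hits = genes_per_segment(example)
print(column, hits)
```

For a real file, open it with `open(path, newline="")` and pass the handle to the function; how multiple genes are separated within a cell is not stated in this thread, so the values are returned unsplit.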

ChristofferFlensburg • Feb 16 '22 05:02