CpG coverage warning + uninitialized value warning

Open jasonptm opened this issue 3 years ago • 1 comments

Hello,

We've been running MuSiC2 on some dog cancer samples to find significantly mutated genes. We've been seeing a couple of types of warnings that we'd like advice on.

We have been getting warnings of the following type:

#More CpG_Transitions seen in ENSCAFG00000001106 than there are bps with sufficient coverage!
#More CpG_Transversions seen in ENSCAFG00000001136 than there are bps with sufficient coverage!

When we looked at the CpG counts in the coverage files, we saw that they were always zero. While examining the subprogram calcRoiCovg, we found that the order of the inputs matters: if CpGs are counted before CGs, the CpG counts are no longer always zero. Specifically, when we ran the parallelized calc-covg step with

--bp-class-types=AT,CpG,CG

instead of

--bp-class-types=AT,CG,CpG

then the CpG counts were no longer always zero. Additionally, for each ROI, the new CG count plus the new CpG count equals the old CG count. For example:

#Gene	ROI	Length	Covered	ATs_Covered	CGs_Covered	CpGs_Covered
Original output:
ENSCAFG00000000001	chr1:252393-252564	172	172	96	76	0

New output (columns swapped back for better comparison):
ENSCAFG00000000001	chr1:252393-252564	172	172	96	74	2
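
The relationship between the two rows above can be checked mechanically. The following is a minimal Python sketch, assuming the tab-separated column layout shown in the header line; the function names `parse_covg_row` and `is_consistent_resplit` are hypothetical helpers and not part of MuSiC2.

```python
def parse_covg_row(line):
    """Parse one tab-separated ROI coverage row into a dict of named fields.

    Assumed column order (from the header in the post):
    Gene, ROI, Length, Covered, ATs_Covered, CGs_Covered, CpGs_Covered
    """
    fields = line.rstrip("\n").split("\t")
    gene, roi = fields[0], fields[1]
    length, covered, ats, cgs, cpgs = (int(x) for x in fields[2:7])
    return {"gene": gene, "roi": roi, "length": length, "covered": covered,
            "ats": ats, "cgs": cgs, "cpgs": cpgs}


def is_consistent_resplit(old_row, new_row):
    """True if both rows describe the same ROI with the same covered and AT
    counts, and the combined CG + CpG count is unchanged."""
    old, new = parse_covg_row(old_row), parse_covg_row(new_row)
    same_roi = (old["gene"], old["roi"]) == (new["gene"], new["roi"])
    same_totals = old["covered"] == new["covered"] and old["ats"] == new["ats"]
    cg_conserved = old["cgs"] + old["cpgs"] == new["cgs"] + new["cpgs"]
    return same_roi and same_totals and cg_conserved


# The two rows quoted in this post.
old_row = "ENSCAFG00000000001\tchr1:252393-252564\t172\t172\t96\t76\t0"
new_row = "ENSCAFG00000000001\tchr1:252393-252564\t172\t172\t96\t74\t2"
print(is_consistent_resplit(old_row, new_row))
```

Running a check like this over every ROI would confirm whether the pattern holds across the whole coverage file rather than for a single example.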

When we swapped the resulting (non-zero) columns back into the order expected by MuSiC2 and continued running from calc-covg using the files generated, we no longer saw the above warning.
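
The column swap described here can be sketched as follows. This is an illustrative Python helper, not the exact procedure we ran. It assumes that the reordered run writes its last two count columns as CpG then CG, and that header lines start with `#`. The name `swap_cg_cpg_columns` is hypothetical.

```python
def swap_cg_cpg_columns(line):
    """Swap the 6th and 7th tab-separated fields of a coverage row.

    Assumption: a run with --bp-class-types=AT,CpG,CG writes the columns as
    ..., ATs, CpGs, CGs, while downstream steps expect ..., ATs, CGs, CpGs.
    Header lines (starting with '#') and blank lines are returned unchanged.
    """
    if line.startswith("#") or not line.strip():
        return line
    newline = "\n" if line.endswith("\n") else ""
    fields = line.rstrip("\n").split("\t")
    fields[5], fields[6] = fields[6], fields[5]
    return "\t".join(fields) + newline


# Row as assumed to be emitted by the reordered run (ATs, CpGs, CGs).
raw = "ENSCAFG00000000001\tchr1:252393-252564\t172\t172\t96\t2\t74\n"
print(swap_cg_cpg_columns(raw), end="")
```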

We are not sure why this is happening. However, if calcRoiCovg counts the categories in order and then removes the counted basepairs from consideration, that would be consistent with what we see (i.e. the two bases of each CpG are being counted as CGs before CpGs are checked for).
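
To make this hypothesis concrete, here is a toy Python model of first-match-wins classification. It is not the calcRoiCovg code, which we have not reproduced here. The function `count_classes` is hypothetical, and it assumes a base belongs to the CpG class when it is the C or the G of a CG dinucleotide.

```python
def count_classes(seq, class_order):
    """Toy model: assign each base to the FIRST class in class_order it matches.

    Once a base is counted in one class it is not considered for later ones,
    which is the behavior hypothesized above.
    """
    seq = seq.upper()
    counts = {name: 0 for name in class_order}
    for i, base in enumerate(seq):
        prev_base = seq[i - 1] if i > 0 else ""
        next_base = seq[i + 1] if i + 1 < len(seq) else ""
        # Assumed CpG definition: the C or the G of a CG dinucleotide.
        in_cpg = (base == "C" and next_base == "G") or (base == "G" and prev_base == "C")
        matches = {
            "AT": base in ("A", "T"),
            "CG": base in ("C", "G"),
            "CpG": in_cpg,
        }
        for name in class_order:
            if matches[name]:
                counts[name] += 1
                break  # first match wins; later classes never see this base
    return counts


seq = "ACGTTGCA"  # contains exactly one CpG dinucleotide (positions 2-3)
print(count_classes(seq, ["AT", "CG", "CpG"]))  # CG checked first
print(count_classes(seq, ["AT", "CpG", "CG"]))  # CpG checked first
```

In this model, checking CG first leaves the CpG count at zero, and checking CpG first moves two bases from the CG class to the CpG class while the combined total stays the same, which matches the pattern in our coverage files.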

We are also seeing the following warnings:

Use of uninitialized value in addition (+) at /usr/local/share/perl/5.26.1/TGI/MuSiC2/CalcBmr.pm line 452.
Use of uninitialized value $muts_in_class in subtraction (-) at /usr/local/share/perl/5.26.1/TGI/MuSiC2/CalcBmr.pm line 456.
Use of uninitialized value $muts_in_class in division (/) at /usr/local/share/perl/5.26.1/TGI/MuSiC2/CalcBmr.pm line 457.

We see these warnings regardless of the state of the CpG counts (i.e. we do not think they are caused by the change described above). Looking at the code, we think these warnings occur because gene_mr is no longer being initialized to zeroes: lines 248-253 of CalcBmr.pm are commented out. This means that at lines 452 and 455 (where muts_in_class is set), gene_mr will sometimes be undef, which may in turn make muts_in_class undef. Since the code then uses undef in arithmetic, Perl prints a warning. We don't think this causes any functional problems, since Perl treats undef as zero in numeric context, which is the desired behavior here; mostly we just want to confirm that this warning isn't a symptom of something else being wrong.
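
The reasoning above can be illustrated with a rough Python analogue. This is not Perl and not the CalcBmr.pm code. The function `muts_per_covered_base`, the gene name `GENE_A`, and all counts are made up for illustration. The point is only that treating a missing entry as zero gives the same arithmetic as initializing every entry to zero up front.

```python
def muts_per_covered_base(gene_mr, gene, mut_class, covered_bases):
    """Hypothetical rate calculation over a nested gene -> class -> count table.

    A missing gene or class is treated as a count of 0, mirroring how Perl
    treats undef as 0 in numeric context (Perl additionally emits a warning).
    """
    muts_in_class = gene_mr.get(gene, {}).get(mut_class, 0)
    if covered_bases == 0:
        return 0.0
    return muts_in_class / covered_bases


classes = ["CpG_Transitions", "CpG_Transversions"]

# Sparse table: only classes with observed mutations are present.
sparse = {"GENE_A": {"CpG_Transitions": 3}}

# Zero-initialized table: every class starts at 0 before counts are added.
zeroed = {"GENE_A": {c: 0 for c in classes}}
zeroed["GENE_A"]["CpG_Transitions"] += 3

for c in classes:
    a = muts_per_covered_base(sparse, "GENE_A", c, 100)
    b = muts_per_covered_base(zeroed, "GENE_A", c, 100)
    print(c, a, b, a == b)
```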


To reiterate, the questions are:

  1. Does our fix for the CpG warning sound correct? If not, is this because the described workaround will cause other problems, or because we have misdiagnosed the underlying cause of the warnings?
  2. Does our analysis of the uninitialized value warnings sound correct, and if so, can we safely ignore them?

Thanks!

jasonptm, Feb 03 '21 16:02