
change in coverage mid-gene

Open LeahRoberts opened this issue 5 years ago • 1 comments

I am comparing two Klebsiella pneumoniae isolates (KN0056-F and KN0056-L) that are essentially identical according to snippy (they differ by 4 SNPs). I found this change in coverage in the Pandora output:

acs     759     .       GA      TA      .       .       SVTYPE=PH_SNPs;GRAPHTYPE=NESTED;AC=0;AN=2       GT:MEAN_FWD_COVG:MEAN_REV_COVG:MED_FWD_COVG:MED_REV_COVG:SUM_FWD_COVG:SUM_REV_COVG:GAPS:LIKELIHOOD:GT_CONF      0:26,0:19,0:26,0:19,0:105,0:79,0:0,1:-6.07005,-267.233:261.163  0:42,0:32,0:43,0:32,0:171,0:131,0:0,1:-11.5429,-428.783:417.24
acs     801     .       C       A       .       .       SVTYPE=SNP;GRAPHTYPE=NESTED;AC=0;AN=2   GT:MEAN_FWD_COVG:MEAN_REV_COVG:MED_FWD_COVG:MED_REV_COVG:SUM_FWD_COVG:SUM_REV_COVG:GAPS:LIKELIHOOD:GT_CONF      0:21,0:15,0:21,0:15,0:87,0:61,0:0,1:-3.27659,-225.786:222.51    0:33,0:31,0:33,0:31,0:134,0:127,0:0,1:-6.98006,-382.731:375.751
acs     828     .       C       T       .       .       SVTYPE=SNP;GRAPHTYPE=NESTED;AC=0;AN=2   GT:MEAN_FWD_COVG:MEAN_REV_COVG:MED_FWD_COVG:MED_REV_COVG:SUM_FWD_COVG:SUM_REV_COVG:GAPS:LIKELIHOOD:GT_CONF      0:13,0:8,0:16,0:9,0:54,0:32,0:0.25,1:-11.455,-156.709:145.254   0:22,0:20,0:27,0:25,0:90,0:83,0:0.25,1:-13.8359,-281.417:267.581
acs     852     .       T       C       .       .       SVTYPE=SNP;GRAPHTYPE=NESTED;AC=0;AN=2   GT:MEAN_FWD_COVG:MEAN_REV_COVG:MED_FWD_COVG:MED_REV_COVG:SUM_FWD_COVG:SUM_REV_COVG:GAPS:LIKELIHOOD:GT_CONF      0:12,0:3,0:14,0:4,0:63,0:18,0:0,1:-6.88131,-129.078:122.196     0:22,0:19,0:24,0:19,0:113,0:95,0:0,1:-2.88244,-276.812:273.93
acs     900     .       A       T       .       .       SVTYPE=SNP;GRAPHTYPE=NESTED;AC=2;AN=2   GT:MEAN_FWD_COVG:MEAN_REV_COVG:MED_FWD_COVG:MED_REV_COVG:SUM_FWD_COVG:SUM_REV_COVG:GAPS:LIKELIHOOD:GT_CONF      1:0,2:0,0:0,2:0,0:0,6:0,0:1,1:-69.2103,-53.8908:15.3196 1:0,9:0,13:0,9:0,13:0,29:0,40:1,0:-189.314,-9.21901:180.095
acs     951     .       A       G       .       .       SVTYPE=SNP;GRAPHTYPE=NESTED;AC=1;AN=1   GT:MEAN_FWD_COVG:MEAN_REV_COVG:MED_FWD_COVG:MED_REV_COVG:SUM_FWD_COVG:SUM_REV_COVG:GAPS:LIKELIHOOD:GT_CONF      .:0,0:0,0:0,0:0,0:0,0:0,0:1,1:-60,-60:0 1:0,1:0,3:0,1:0,3:0,6:0,15:1,0.5:-106.421,-54.0413:52.3794
acs     961     .       T       C       .       .       SVTYPE=SNP;GRAPHTYPE=NESTED;AC=1;AN=1   GT:MEAN_FWD_COVG:MEAN_REV_COVG:MED_FWD_COVG:MED_REV_COVG:SUM_FWD_COVG:SUM_REV_COVG:GAPS:LIKELIHOOD:GT_CONF      .:0,0:0,0:0,0:0,0:0,0:0,0:1,1:-60,-60:0 1:0,0:0,1:0,0:0,0:0,5:0,9:1,0.833333:-92.6052,-76.8825:15.7227
acs     996     .       A       G,T     .       .       SVTYPE=SNP;GRAPHTYPE=NESTED;AC=1,0;AN=1 GT:MEAN_FWD_COVG:MEAN_REV_COVG:MED_FWD_COVG:MED_REV_COVG:SUM_FWD_COVG:SUM_REV_COVG:GAPS:LIKELIHOOD:GT_CONF      .:0,0,0:0,0,0:0,0,0:0,0,0:0,0,0:0,0,0:1,1,1:-60,-60,-60:0       1:0,4,0:0,5,0:0,6,0:0,8,0:0,14,0:0,16,0:1,0.333333,1:-129.447,-37.4108,-129.447:92.0357
acs     1112    .       C       T       .       .       SVTYPE=SNP;GRAPHTYPE=NESTED;AC=0;AN=2   GT:MEAN_FWD_COVG:MEAN_REV_COVG:MED_FWD_COVG:MED_REV_COVG:SUM_FWD_COVG:SUM_REV_COVG:GAPS:LIKELIHOOD:GT_CONF      0:22,0:27,0:22,0:27,0:45,0:54,0:0,1:-7.90707,-285.653:277.746   0:35,0:33,0:35,0:33,0:71,0:67,0:0,1:-8.63155,-401.152:392.52
acs     1143    .       C       T       .       .       SVTYPE=SNP;GRAPHTYPE=NESTED;AC=0;AN=2   GT:MEAN_FWD_COVG:MEAN_REV_COVG:MED_FWD_COVG:MED_REV_COVG:SUM_FWD_COVG:SUM_REV_COVG:GAPS:LIKELIHOOD:GT_CONF      0:30,0:31,0:32,0:31,0:120,0:126,0:0,1:-15.266,-340.915:325.649  0:27,0:27,0:36,0:35,0:111,0:108,0:0.25,1:-14.9739,-336.679:321.705
acs     1161    .       C       T       .       .       SVTYPE=SNP;GRAPHTYPE=NESTED;AC=0;AN=2   GT:MEAN_FWD_COVG:MEAN_REV_COVG:MED_FWD_COVG:MED_REV_COVG:SUM_FWD_COVG:SUM_REV_COVG:GAPS:LIKELIHOOD:GT_CONF      0:30,0:32,0:32,0:32,0:152,0:161,0:0,1:-15.9919,-345.521:329.529 0:14,0:14,0:0,0:0,0:73,0:71,0:0.6,1:-32.3324,-216.945:184.612

When I look at this gene in the de novo assemblies, they are identical except at one position:

Query  1141  CGCATTCTTGGCTCGGTCGGCGAACCGATTAACCCGGAAGCCTGGGAGTGGTACTGGAAG  1200
             ||||||||||||||||||||||||||||||||||||||||||| ||||||||||||||||
Sbjct  1141  CGCATTCTTGGCTCGGTCGGCGAACCGATTAACCCGGAAGCCTAGGAGTGGTACTGGAAG  1200
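
For anyone reproducing this check, here is a minimal sketch for listing the differing columns of an ungapped pairwise alignment such as the one above. The function name `mismatches` is hypothetical, and the code assumes the two aligned strings have equal length and contain no gap characters.

```python
def mismatches(query, subject, start=1):
    """Return (position, query_base, subject_base) for each differing column.

    `start` is the coordinate of the first aligned base (1141 in the
    alignment block above). Assumes an ungapped alignment.
    """
    if len(query) != len(subject):
        raise ValueError("aligned sequences must have equal length")
    return [
        (start + i, q, s)
        for i, (q, s) in enumerate(zip(query, subject))
        if q != s
    ]


# The two aligned rows copied from the alignment above.
query = "CGCATTCTTGGCTCGGTCGGCGAACCGATTAACCCGGAAGCCTGGGAGTGGTACTGGAAG"
sbjct = "CGCATTCTTGGCTCGGTCGGCGAACCGATTAACCCGGAAGCCTAGGAGTGGTACTGGAAG"

for pos, q_base, s_base in mismatches(query, sbjct, start=1141):
    print(pos, q_base, s_base)
```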

Coverage drops sharply mid-gene: at three positions (951, 961 and 996) KN0056-F has zero coverage and has not been genotyped. The biggest problem is that these sites are flagged as a difference between the isolates when there should not be one.
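
A rough sketch of how such sites could be pulled out of the VCF automatically is below. It is not part of pandora; `parse_sample`, `total_mean_covg` and `discordant_calls` are hypothetical helper names. It assumes whitespace-delimited records with the FORMAT layout shown above, and it reports sites where one sample has a null genotype (`.`) while another is called.

```python
FORMAT = ("GT:MEAN_FWD_COVG:MEAN_REV_COVG:MED_FWD_COVG:MED_REV_COVG:"
          "SUM_FWD_COVG:SUM_REV_COVG:GAPS:LIKELIHOOD:GT_CONF")


def parse_sample(format_keys, sample_field):
    """Map FORMAT keys to the colon-separated values of one sample column."""
    return dict(zip(format_keys, sample_field.split(":")))


def total_mean_covg(sample):
    """Sum mean forward + reverse coverage over all alleles of a sample."""
    fwd = [float(x) for x in sample["MEAN_FWD_COVG"].split(",")]
    rev = [float(x) for x in sample["MEAN_REV_COVG"].split(",")]
    return sum(fwd) + sum(rev)


def discordant_calls(vcf_lines):
    """Return (chrom, pos, genotypes, coverages) for sites where at least
    one sample is ungenotyped ('.') and at least one other is called."""
    hits = []
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and VCF headers
        fields = line.split()
        chrom, pos = fields[0], int(fields[1])
        keys = fields[8].split(":")
        samples = [parse_sample(keys, s) for s in fields[9:]]
        gts = [s["GT"] for s in samples]
        if "." in gts and any(gt != "." for gt in gts):
            hits.append((chrom, pos, gts, [total_mean_covg(s) for s in samples]))
    return hits


# Two records copied from the output above (positions 852 and 951).
records = [
    "acs 852 . T C . . SVTYPE=SNP;GRAPHTYPE=NESTED;AC=0;AN=2 " + FORMAT +
    " 0:12,0:3,0:14,0:4,0:63,0:18,0:0,1:-6.88131,-129.078:122.196"
    " 0:22,0:19,0:24,0:19,0:113,0:95,0:0,1:-2.88244,-276.812:273.93",
    "acs 951 . A G . . SVTYPE=SNP;GRAPHTYPE=NESTED;AC=1;AN=1 " + FORMAT +
    " .:0,0:0,0:0,0:0,0:0,0:0,0:1,1:-60,-60:0"
    " 1:0,1:0,3:0,1:0,3:0,6:0,15:1,0.5:-106.421,-54.0413:52.3794",
]

for hit in discordant_calls(records):
    print(hit)
```

Only the record at position 951 is reported here, because it is the only one of the two where one sample is called and the other is not. The reported coverage makes it easy to see that the called sample is supported by very few reads.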

Some regions of this gene (1959 bp long) appear to be repeated in both isolate genomes and in the pandora reference. Pandora was run with Illumina data only.

The Pandora output files are here: /hps/nobackup/iqbal/leandro/klebs_neonate_leah/pandora_compare_results

The de novo assemblies are here (also raw reads): /hps/nobackup2/iqbal/projects/pandora/klebs/neonate/data/KpST17_Norway_20190617/contigs/patient-pairs

LeahRoberts, Jan 31 '20 15:01

Also tagged for debugging.

leoisl, Feb 04 '20 13:02