gatk
gatk copied to clipboard
GenotypeGvcfs mangles the input phase
From my review of the code, we have:
-
GenotypeGvcfs
first merges overlapping variant contexts, resulting in genotypes with no-calls. - Next, the samples are re-genotyped.
The problem is that phasing is lost in (1), where the alleles are converted to no-calls, and then the phase is never properly reconstructed in (2).
I tried diving into the code, and made it to GATKVariantContextUtils.makeGenotypeCall
to try to figure out how to insert phase information, but I went cross-eyed. @ldgauthier @davidbenjamin @droazen what do you think?
HaplotypeCaller
output:
chr7 92147580 . C G,<NON_REF> 1909.77 . BaseQRankSum=-0.304;ClippingRankSum=-2.049;DP=108;ExcessHet=3.0103;MLEAC=1,0;MLEAF=0.500,0.00;MQRankSum=0.000;RAW_MQandDP=388800,108;REF_BASES=CTAGAACCAACAGACGAAAAG;ReadPosRankSum=1.577 GT:AD:DP:F1R2:F2R1:GQ:PGT:PID:PL:PS:SB 0|1:53,55,0:108:24,25,0:29,30,0:99:0|1:92147580_C_G:1938,0,2058,2097,2222,4319:92147580:24,29,25,30
chr7 92147581 . A <NON_REF> . . END=92147583 GT:DP:GQ:MIN_DP:PL 0/0:108:99:108:0,120,1800
chr7 92147584 . C T,<NON_REF> 1908.77 . BaseQRankSum=0.609;ClippingRankSum=1.815;DP=109;ExcessHet=3.0103;MLEAC=1,0;MLEAF=0.500,0.00;MQRankSum=0.000;RAW_MQandDP=392400,109;REF_BASES=AACCAACAGACGAAAAGATCA;ReadPosRankSum=-1.456 GT:AD:DP:F1R2:F2R1:GQ:PGT:PID:PL:PS:SB 1|0:56,53,0:109:26,24,0:30,29,0:99:1|0:92147580_C_G:1937,0,2138,2106,2297,4403:92147580:26,30,24,29
This gives the genotype CAGAT/GAGAC
GenotypeGvcfs
output:
chr7 92147580 . C G 1930.60 . AC=1;AF=0.500;AN=2;BaseQRankSum=-3.040e-01;ClippingRankSum=-2.049e+00;DP=108;ExcessHet=3.0103;FS=0.000;MLEAC=1;MLEAF=0.500;MQ=60.00;MQRankSum=0.00;QD=17.88;ReadPosRankSum=1.58;SOR=0.687 GT:AD:DP:F1R2:F2R1:GQ:PGT:PID:PL:PS 0|1:53,55:108:24,25,0:29,30,0:99:0|1:92147580_C_G:1938,0,2058:92147580
chr7 92147584 . C T 1929.60 . AC=1;AF=0.500;AN=2;BaseQRankSum=0.609;ClippingRankSum=1.82;DP=109;ExcessHet=3.0103;FS=0.000;MLEAC=1;MLEAF=0.500;MQ=60.00;MQRankSum=0.00;QD=17.70;ReadPosRankSum=-1.456e+00;SOR=0.738 GT:AD:DP:F1R2:F2R1:GQ:PGT:PID:PL:PS 0|1:56,53:109:26,24,0:30,29,0:99:1|0:92147580_C_G:1937,0,2138:92147580
This gives the genotype CAGAC/GAGAT
I would have guessed that GenotypeGVCFs is reproducing the phasing from the PGT FORMAT tag, but then they don't agree in the output. I trust HaplotypeCaller more. I'll take a look at GenotypeGVCFs.
@ldgauthier definitely HaplotypeCaller
is doing the right thing. I did walk through the GenotypeGvcfs code, and it doesn't look at the PGT FORMAT tag at all, it just leads that tag unmodified from the input.
Adding a little more fuel to the fire on this one. Our workaround for this for the time being was going to be to run HC twice - once to output a gVCF and again to output a called VCF directly. While being a little wasteful of compute we figured this would give us correctly phased calls in a genotyped VCF while also giving us the reference confidence information in a gVCF.
However, we just tried that and found that there is no phasing information in the genotyped VCF produced directly from HaplotypeCaller without any -ERC
parameter. Looking at the logs i see:
09:00:09.140 INFO HaplotypeCallerEngine - Disabling physical phasing, which is supported only for reference-model confidence output
Based on this I don't think there's a workaround. I can either generate a gVCF with -ERC GVCF
or -ERC BP_RESOLUTION
, but then running GenotypeGVCFs
will mangle the phase information, or run without -ERC
but then get zero phasing information.
Am I missing a potential workaround? Also, I'm curious why phasing is disabled for non-ERC mode as, at least to me, it's not immediately obvious why that would be.
On a whim I took the latest code from master and commented out the two lines in HaplotypeCallerEngine:257-258 that disable phsyical phasing if emitReferenceConfidence()
is false, and tried running HC to generate a genotyped VCF with phase. At least on a simple test of a ~200bp locus with a pair of phased variants it appears to do the right thing and not cause any errors.
I know testing calling in one small locus isn't exactly comprehensive, and I'm trying now to call a larger set of regions and compare the calls generated to expected phase. Does anyone recall why this restriction was in place? I'm hoping that perhaps it was needed at the time, but isn't now and was just left in place because nobody needed it removed? I see the lines in question were last touched by @droazen in April 2016, but even that commit seems to be a large scale moving around of code rather than a commit that addressed this specific issue.
I'm going to open a PR to remove those lines - mostly so I can have the tests run up in CI, and see if anything breaks.
@tfenne If it's the commit I think it was, it was just me porting the GATK3 HaplotypeCaller to GATK4. Checking in GATK3, those lines were actually added by @eitanbanks in 2014:
commit add65e1d99209347c6be9a78d7839819ef4bcb9d
Author: Eric Banks <[email protected]>
Date: Fri Aug 15 17:47:32 2014 -0400
Updated the physical phasing in the Haplotype Caller to address requests from ATGU.
1. It is now turned on by default
2. It now phases homozygous variants
3. Most importantly, it also phases variants that are always on opposite haplotypes
Changed the INFO keys to be PID and PGT, as described in the header.
diff --git a/protected/gatk-tools-protected/src/main/java/org/broadinstitute/gatk/tools/walkers/haplotypecaller/HaplotypeCaller.java b/protected/gatk-tools-protected/src/main/java/org/broadinstitute/gatk/tools/walkers/haplotypecaller/HaplotypeCaller.java
index cf34b1fb4e..2ee5752f04 100644
--- a/protected/gatk-tools-protected/src/main/java/org/broadinstitute/gatk/tools/walkers/haplotypecaller/HaplotypeCaller.java
+++ b/protected/gatk-tools-protected/src/main/java/org/broadinstitute/gatk/tools/walkers/haplotypecaller/HaplotypeCaller.java
@@ -497,10 +497,11 @@ public class HaplotypeCaller extends ActiveRegionWalker<List<VariantContext>, In
protected boolean mergeVariantsViaLD = false;
@Advanced
- @Argument(fullName="tryPhysicalPhasing", shortName="tryPhysicalPhasing", doc="If specified, we will add physical (read-based) phasing information", required = false)
- protected boolean tryPhysicalPhasing = false;
+ @Argument(fullName="doNotRunPhysicalPhasing", shortName="doNotRunPhysicalPhasing", doc="If specified, we will not try to add physical (read-based) phasing information", required = false)
+ protected boolean doNotRunPhysicalPhasing = false;
- public static final String HAPLOTYPE_CALLER_PHASING_KEY = "HCP";
+ public static final String HAPLOTYPE_CALLER_PHASING_ID_KEY = "PID";
+ public static final String HAPLOTYPE_CALLER_PHASING_GT_KEY = "PGT";
// -----------------------------------------------------------------------------------------------
// arguments for debugging / developing the haplotype caller
@@ -634,12 +635,11 @@ public class HaplotypeCaller extends ActiveRegionWalker<List<VariantContext>, In
if ( emitReferenceConfidence() ) {
if (SCAC.genotypingOutputMode == GenotypingOutputMode.GENOTYPE_GIVEN_ALLELES)
- throw new UserException.BadArgumentValue("ERC/gt_mode","you cannot request reference confidence output and Genotyping Giving Alleles at the same time");
+ throw new UserException.BadArgumentValue("ERC/gt_mode","you cannot request reference confidence output and GENOTYPE_GIVEN_ALLELES at the same time");
SCAC.genotypeArgs.STANDARD_CONFIDENCE_FOR_EMITTING = -0.0;
SCAC.genotypeArgs.STANDARD_CONFIDENCE_FOR_CALLING = -0.0;
-
// also, we don't need to output several of the annotations
annotationsToExclude.add("ChromosomeCounts");
annotationsToExclude.add("FisherStrand");
@@ -651,6 +651,9 @@ public class HaplotypeCaller extends ActiveRegionWalker<List<VariantContext>, In
if (!SCAC.annotateAllSitesWithPLs)
logger.info("All sites annotated with PLs forced to true for reference-model confidence output");
SCAC.annotateAllSitesWithPLs = true;
+ } else if ( ! doNotRunPhysicalPhasing ) {
+ doNotRunPhysicalPhasing = true;
+ logger.info("Disabling physical phasing, which is supported only for reference-model confidence output");
}
if ( SCAC.AFmodel == AFCalcFactory.Calculation.EXACT_GENERAL_PLOIDY )
@@ -678,7 +681,7 @@ public class HaplotypeCaller extends ActiveRegionWalker<List<VariantContext>, In
if( SCAC.genotypingOutputMode == GenotypingOutputMode.GENOTYPE_GIVEN_ALLELES && consensusMode )
throw new UserException("HaplotypeCaller cannot be run in both GENOTYPE_GIVEN_ALLELES mode and in consensus mode. Please choose one or the other.");
- genotypingEngine = new HaplotypeCallerGenotypingEngine( getToolkit(), SCAC, tryPhysicalPhasing);
+ genotypingEngine = new HaplotypeCallerGenotypingEngine( getToolkit(), SCAC, !doNotRunPhysicalPhasing);
// initialize the output VCF header
final VariantAnnotatorEngine annotationEngine = new VariantAnnotatorEngine(Arrays.asList(annotationClassesToUse), annotationsToUse, annotationsToExclude, this, getToolkit());
@@ -699,8 +702,10 @@ public class HaplotypeCaller extends ActiveRegionWalker<List<VariantContext>, In
VCFConstants.DEPTH_KEY,
VCFConstants.GENOTYPE_PL_KEY);
- if ( tryPhysicalPhasing )
- headerInfo.add(new VCFFormatHeaderLine(HAPLOTYPE_CALLER_PHASING_KEY, VCFHeaderLineCount.UNBOUNDED, VCFHeaderLineType.String, "Physical phasing information, each unique ID within a given sample (but not across samples) connects alternate alleles as occurring on the same haplotype"));
+ if ( ! doNotRunPhysicalPhasing ) {
+ headerInfo.add(new VCFFormatHeaderLine(HAPLOTYPE_CALLER_PHASING_ID_KEY, 1, VCFHeaderLineType.String, "Physical phasing ID information, where each unique ID within a given sample (but not across samples) connects records within a phasing group"));
+ headerInfo.add(new VCFFormatHeaderLine(HAPLOTYPE_CALLER_PHASING_GT_KEY, 1, VCFHeaderLineType.String, "Physical phasing haplotype information, describing how the alternate alleles are phased in relation to one another"));
+ }
Thanks @droazen, I suspected that was the case from looking at the history. Though it's not clear to me from @eitanbanks' commit why he would disable it for non-ERC modes.
FWIW my PR looks to have only failed where the HC tests are comparing against existing files, and the existing files don't have phasing (whereas newly generated test files do). I'm going through and double-checking that that is the case, and will hopefully amend that PR shortly.
I'm confused. Nowhere in the commit above did I disable physical phasing for non-ERC modes. The only line I touched was to change the error message to use the proper argument (GENOTYPE_GIVEN_ALLELES vs. Genotyping Giving Alleles).
I have zero recollection of why physical phasing wouldn't be okay for standard HaplotypeCaller.
Oh, I see now. It was probably a bug back then? Or maybe it was all we tested with ATGU so didn't want to support it for other cases without more testing (and then I got distracted). Way too long ago...
@droazen @ldgauthier @eitanbanks @tfenne the issue with GenotypeGvcfs
outputting the wrong phase will still exist, even if @tfenne makes the change to HaplotypeCaller
(@tfenne: could you move further discussion of that fix to https://github.com/broadinstitute/gatk/pull/5772). While the above example is fairly simple to understand, since GenotypeGvcfs
run on a single sample will simply re-capitulate the genotype from the gVCF. In this case, could we just compare the PGT
field and the GT
field to ensure they agree, and if they don't, swap the alleles in the GT
field (for hets only)?
This brings up an other issue: what happens with multiple samples? If the genotype is changed for a sample, how is phasing information affected (in the PGT/PID fields too)? I couldn't find a place where the phasing information is updated when re-genotyping (PGT/PID
are fixed). I think this is beyond my ability to fix and determine (i.e. understanding how the GenotypingEngine
works), but unfortunately, this is still a bug. Presumably the PGT/PID
fields are correct, as they have been used in large call-sets (ex. Exac/Gnomad) without any bug reports, so is there a simple workaround like the above single-sample case?
GenotypeGvcfs
mangeling the input phase is still an issue for me in v4.4. Here is a small reducible example with two haplotypes, each with a different overlapping indel variant, that are are called and phased correctly by HaplotypeCaller as three variants (a one 1 bp deletion immediately followed by a 8 bp deletion for haplotype "earlyIndel" and a 5 bp deletion for haplotype "lateDeletion"), but the phasing of the third variant (GT field only) is swapped by GenotypeGvcfs
:
reproducible small example:
wget https://hgdownload.cse.ucsc.edu/goldenpath/mm10/chromosomes/chr19.fa.gz && gunzip chr19.fa.gz
samtools faidx chr19.fa
gatk CreateSequenceDictionary -R chr19.fa
bind 'set disable-completion on' # temporarily disable auto-completion to allow pasting tabs.
cat <<'EOF' > reads.sam
@HD VN:1.4 SO:coordinate
@SQ SN:chr19 LN:61431566
@RG ID:CGAAGAGGTAGGTGCGAG-1 SM:CGAAGAGGTAGGTGCGAG-1
@PG ID:bwa VN:0.7.12-r1039 CL:bwa mem -C -M -t 32 -H .sam.header .fa .fastq.gz .fastq.gz PN:bwa
A01382:376:H5HVGDRX3:1:2131:30807:22842 147 chr19 55910582 60 65M9D95M = 55910535 -216 TCCCAAGGCCTCCGCACCCTCCAGATATCTCTCCATATTACCCGCTGTCGCCCGGCACCGTAGGAAATCCGCTAGGATGGTTAGTACCACAGTAAGCAATTCCAGTTTTTAATTCCCTTTCTGTTTCTTGCATGAGCATGCTTATTAGTTTACGTGTACG FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF,FFFFFFFF MC:Z:109M MD:Z:65^CAAATCCCC0C54T39 RG:Z:CGAAGAGGTAGGTGCGAG-1 NM:i:11 MQ:i:60 AS:i:135 XS:i:19
A01382:376:H5HVGDRX3:1:2252:12048:7200 147 chr19 55910582 60 65M9D95M = 55910535 -216 TCCCAAGGCCTCCGCACCCTCCAGATATCTCTCCATATTACCCGCTGTCGCCCGGCACCGTAGGAAATCCGCTAGGATGGTTAGTACCACAGTAAGCAATTCCAGTTTTTAATTCCCTTTTTGTTTCTTGCATGAGCATGCTTATTAGTTTACGTGTACG FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF MC:Z:109M MD:Z:65^CAAATCCCC0C94 RG:Z:CGAAGAGGTAGGTGCGAG-1 NM:i:10 MQ:i:60 AS:i:140 XS:i:20
A01382:376:H5HVGDRX3:1:2264:23258:10457 147 chr19 55910582 60 65M9D95M = 55910535 -216 TCCCAAGGCCTCCGCACCCTCCAGATATCTCTCCATATTACCCGCTGTCGCCCGGCACCGTAGGAAATCCGCTAGGATGGTTAGTACCACAGTAAGCAATTCCAGTTTTTAATTCCCTTTTTGTTTCTTGCATGAGCATGCTTATTAGTTTACGTGTACG FFFF,FFFFFFFFFFFFFF:FFFFFFF:FFFF:FFF:,FFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFF:FF,FFFFFFFFFFF,FFF:F:FF:FFFFFF:FFFFF:F:,:FF,FFFF:FFFFFFF:FFFFFF:FFFF,FFFFFFFF:FFF,FFFF MC:Z:109M MD:Z:65^CAAATCCCC0C94 RG:Z:CGAAGAGGTAGGTGCGAG-1 NM:i:10 MQ:i:60 AS:i:140 XS:i:20
A01382:376:H5HVGDRX3:2:2105:19253:5024 147 chr19 55910582 60 65M9D95M = 55910535 -216 TCCCAAGGCCTCCGCACCCTCCAGATATCTCTCCATATTACCCGCTGTCGCCCGGCACCGTAGGAAATCCGCTAGGATGGTTAGCACCACAGTAAGCAATTCCAGTTTTTAATTCCCTTTTTGTTTCTTGCATGAGCATGCTTATTAGTTTACGTGTACG FFFF:F:FFF::FFFFFFFFFFFFFFF:FFF:FFFFFF,FFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFF,FFFFFFFFFFFFFF MC:Z:109M MD:Z:65^CAAATCCCC0C18T75 RG:Z:CGAAGAGGTAGGTGCGAG-1 NM:i:11 MQ:i:60 AS:i:135 XS:i:19
A01382:376:H5HVGDRX3:2:2105:19271:5431 147 chr19 55910582 60 65M9D95M = 55910535 -216 TCCCAAGGCCTCCGCACCCTCCAGATATCTCTCCATATTACCCGCTGTCGCCCGGCACCGTAGGAAATCCGCTAGGATGGTTAGCACCACAGTAAGCAATTCCAGTTTTTAATTCCCTTTTTGTTTCTTGCATGAGCATGCTTATTAGTTTACGTGTACG FFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:,FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFF,FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFF:FFFFFFFF MC:Z:109M MD:Z:65^CAAATCCCC0C18T75 RG:Z:CGAAGAGGTAGGTGCGAG-1 NM:i:11 MQ:i:60 AS:i:135 XS:i:19
A01382:376:H5HVGDRX3:2:2167:16938:3302 147 chr19 55910582 60 65M9D95M = 55910535 -216 TCCCAAGGCCTCCGCACCCTCCAGACATCTCTCCATATTACCCGCTGTCGCCCGGCACCGTGGGAAATCCGCTAGGATGGTTAGTACCACAGTAAGCAATTCCAGTTTTTAATTCCCTTTTTGTTTCTTGCATGAGCATGCTTATTAGTTTACGTGTACG FF:FFFFF,F:FFFFFFFFFFF::FF:FFFFFFF,F,,FFFFFFFFFFFFFFFF,FF:FFFF,FFFFF:FFFFFFFFFFF:FFF,FF:FFFFFFFFFF:FFFFFFFF:::FFFFFFFF:FFFFFFFFFFFFFFFF,FFFFFFFFFFFFFFFFFFFFFFFF MC:Z:109M MD:Z:25T35A3^CAAATCCCC0C94 RG:Z:CGAAGAGGTAGGTGCGAG-1 NM:i:12 MQ:i:60 AS:i:130 XS:i:20
A01382:376:H5HVGDRX3:2:2231:9245:11741 147 chr19 55910582 60 65M9D95M = 55910535 -216 TCCCAAGGCCTCCGCACCCTCCAGATATCTCTCCATATTACCCGCTGTCGCCCGGCACCGTAGGAAATCCGCTAGGATGGTTAGTACCACAGTAAGCAATTCCAGTTTTTAATTCCCTTTTTGTTTCTTGCATGAGCATGCTTATTAGTTTACGTGTACG FFFF:FFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFF,FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFF MC:Z:109M MD:Z:65^CAAATCCCC0C94 RG:Z:CGAAGAGGTAGGTGCGAG-1 NM:i:10 MQ:i:60 AS:i:140 XS:i:20
A01382:376:H5HVGDRX3:2:2236:29125:20431 147 chr19 55910582 60 65M9D95M = 55910535 -216 TCCCAAGGCCTCCGCACCCTCCAGATATCTCTCCATATTACCCGCTGTCGCCCGGCACCGTAGGAAATCCGCTAGGATGGTTAGTACCACAGTAAGCAATTCCAGTTTTTAATTCCCTTTTTGTTTCTTGCATGAGCATGCTTATTAGTTTACGTGTACG FF:FFFFFFFFFFFFFFFFFFFFFFF:FFFFF,F,F:,FFFFFFF:FFFFFFF:F:FFFFFFFFF:F:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFF,:,:FFFFFFFFF:FFFFFFFFFFFFFFF:FFFF::FFFFFFFFF MC:Z:109M MD:Z:65^CAAATCCCC0C94 RG:Z:CGAAGAGGTAGGTGCGAG-1 NM:i:10 MQ:i:60 AS:i:140 XS:i:20
A01382:376:H5HVGDRX3:2:2259:3025:9549 147 chr19 55910582 60 65M9D95M = 55910535 -216 TCCCAAGGCCTCCGCACCCTCCAGATATCTCTCCATATTACCCGCTGTCGCCCGGCACCGTAGGAAATCCGCTAGGATGGTTAGTACCACAGTAAGCAATTCCAGTTTTTAATTCCCTTTTTGTTTCTTGCATGAGCATGCTTATTAGTTTACGTGTACG FFFF,FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFF,FFFFFFFFFFFF::FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF,:FFFFFFFF MC:Z:109M MD:Z:65^CAAATCCCC0C94 RG:Z:CGAAGAGGTAGGTGCGAG-1 NM:i:10 MQ:i:60 AS:i:140 XS:i:20
A01382:376:H5HVGDRX3:2:2274:11460:6934 147 chr19 55910582 60 65M9D95M = 55910535 -216 TCCCAAGGCCTCCGCACCCTCCAGATATCTCTCCATATTACCCGCTGTCGCCCGGCACCGTAGGAAATCCGCTAGGATGGTTAGTACCACAGTAAGCAATTCCAGTTTTTAATTCCCTTTTTGTTTCTTGCATGAGCATGCTTATTAGTTTACGTGTACG FFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFF MC:Z:109M MD:Z:65^CAAATCCCC0C94 RG:Z:CGAAGAGGTAGGTGCGAG-1 NM:i:10 MQ:i:60 AS:i:140 XS:i:20
A01382:376:H5HVGDRX3:1:2247:22309:12164 147 chr19 55910586 60 68M5D92M = 55910535 -216 AAGGCCTCCGCACCCTCCAGATATCTCTCCATATTACCCGCTGTCGCCCGGCACCGTAGGACAAATCCCCGCTAGGATGGTTAGTACCACAGTAAGCAATTCCAGTTTTTAATTCCCTTTTTGTTTCTTGCATGAGCATGCTTATTAGTTTACGTGTACG :FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:,FFFFFFFFFFFFFFFFFFFFFFFFFFF:F:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFF MC:Z:109M MD:Z:68^CCCAT92 RG:Z:CGAAGAGGTAGGTGCGAG-1 NM:i:5 MQ:i:60 AS:i:149 XS:i:20
A01382:376:H5HVGDRX3:2:2137:18249:27336 147 chr19 55910586 60 68M5D92M = 55910535 -216 AAGGCCTCCGCACCCTCCAGATATCTCTCCATATTACCCGCTGTCGCCCGGCACCGTAGGACAAATCCCCGCTAGGATGGTTAGTACCACAGTAAGCAATTCCAGTTTTTAATTCCCTTTTTGTTTCTTGCATGAGCATGCTTATTAGTTTACGTGTACG :FFFFFFFFFFFFFFFFFFFFFF,FFFFFFFF:F:,FFFFFFFFFFFFFFFFFFFFFFFFFFFF,FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF,FFFFFFFFFFFFFFFFFFFFFFFFFF:FFFF,F:FFFFFFFF MC:Z:109M MD:Z:68^CCCAT92 RG:Z:CGAAGAGGTAGGTGCGAG-1 NM:i:5 MQ:i:60 AS:i:149 XS:i:20
A01382:376:H5HVGDRX3:2:2206:5638:6026 147 chr19 55910586 60 68M5D92M = 55910535 -216 AAGGCCTCCGCACCCTCCAGATATCTCTCCATATTACCCGCTGTCGCCCGGCACCGTAGGACAAATCCCCGCTAGGATGGTTAGTACCACAGTAAGCAATTCCAGTTTTTAATTCCCTTTTTGTTTCTTGCATGAGCATGCTTATTAGTTTACGTGTACG FFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFF:,,F:FF:F,FFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF,:FFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF MC:Z:109M MD:Z:68^CCCAT92 RG:Z:CGAAGAGGTAGGTGCGAG-1 NM:i:5 MQ:i:60 AS:i:149 XS:i:20
A01382:376:H5HVGDRX3:2:2227:24740:17628 147 chr19 55910586 60 68M5D92M = 55910535 -216 AAGGCCTCCGCACCCTCCAGATATCTCTCCATATTACCCGCTGTCGCCCGGCACCGTAGGACAAATCCCCGCTAGGATGGTTAGTACCACAGTAAGCAATTCCAGTTTTTAATTCCCTTTTTGTTTCTTGCATGAGCATGCTTATTAGTTTACGTGTACG FFFFFFFFFFFFFFFFFFFFFFFFF:FFFFF::F,FFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFF,FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:F,FFFFFFFF MC:Z:109M MD:Z:68^CCCAT92 RG:Z:CGAAGAGGTAGGTGCGAG-1 NM:i:5 MQ:i:60 AS:i:149 XS:i:20
A01382:376:H5HVGDRX3:2:2252:15700:25144 147 chr19 55910586 60 68M5D92M = 55910535 -216 AAGGCCTCCGCACCCTCCAGATATCTCTCCATATTACCCGCTGTCGCCCGGCACCGTAGGACAAATCCCCGCTAGGATGGTTAGTACCACAGTAAGCAATTCCAGTTTTTAATTCCCTTTTTGTTTCTTGCATGAGCATGCTTATTAGTTTACGTGTACG FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFF MC:Z:109M MD:Z:68^CCCAT92 RG:Z:CGAAGAGGTAGGTGCGAG-1 NM:i:5 MQ:i:60 AS:i:149 XS:i:20
A01382:376:H5HVGDRX3:2:2255:32018:34898 147 chr19 55910586 60 68M5D92M = 55910535 -216 AAGGCCTCCGCACCCTCCAGATATCTCTCCATATTACCCGCTGTCGCCCGGCACCGTAGGACAAACCCCCGCTAGGATGGTTGGTACCACAGTAAGCAATTCCAGTTTTTAATTCCCTTTTTGTTTCTTGCATGAGCATGCTTATTAGTTTACGTGTACG ,:FFFFFFFFFFFFFFF:FFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFF:FFFFFFFFFFFFFFFFF:F,:FFFF:FF MC:Z:109M MD:Z:65T2^CCCAT14A77 RG:Z:CGAAGAGGTAGGTGCGAG-1 NM:i:7 MQ:i:60 AS:i:139 XS:i:19
A01382:376:H5HVGDRX3:2:2270:19180:5181 147 chr19 55910586 60 68M5D92M = 55910535 -216 AAGGCCTCCGCACCCTCCAGATATCTCTCCATATTACCCGCTGTCGCCCGGCACCGTAGGACAAATCCCCGCTAGGATGGTTAGTACCACAGTAAGCAATTCCAGTTTTTAATTCCCTTTTTGTCTCTTGCATGAGCATGCTTATTAGTTTACGTGTACG FF,FFFFF:FFFFFFFFFFFF,FF:FFFFFFF:F,FFFFFFFFFFFFFFF:FFFFFFFFFFFF:FF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF,FFFF:FFFFFFFFFFFFFFFF:FF,F:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF MC:Z:109M MD:Z:68^CCCAT56T35 RG:Z:CGAAGAGGTAGGTGCGAG-1 NM:i:6 MQ:i:60 AS:i:144 XS:i:19
A01382:376:H5HVGDRX3:1:2145:13051:6527 147 chr19 55910620 60 34S34M5D92M = 55910535 -216 CCCGCCCCCCCACCCACCAGATCTCTCGCCACACTACCCGCCGTCGCCCGGCACCGTAGGACAACTCCCCGCTAGGATGGTTAGTACCACAGTAAGCAATTCCAGTTTTTAATTCCCTTTTTGTTTCTTGCATGAGCATGCTTATTAGTTTACGTGTACG :,,,FF,FF,F,FFF,FFF,:,,,F,F,FF,,,,,FFFFFF,F:F:FFFFFF:FFF:FF:FF:F,FFFFF,FFFF,:FF:F:FF,FFFF:FF,F:F:FF:FFFFFFFF:F,::FFF::FF:FFF,FFFFFFFFFFFFFFFFFFFFFFFFF::FFFFFFFF MC:Z:109M MD:Z:7T22A3^CCCAT92 RG:Z:CGAAGAGGTAGGTGCGAG-1 NM:i:7 MQ:i:60 AS:i:105 XS:i:20
EOF
bind 'set disable-completion off'
samtools view reads.sam -b > reads.bam
samtools index reads.bam
gatk HaplotypeCaller -R chr19.fa -I reads.bam -O output.g.vcf -ERC GVCF
bcftools view output.g.vcf -c1 | bcftools annotate -x INFO,FORMAT/SB,FORMAT/PL | tail
gatk GenotypeGVCFs -R chr19.fa -V output.g.vcf -O output.vcf
bcftools view output.vcf -c1 | bcftools annotate -x INFO,FORMAT/SB,FORMAT/PL | tail
output:
Using GATK jar /omics/groups/OE0540/internal/software/jvm/gatk/4.4.0.0/gatk-package-4.4.0.0-local.jar Running: java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /omics/groups/OE0540/internal/software/jvm/gatk/4.4.0.0/gatk-package-4.4.0.0-local.jar HaplotypeCaller -R chr19.fa -I reads.bam -O output.g.vcf -ERC GVCF Picked up JAVA_TOOL_OPTIONS: -Djava.net.useSystemProxies=true 13:39:56.569 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/omics/groups/OE0540/internal/software/jvm/gatk/4.4.0.0/gatk-package-4.4.0.0-local.jar!/com/intel/gkl/native/libgkl_compression.so 13:39:56.647 INFO HaplotypeCaller - ------------------------------------------------------------ 13:39:56.660 INFO HaplotypeCaller - The Genome Analysis Toolkit (GATK) v4.4.0.0 13:39:56.666 INFO HaplotypeCaller - For support and documentation go to https://software.broadinstitute.org/gatk/ 13:39:56.666 INFO HaplotypeCaller - Executing as gleixner@odcf-worker02 on Linux v3.10.0-1160.76.1.el7.x86_64 amd64 13:39:56.666 INFO HaplotypeCaller - Java runtime: OpenJDK 64-Bit Server VM v17+35-2724 13:39:56.667 INFO HaplotypeCaller - Start Date/Time: October 26, 2023 at 1:39:56 PM CEST 13:39:56.667 INFO HaplotypeCaller - ------------------------------------------------------------ 13:39:56.667 INFO HaplotypeCaller - ------------------------------------------------------------ 13:39:56.669 INFO HaplotypeCaller - HTSJDK Version: 3.0.5 13:39:56.669 INFO HaplotypeCaller - Picard Version: 3.0.0 13:39:56.669 INFO HaplotypeCaller - Built for Spark Version: 3.3.1 13:39:56.670 INFO HaplotypeCaller - HTSJDK Defaults.COMPRESSION_LEVEL : 2 13:39:56.671 INFO HaplotypeCaller - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false 13:39:56.671 INFO HaplotypeCaller - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true 13:39:56.672 INFO HaplotypeCaller - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false 13:39:56.672 INFO HaplotypeCaller - Deflater: IntelDeflater 13:39:56.673 INFO HaplotypeCaller - Inflater: IntelInflater 13:39:56.674 INFO HaplotypeCaller - GCS max retries/reopens: 20 13:39:56.679 INFO HaplotypeCaller - Requester pays: disabled 13:39:56.680 INFO HaplotypeCaller - Initializing engine 13:39:56.968 INFO HaplotypeCaller - Done initializing engine 13:39:56.971 INFO HaplotypeCallerEngine - Tool is in reference confidence mode and the annotation, the following changes will be made to any specified annotations: 'StrandBiasBySample' will be enabled. 'ChromosomeCounts', 'FisherStrand', 'StrandOddsRatio' and 'QualByDepth' annotations have been disabled 13:39:57.000 INFO HaplotypeCallerEngine - Standard Emitting and Calling confidence set to -0.0 for reference-model confidence output 13:39:57.000 INFO HaplotypeCallerEngine - All sites annotated with PLs forced to true for reference-model confidence output 13:39:57.020 INFO NativeLibraryLoader - Loading libgkl_utils.so from jar:file:/omics/groups/OE0540/internal/software/jvm/gatk/4.4.0.0/gatk-package-4.4.0.0-local.jar!/com/intel/gkl/native/libgkl_utils.so 13:39:57.026 INFO PairHMM - OpenMP multi-threaded AVX-accelerated native PairHMM implementation is not supported 13:39:57.026 WARN PairHMM - ***WARNING: Machine does not have the AVX instruction set support needed for the accelerated AVX PairHmm. Falling back to the MUCH slower LOGLESS_CACHING implementation! 13:39:57.108 INFO ProgressMeter - Starting traversal 13:39:57.110 INFO ProgressMeter - Current Locus Elapsed Minutes Regions Processed Regions/Minute 13:40:07.119 INFO ProgressMeter - chr19:8969701 0.2 29900 179382.1 13:40:17.116 INFO ProgressMeter - chr19:20264701 0.3 67550 202609.5 13:40:27.115 INFO ProgressMeter - chr19:31874701 0.5 106250 212471.7 13:40:37.116 INFO ProgressMeter - chr19:44792701 0.7 149310 223937.0 13:40:49.251 WARN InbreedingCoeff - InbreedingCoeff will not be calculated at position chr19:55910646 and possibly subsequent; at least 10 samples must have called genotypes 13:40:49.413 INFO ProgressMeter - chr19:55910600 0.9 186370 213817.0
13:40:55.466 INFO HaplotypeCaller - 0 read(s) filtered by: MappingQualityReadFilter
0 read(s) filtered by: MappingQualityAvailableReadFilter
0 read(s) filtered by: MappedReadFilter
0 read(s) filtered by: NotSecondaryAlignmentReadFilter
0 read(s) filtered by: NotDuplicateReadFilter
0 read(s) filtered by: PassesVendorQualityCheckReadFilter
0 read(s) filtered by: NonZeroReferenceLengthAlignmentReadFilter
0 read(s) filtered by: GoodCigarReadFilter
0 read(s) filtered by: WellformedReadFilter
0 total reads filtered out of 18 reads processed
13:40:55.466 INFO ProgressMeter - chr19:61430410 1.0 204773 210541.8
13:40:55.467 INFO ProgressMeter - Traversal complete. Processed 204773 total regions in 1.0 minutes.
13:40:55.707 INFO PairHMM - Total compute time in PairHMM computeLogLikelihoods() : 0.096771218
13:40:55.708 INFO SmithWatermanAligner - Total compute time in java Smith-Waterman : 0.04 sec
13:40:55.709 INFO HaplotypeCaller - Shutting down engine
[October 26, 2023 at 1:40:55 PM CEST] org.broadinstitute.hellbender.tools.walkers.haplotypecaller.HaplotypeCaller done. Elapsed time: 0.99 minutes.
Runtime.totalMemory()=2617245696
##source=HaplotypeCaller
##bcftools_viewVersion=1.16+htslib-1.16
##bcftools_viewCommand=view -c1 output.g.vcf; Date=Thu Oct 26 13:40:56 2023
##bcftools_annotateVersion=1.16+htslib-1.16
##bcftools_annotateCommand=annotate -x INFO,FORMAT/SB,FORMAT/PL; Date=Thu Oct 26 13:40:56 2023
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT CGAAGAGGTAGGTGCGAG-1
chr19 55910646 . AC A,<NON_REF> 352.6 . . GT:AD:DP:GQ:PGT:PID:PS 0|1:6,9,0:15:99:0|1:55910646_AC_A:55910646
chr19 55910648 . AAATCCCCC A,<NON_REF> 352.6 . . GT:AD:DP:GQ:PGT:PID:PS 0|1:6,9,0:15:99:0|1:55910646_AC_A:55910646
chr19 55910653 . CCCCAT *,C,<NON_REF> 227.84 . . GT:AD:DP:GQ:PGT:PID:PS 2|1:0,9,6,0:15:99:1|0:55910646_AC_A:55910646
chr19 55910675 . T C,<NON_REF> 30.64 . . GT:AD:DP:GQ 0/1:13,2,0:15:38
Using GATK jar /omics/groups/OE0540/internal/software/jvm/gatk/4.4.0.0/gatk-package-4.4.0.0-local.jar
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /omics/groups/OE0540/internal/software/jvm/gatk/4.4.0.0/gatk-package-4.4.0.0-local.jar GenotypeGVCFs -R chr19.fa -V output.g.vcf -O output.vcf
Picked up JAVA_TOOL_OPTIONS: -Djava.net.useSystemProxies=true
13:40:59.573 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/omics/groups/OE0540/internal/software/jvm/gatk/4.4.0.0/gatk-package-4.4.0.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
13:40:59.636 INFO GenotypeGVCFs - ------------------------------------------------------------
13:40:59.653 INFO GenotypeGVCFs - The Genome Analysis Toolkit (GATK) v4.4.0.0
13:40:59.653 INFO GenotypeGVCFs - For support and documentation go to https://software.broadinstitute.org/gatk/
13:40:59.654 INFO GenotypeGVCFs - Executing as gleixner@odcf-worker02 on Linux v3.10.0-1160.76.1.el7.x86_64 amd64
13:40:59.654 INFO GenotypeGVCFs - Java runtime: OpenJDK 64-Bit Server VM v17+35-2724
13:40:59.655 INFO GenotypeGVCFs - Start Date/Time: October 26, 2023 at 1:40:59 PM CEST
13:40:59.655 INFO GenotypeGVCFs - ------------------------------------------------------------
13:40:59.655 INFO GenotypeGVCFs - ------------------------------------------------------------
13:40:59.658 INFO GenotypeGVCFs - HTSJDK Version: 3.0.5
13:40:59.658 INFO GenotypeGVCFs - Picard Version: 3.0.0
13:40:59.659 INFO GenotypeGVCFs - Built for Spark Version: 3.3.1
13:40:59.659 INFO GenotypeGVCFs - HTSJDK Defaults.COMPRESSION_LEVEL : 2
13:40:59.660 INFO GenotypeGVCFs - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
13:40:59.660 INFO GenotypeGVCFs - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
13:40:59.661 INFO GenotypeGVCFs - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
13:40:59.661 INFO GenotypeGVCFs - Deflater: IntelDeflater
13:40:59.661 INFO GenotypeGVCFs - Inflater: IntelInflater
13:40:59.661 INFO GenotypeGVCFs - GCS max retries/reopens: 20
13:40:59.662 INFO GenotypeGVCFs - Requester pays: disabled
13:40:59.663 INFO GenotypeGVCFs - Initializing engine
13:40:59.909 INFO FeatureManager - Using codec VCFCodec to read file file:///omics/groups/OE0540/internal/users/gleixner/cropseq_uli/rep_ex/test3/output.g.vcf
13:40:59.951 INFO GenotypeGVCFs - Done initializing engine
13:41:00.006 INFO ProgressMeter - Starting traversal
13:41:00.007 INFO ProgressMeter - Current Locus Elapsed Minutes Variants Processed Variants/Minute
13:41:00.406 WARN ReferenceConfidenceVariantContextMerger - Detected invalid annotations: When trying to merge variant contexts at location chr19:55910646 the annotation MLEAC=[1, 0] was not a numerical value and was ignored
13:41:00.476 WARN InbreedingCoeff - InbreedingCoeff will not be calculated at position chr19:55910646 and possibly subsequent; at least 10 samples must have called genotypes
13:41:00.528 INFO ProgressMeter - unmapped 0.0 37 4277.5
13:41:00.528 INFO ProgressMeter - Traversal complete. Processed 37 total variants in 0.0 minutes.
13:41:00.580 INFO GenotypeGVCFs - Shutting down engine
[October 26, 2023 at 1:41:00 PM CEST] org.broadinstitute.hellbender.tools.walkers.GenotypeGVCFs done. Elapsed time: 0.02 minutes.
Runtime.totalMemory()=285212672
##source=HaplotypeCaller
##bcftools_viewVersion=1.16+htslib-1.16
##bcftools_viewCommand=view -c1 output.vcf; Date=Thu Oct 26 13:41:08 2023
##bcftools_annotateVersion=1.16+htslib-1.16
##bcftools_annotateCommand=annotate -x INFO,FORMAT/SB,FORMAT/PL; Date=Thu Oct 26 13:41:08 2023
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT CGAAGAGGTAGGTGCGAG-1
chr19 55910646 . AC A 352.6 . . GT:AD:DP:GQ:PGT:PID:PS 0|1:6,9:15:99:0|1:55910646_AC_A:55910646
chr19 55910648 . AAATCCCCC A 352.6 . . GT:AD:DP:GQ:PGT:PID:PS 0|1:6,9:15:99:0|1:55910646_AC_A:55910646
chr19 55910653 . CCCCAT *,C 227.84 . . GT:AD:DP:GQ:PGT:PID:PS 1|2:0,9,6:15:99:1|0:55910646_AC_A:55910646
chr19 55910675 . T C 30.64 . . GT:AD:DP:GQ 0/1:13,2:15:38
The relevant parts of the output
HaplotypeCaller:
chr19 55910648 AAATCCCCC A,<NON_REF> GT:AD:DP:GQ:PGT:PID:PS 0|1:6,9,0 :15:99:0|1:55910646_AC_A:55910646
chr19 55910653 CCCCAT *,C,<NON_REF> GT:AD:DP:GQ:PGT:PID:PS 2|1:0,9,6,0:15:99:1|0:55910646_AC_A:55910646
GenotypeGVCFs:
chr19 55910648 AAATCCCCC A GT:AD:DP:GQ:PGT:PID:PS 0|1:6,9 :15:99:0|1:55910646_AC_A:55910646
chr19 55910653 CCCCAT *,C GT:AD:DP:GQ:PGT:PID:PS 1|2:0,9,6 :15:99:1|0:55910646_AC_A:55910646
I understand that
- PID/PGT can only encode phasing for biallelic variants,
- so PID/PGT + unphased GT can not be used to infer the correctly phased GT in general
- but only (and only maybe) in this case PID/PGT + wrongly phased GT can be used to infer the correctly phased GT (because the mangeling of the GT might depend on PGT)
is this correct?
Permalinks for the links in the OP: https://github.com/broadinstitute/gatk/blob/e5a6fed18ed7df112152461699de0f0d01c8b9c0/src/main/java/org/broadinstitute/hellbender/tools/walkers/GenotypeGVCFs.java#L271 https://github.com/broadinstitute/gatk/blob/e5a6fed18ed7df112152461699de0f0d01c8b9c0/src/main/java/org/broadinstitute/hellbender/tools/walkers/GenotypeGVCFs.java#L272-L273
This has since been moved to https://github.com/broadinstitute/gatk/blob/423d106612074aa3480b67b321dc66426b1c600a/src/main/java/org/broadinstitute/hellbender/tools/walkers/GenotypeGVCFs.java#L316 bzw. https://github.com/broadinstitute/gatk/blob/423d106612074aa3480b67b321dc66426b1c600a/src/main/java/org/broadinstitute/hellbender/tools/walkers/GenotypeGVCFsEngine.java#L136
However, I think the merged variant contexts are still called and phased correctly, but the phasing gets lost in the regenotyping step, because final_alleles
is unphased :
https://github.com/broadinstitute/gatk/blob/423d106612074aa3480b67b321dc66426b1c600a/src/main/java/org/broadinstitute/hellbender/utils/variant/GATKVariantContextUtils.java#L337-L337