gatk icon indicating copy to clipboard operation
gatk copied to clipboard

GenotypeGvcfs mangles the input phase

Open nh13 opened this issue 6 years ago • 11 comments

From my review of the code, we have:

  1. GenotypeGvcfs first merges overlapping variant contexts, resulting in genotypes with no-calls.
  2. Next, the samples are re-genotyped.

The problem is that phasing is lost in (1), where the alleles are converted to no-calls, and then the phase is never properly reconstructed in (2).

I tried diving into the code, and made it to GATKVariantContextUtils.makeGenotypeCall to try to figure out how to insert phase information, but I went cross-eyed. @ldgauthier @davidbenjamin @droazen what do you think?

HaplotypeCaller output:

chr7	92147580	.	C	G,<NON_REF>	1909.77	.	BaseQRankSum=-0.304;ClippingRankSum=-2.049;DP=108;ExcessHet=3.0103;MLEAC=1,0;MLEAF=0.500,0.00;MQRankSum=0.000;RAW_MQandDP=388800,108;REF_BASES=CTAGAACCAACAGACGAAAAG;ReadPosRankSum=1.577	GT:AD:DP:F1R2:F2R1:GQ:PGT:PID:PL:PS:SB	0|1:53,55,0:108:24,25,0:29,30,0:99:0|1:92147580_C_G:1938,0,2058,2097,2222,4319:92147580:24,29,25,30
chr7	92147581	.	A	<NON_REF>	.	.	END=92147583	GT:DP:GQ:MIN_DP:PL	0/0:108:99:108:0,120,1800
chr7	92147584	.	C	T,<NON_REF>	1908.77	.	BaseQRankSum=0.609;ClippingRankSum=1.815;DP=109;ExcessHet=3.0103;MLEAC=1,0;MLEAF=0.500,0.00;MQRankSum=0.000;RAW_MQandDP=392400,109;REF_BASES=AACCAACAGACGAAAAGATCA;ReadPosRankSum=-1.456	GT:AD:DP:F1R2:F2R1:GQ:PGT:PID:PL:PS:SB	1|0:56,53,0:109:26,24,0:30,29,0:99:1|0:92147580_C_G:1937,0,2138,2106,2297,4403:92147580:26,30,24,29

This gives the genotype CAGAT/GAGAC

GenotypeGvcfs output:

chr7	92147580	.	C	G	1930.60	.	AC=1;AF=0.500;AN=2;BaseQRankSum=-3.040e-01;ClippingRankSum=-2.049e+00;DP=108;ExcessHet=3.0103;FS=0.000;MLEAC=1;MLEAF=0.500;MQ=60.00;MQRankSum=0.00;QD=17.88;ReadPosRankSum=1.58;SOR=0.687	GT:AD:DP:F1R2:F2R1:GQ:PGT:PID:PL:PS	0|1:53,55:108:24,25,0:29,30,0:99:0|1:92147580_C_G:1938,0,2058:92147580
chr7	92147584	.	C	T	1929.60	.	AC=1;AF=0.500;AN=2;BaseQRankSum=0.609;ClippingRankSum=1.82;DP=109;ExcessHet=3.0103;FS=0.000;MLEAC=1;MLEAF=0.500;MQ=60.00;MQRankSum=0.00;QD=17.70;ReadPosRankSum=-1.456e+00;SOR=0.738	GT:AD:DP:F1R2:F2R1:GQ:PGT:PID:PL:PS	0|1:56,53:109:26,24,0:30,29,0:99:1|0:92147580_C_G:1937,0,2138:92147580

This gives the genotype CAGAC/GAGAT

nh13 avatar Feb 26 '19 22:02 nh13

I would have guessed that GenotypeGVCFs is reproducing the phasing from the PGT FORMAT tag, but then they don't agree in the output. I trust HaplotypeCaller more. I'll take a look at GenotypeGVCFs.

ldgauthier avatar Feb 27 '19 19:02 ldgauthier

@ldgauthier definitely HaplotypeCaller is doing the right thing. I did walk through the GenotypeGvcfs code, and it doesn't look at the PGT FORMAT tag at all, it just leads that tag unmodified from the input.

nh13 avatar Feb 27 '19 21:02 nh13

Adding a little more fuel to the fire on this one. Our workaround for this for the time being was going to be to run HC twice - once to output a gVCF and again to output a called VCF directly. While being a little wasteful of compute we figured this would give us correctly phased calls in a genotyped VCF while also giving us the reference confidence information in a gVCF.

However, we just tried that and found that there is no phasing information in the genotyped VCF produced directly from HaplotypeCaller without any -ERC parameter. Looking at the logs i see:

09:00:09.140 INFO  HaplotypeCallerEngine - Disabling physical phasing, which is supported only for reference-model confidence output

Based on this I don't think there's a workaround. I can either generate a gVCF with -ERC GVCF or -ERC BP_RESOLUTION, but then running GenotypeGVCFs will mangle the phase information, or run without -ERC but then get zero phasing information.

Am I missing a potential workaround? Also, I'm curious why phasing is disabled for non-ERC mode as, at least to me, it's not immediately obvious why that would be.

tfenne avatar Mar 07 '19 17:03 tfenne

On a whim I took the latest code from master and commented out the two lines in HaplotypeCallerEngine:257-258 that disable phsyical phasing if emitReferenceConfidence() is false, and tried running HC to generate a genotyped VCF with phase. At least on a simple test of a ~200bp locus with a pair of phased variants it appears to do the right thing and not cause any errors.

I know testing calling in one small locus isn't exactly comprehensive, and I'm trying now to call a larger set of regions and compare the calls generated to expected phase. Does anyone recall why this restriction was in place? I'm hoping that perhaps it was needed at the time, but isn't now and was just left in place because nobody needed it removed? I see the lines in question were last touched by @droazen in April 2016, but even that commit seems to be a large scale moving around of code rather than a commit that addressed this specific issue.

I'm going to open a PR to remove those lines - mostly so I can have the tests run up in CI, and see if anything breaks.

tfenne avatar Mar 07 '19 17:03 tfenne

@tfenne If it's the commit I think it was, it was just me porting the GATK3 HaplotypeCaller to GATK4. Checking in GATK3, those lines were actually added by @eitanbanks in 2014:

commit add65e1d99209347c6be9a78d7839819ef4bcb9d
Author: Eric Banks <[email protected]>
Date:   Fri Aug 15 17:47:32 2014 -0400

    Updated the physical phasing in the Haplotype Caller to address requests from ATGU.
    
    1. It is now turned on by default
    2. It now phases homozygous variants
    3. Most importantly, it also phases variants that are always on opposite haplotypes
    
    Changed the INFO keys to be PID and PGT, as described in the header.

diff --git a/protected/gatk-tools-protected/src/main/java/org/broadinstitute/gatk/tools/walkers/haplotypecaller/HaplotypeCaller.java b/protected/gatk-tools-protected/src/main/java/org/broadinstitute/gatk/tools/walkers/haplotypecaller/HaplotypeCaller.java
index cf34b1fb4e..2ee5752f04 100644
--- a/protected/gatk-tools-protected/src/main/java/org/broadinstitute/gatk/tools/walkers/haplotypecaller/HaplotypeCaller.java
+++ b/protected/gatk-tools-protected/src/main/java/org/broadinstitute/gatk/tools/walkers/haplotypecaller/HaplotypeCaller.java
@@ -497,10 +497,11 @@ public class HaplotypeCaller extends ActiveRegionWalker<List<VariantContext>, In
     protected boolean mergeVariantsViaLD = false;
 
     @Advanced
-    @Argument(fullName="tryPhysicalPhasing", shortName="tryPhysicalPhasing", doc="If specified, we will add physical (read-based) phasing information", required = false)
-    protected boolean tryPhysicalPhasing = false;
+    @Argument(fullName="doNotRunPhysicalPhasing", shortName="doNotRunPhysicalPhasing", doc="If specified, we will not try to add physical (read-based) phasing information", required = false)
+    protected boolean doNotRunPhysicalPhasing = false;
 
-    public static final String HAPLOTYPE_CALLER_PHASING_KEY = "HCP";
+    public static final String HAPLOTYPE_CALLER_PHASING_ID_KEY = "PID";
+    public static final String HAPLOTYPE_CALLER_PHASING_GT_KEY = "PGT";
 
     // -----------------------------------------------------------------------------------------------
     // arguments for debugging / developing the haplotype caller
@@ -634,12 +635,11 @@ public class HaplotypeCaller extends ActiveRegionWalker<List<VariantContext>, In
         if ( emitReferenceConfidence() ) {
 
             if (SCAC.genotypingOutputMode == GenotypingOutputMode.GENOTYPE_GIVEN_ALLELES)
-                throw new UserException.BadArgumentValue("ERC/gt_mode","you cannot request reference confidence output and Genotyping Giving Alleles at the same time");
+                throw new UserException.BadArgumentValue("ERC/gt_mode","you cannot request reference confidence output and GENOTYPE_GIVEN_ALLELES at the same time");
 
             SCAC.genotypeArgs.STANDARD_CONFIDENCE_FOR_EMITTING = -0.0;
             SCAC.genotypeArgs.STANDARD_CONFIDENCE_FOR_CALLING = -0.0;
 
-
             // also, we don't need to output several of the annotations
             annotationsToExclude.add("ChromosomeCounts");
             annotationsToExclude.add("FisherStrand");
@@ -651,6 +651,9 @@ public class HaplotypeCaller extends ActiveRegionWalker<List<VariantContext>, In
             if (!SCAC.annotateAllSitesWithPLs)
                 logger.info("All sites annotated with PLs forced to true for reference-model confidence output");
             SCAC.annotateAllSitesWithPLs = true;
+        } else if ( ! doNotRunPhysicalPhasing ) {
+            doNotRunPhysicalPhasing = true;
+            logger.info("Disabling physical phasing, which is supported only for reference-model confidence output");
         }
 
         if ( SCAC.AFmodel == AFCalcFactory.Calculation.EXACT_GENERAL_PLOIDY )
@@ -678,7 +681,7 @@ public class HaplotypeCaller extends ActiveRegionWalker<List<VariantContext>, In
         if( SCAC.genotypingOutputMode == GenotypingOutputMode.GENOTYPE_GIVEN_ALLELES && consensusMode )
             throw new UserException("HaplotypeCaller cannot be run in both GENOTYPE_GIVEN_ALLELES mode and in consensus mode. Please choose one or the other.");
 
-        genotypingEngine = new HaplotypeCallerGenotypingEngine( getToolkit(), SCAC, tryPhysicalPhasing);
+        genotypingEngine = new HaplotypeCallerGenotypingEngine( getToolkit(), SCAC, !doNotRunPhysicalPhasing);
         // initialize the output VCF header
         final VariantAnnotatorEngine annotationEngine = new VariantAnnotatorEngine(Arrays.asList(annotationClassesToUse), annotationsToUse, annotationsToExclude, this, getToolkit());
 
@@ -699,8 +702,10 @@ public class HaplotypeCaller extends ActiveRegionWalker<List<VariantContext>, In
                 VCFConstants.DEPTH_KEY,
                 VCFConstants.GENOTYPE_PL_KEY);
 
-        if ( tryPhysicalPhasing )
-            headerInfo.add(new VCFFormatHeaderLine(HAPLOTYPE_CALLER_PHASING_KEY, VCFHeaderLineCount.UNBOUNDED, VCFHeaderLineType.String, "Physical phasing information, each unique ID within a given sample (but not across samples) connects alternate alleles as occurring on the same haplotype"));
+        if ( ! doNotRunPhysicalPhasing ) {
+            headerInfo.add(new VCFFormatHeaderLine(HAPLOTYPE_CALLER_PHASING_ID_KEY, 1, VCFHeaderLineType.String, "Physical phasing ID information, where each unique ID within a given sample (but not across samples) connects records within a phasing group"));
+            headerInfo.add(new VCFFormatHeaderLine(HAPLOTYPE_CALLER_PHASING_GT_KEY, 1, VCFHeaderLineType.String, "Physical phasing haplotype information, describing how the alternate alleles are phased in relation to one another"));
+        }

droazen avatar Mar 07 '19 20:03 droazen

Thanks @droazen, I suspected that was the case from looking at the history. Though it's not clear to me from @eitanbanks' commit why he would disable it for non-ERC modes.

FWIW my PR looks to have only failed where the HC tests are comparing against existing files, and the existing files don't have phasing (whereas newly generated test files do). I'm going through and double-checking that that is the case, and will hopefully amend that PR shortly.

tfenne avatar Mar 07 '19 20:03 tfenne

I'm confused. Nowhere in the commit above did I disable physical phasing for non-ERC modes. The only line I touched was to change the error message to use the proper argument (GENOTYPE_GIVEN_ALLELES vs. Genotyping Giving Alleles).

I have zero recollection of why physical phasing wouldn't be okay for standard HaplotypeCaller.

eitanbanks avatar Mar 08 '19 14:03 eitanbanks

Oh, I see now. It was probably a bug back then? Or maybe it was all we tested with ATGU so didn't want to support it for other cases without more testing (and then I got distracted). Way too long ago...

eitanbanks avatar Mar 08 '19 14:03 eitanbanks

@droazen @ldgauthier @eitanbanks @tfenne the issue with GenotypeGvcfs outputting the wrong phase will still exist, even if @tfenne makes the change to HaplotypeCaller (@tfenne: could you move further discussion of that fix to https://github.com/broadinstitute/gatk/pull/5772). While the above example is fairly simple to understand, since GenotypeGvcfs run on a single sample will simply re-capitulate the genotype from the gVCF. In this case, could we just compare the PGT field and the GT field to ensure they agree, and if they don't, swap the alleles in the GT field (for hets only)?

This brings up an other issue: what happens with multiple samples? If the genotype is changed for a sample, how is phasing information affected (in the PGT/PID fields too)? I couldn't find a place where the phasing information is updated when re-genotyping (PGT/PID are fixed). I think this is beyond my ability to fix and determine (i.e. understanding how the GenotypingEngine works), but unfortunately, this is still a bug. Presumably the PGT/PID fields are correct, as they have been used in large call-sets (ex. Exac/Gnomad) without any bug reports, so is there a simple workaround like the above single-sample case?

nh13 avatar Mar 08 '19 18:03 nh13

GenotypeGvcfs mangeling the input phase is still an issue for me in v4.4. Here is a small reducible example with two haplotypes, each with a different overlapping indel variant, that are are called and phased correctly by HaplotypeCaller as three variants (a one 1 bp deletion immediately followed by a 8 bp deletion for haplotype "earlyIndel" and a 5 bp deletion for haplotype "lateDeletion"), but the phasing of the third variant (GT field only) is swapped by GenotypeGvcfs :

image

reproducible small example:

wget https://hgdownload.cse.ucsc.edu/goldenpath/mm10/chromosomes/chr19.fa.gz && gunzip chr19.fa.gz
samtools faidx chr19.fa
gatk CreateSequenceDictionary -R chr19.fa 

bind 'set disable-completion on' # temporarily disable auto-completion to allow pasting tabs.
cat <<'EOF' > reads.sam
@HD	VN:1.4	SO:coordinate
@SQ	SN:chr19	LN:61431566
@RG	ID:CGAAGAGGTAGGTGCGAG-1	SM:CGAAGAGGTAGGTGCGAG-1
@PG	ID:bwa	VN:0.7.12-r1039	CL:bwa mem -C -M -t 32 -H .sam.header .fa .fastq.gz .fastq.gz	PN:bwa
A01382:376:H5HVGDRX3:1:2131:30807:22842	147	chr19	55910582	60	65M9D95M	=	55910535	-216	TCCCAAGGCCTCCGCACCCTCCAGATATCTCTCCATATTACCCGCTGTCGCCCGGCACCGTAGGAAATCCGCTAGGATGGTTAGTACCACAGTAAGCAATTCCAGTTTTTAATTCCCTTTCTGTTTCTTGCATGAGCATGCTTATTAGTTTACGTGTACG	FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF,FFFFFFFF	MC:Z:109M	MD:Z:65^CAAATCCCC0C54T39	RG:Z:CGAAGAGGTAGGTGCGAG-1	NM:i:11	MQ:i:60	AS:i:135	XS:i:19
A01382:376:H5HVGDRX3:1:2252:12048:7200	147	chr19	55910582	60	65M9D95M	=	55910535	-216	TCCCAAGGCCTCCGCACCCTCCAGATATCTCTCCATATTACCCGCTGTCGCCCGGCACCGTAGGAAATCCGCTAGGATGGTTAGTACCACAGTAAGCAATTCCAGTTTTTAATTCCCTTTTTGTTTCTTGCATGAGCATGCTTATTAGTTTACGTGTACG	FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF	MC:Z:109M	MD:Z:65^CAAATCCCC0C94	RG:Z:CGAAGAGGTAGGTGCGAG-1	NM:i:10	MQ:i:60	AS:i:140	XS:i:20
A01382:376:H5HVGDRX3:1:2264:23258:10457	147	chr19	55910582	60	65M9D95M	=	55910535	-216	TCCCAAGGCCTCCGCACCCTCCAGATATCTCTCCATATTACCCGCTGTCGCCCGGCACCGTAGGAAATCCGCTAGGATGGTTAGTACCACAGTAAGCAATTCCAGTTTTTAATTCCCTTTTTGTTTCTTGCATGAGCATGCTTATTAGTTTACGTGTACG	FFFF,FFFFFFFFFFFFFF:FFFFFFF:FFFF:FFF:,FFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFF:FF,FFFFFFFFFFF,FFF:F:FF:FFFFFF:FFFFF:F:,:FF,FFFF:FFFFFFF:FFFFFF:FFFF,FFFFFFFF:FFF,FFFF	MC:Z:109M	MD:Z:65^CAAATCCCC0C94	RG:Z:CGAAGAGGTAGGTGCGAG-1	NM:i:10	MQ:i:60	AS:i:140	XS:i:20
A01382:376:H5HVGDRX3:2:2105:19253:5024	147	chr19	55910582	60	65M9D95M	=	55910535	-216	TCCCAAGGCCTCCGCACCCTCCAGATATCTCTCCATATTACCCGCTGTCGCCCGGCACCGTAGGAAATCCGCTAGGATGGTTAGCACCACAGTAAGCAATTCCAGTTTTTAATTCCCTTTTTGTTTCTTGCATGAGCATGCTTATTAGTTTACGTGTACG	FFFF:F:FFF::FFFFFFFFFFFFFFF:FFF:FFFFFF,FFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFF,FFFFFFFFFFFFFF	MC:Z:109M	MD:Z:65^CAAATCCCC0C18T75	RG:Z:CGAAGAGGTAGGTGCGAG-1	NM:i:11	MQ:i:60	AS:i:135	XS:i:19
A01382:376:H5HVGDRX3:2:2105:19271:5431	147	chr19	55910582	60	65M9D95M	=	55910535	-216	TCCCAAGGCCTCCGCACCCTCCAGATATCTCTCCATATTACCCGCTGTCGCCCGGCACCGTAGGAAATCCGCTAGGATGGTTAGCACCACAGTAAGCAATTCCAGTTTTTAATTCCCTTTTTGTTTCTTGCATGAGCATGCTTATTAGTTTACGTGTACG	FFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:,FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFF,FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFF:FFFFFFFF	MC:Z:109M	MD:Z:65^CAAATCCCC0C18T75	RG:Z:CGAAGAGGTAGGTGCGAG-1	NM:i:11	MQ:i:60	AS:i:135	XS:i:19
A01382:376:H5HVGDRX3:2:2167:16938:3302	147	chr19	55910582	60	65M9D95M	=	55910535	-216	TCCCAAGGCCTCCGCACCCTCCAGACATCTCTCCATATTACCCGCTGTCGCCCGGCACCGTGGGAAATCCGCTAGGATGGTTAGTACCACAGTAAGCAATTCCAGTTTTTAATTCCCTTTTTGTTTCTTGCATGAGCATGCTTATTAGTTTACGTGTACG	FF:FFFFF,F:FFFFFFFFFFF::FF:FFFFFFF,F,,FFFFFFFFFFFFFFFF,FF:FFFF,FFFFF:FFFFFFFFFFF:FFF,FF:FFFFFFFFFF:FFFFFFFF:::FFFFFFFF:FFFFFFFFFFFFFFFF,FFFFFFFFFFFFFFFFFFFFFFFF	MC:Z:109M	MD:Z:25T35A3^CAAATCCCC0C94	RG:Z:CGAAGAGGTAGGTGCGAG-1	NM:i:12	MQ:i:60	AS:i:130	XS:i:20
A01382:376:H5HVGDRX3:2:2231:9245:11741	147	chr19	55910582	60	65M9D95M	=	55910535	-216	TCCCAAGGCCTCCGCACCCTCCAGATATCTCTCCATATTACCCGCTGTCGCCCGGCACCGTAGGAAATCCGCTAGGATGGTTAGTACCACAGTAAGCAATTCCAGTTTTTAATTCCCTTTTTGTTTCTTGCATGAGCATGCTTATTAGTTTACGTGTACG	FFFF:FFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFF,FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFF	MC:Z:109M	MD:Z:65^CAAATCCCC0C94	RG:Z:CGAAGAGGTAGGTGCGAG-1	NM:i:10	MQ:i:60	AS:i:140	XS:i:20
A01382:376:H5HVGDRX3:2:2236:29125:20431	147	chr19	55910582	60	65M9D95M	=	55910535	-216	TCCCAAGGCCTCCGCACCCTCCAGATATCTCTCCATATTACCCGCTGTCGCCCGGCACCGTAGGAAATCCGCTAGGATGGTTAGTACCACAGTAAGCAATTCCAGTTTTTAATTCCCTTTTTGTTTCTTGCATGAGCATGCTTATTAGTTTACGTGTACG	FF:FFFFFFFFFFFFFFFFFFFFFFF:FFFFF,F,F:,FFFFFFF:FFFFFFF:F:FFFFFFFFF:F:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFF,:,:FFFFFFFFF:FFFFFFFFFFFFFFF:FFFF::FFFFFFFFF	MC:Z:109M	MD:Z:65^CAAATCCCC0C94	RG:Z:CGAAGAGGTAGGTGCGAG-1	NM:i:10	MQ:i:60	AS:i:140	XS:i:20
A01382:376:H5HVGDRX3:2:2259:3025:9549	147	chr19	55910582	60	65M9D95M	=	55910535	-216	TCCCAAGGCCTCCGCACCCTCCAGATATCTCTCCATATTACCCGCTGTCGCCCGGCACCGTAGGAAATCCGCTAGGATGGTTAGTACCACAGTAAGCAATTCCAGTTTTTAATTCCCTTTTTGTTTCTTGCATGAGCATGCTTATTAGTTTACGTGTACG	FFFF,FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFF,FFFFFFFFFFFF::FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF,:FFFFFFFF	MC:Z:109M	MD:Z:65^CAAATCCCC0C94	RG:Z:CGAAGAGGTAGGTGCGAG-1	NM:i:10	MQ:i:60	AS:i:140	XS:i:20
A01382:376:H5HVGDRX3:2:2274:11460:6934	147	chr19	55910582	60	65M9D95M	=	55910535	-216	TCCCAAGGCCTCCGCACCCTCCAGATATCTCTCCATATTACCCGCTGTCGCCCGGCACCGTAGGAAATCCGCTAGGATGGTTAGTACCACAGTAAGCAATTCCAGTTTTTAATTCCCTTTTTGTTTCTTGCATGAGCATGCTTATTAGTTTACGTGTACG	FFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFF	MC:Z:109M	MD:Z:65^CAAATCCCC0C94	RG:Z:CGAAGAGGTAGGTGCGAG-1	NM:i:10	MQ:i:60	AS:i:140	XS:i:20
A01382:376:H5HVGDRX3:1:2247:22309:12164	147	chr19	55910586	60	68M5D92M	=	55910535	-216	AAGGCCTCCGCACCCTCCAGATATCTCTCCATATTACCCGCTGTCGCCCGGCACCGTAGGACAAATCCCCGCTAGGATGGTTAGTACCACAGTAAGCAATTCCAGTTTTTAATTCCCTTTTTGTTTCTTGCATGAGCATGCTTATTAGTTTACGTGTACG	:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:,FFFFFFFFFFFFFFFFFFFFFFFFFFF:F:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFF	MC:Z:109M	MD:Z:68^CCCAT92	RG:Z:CGAAGAGGTAGGTGCGAG-1	NM:i:5	MQ:i:60	AS:i:149	XS:i:20
A01382:376:H5HVGDRX3:2:2137:18249:27336	147	chr19	55910586	60	68M5D92M	=	55910535	-216	AAGGCCTCCGCACCCTCCAGATATCTCTCCATATTACCCGCTGTCGCCCGGCACCGTAGGACAAATCCCCGCTAGGATGGTTAGTACCACAGTAAGCAATTCCAGTTTTTAATTCCCTTTTTGTTTCTTGCATGAGCATGCTTATTAGTTTACGTGTACG	:FFFFFFFFFFFFFFFFFFFFFF,FFFFFFFF:F:,FFFFFFFFFFFFFFFFFFFFFFFFFFFF,FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF,FFFFFFFFFFFFFFFFFFFFFFFFFF:FFFF,F:FFFFFFFF	MC:Z:109M	MD:Z:68^CCCAT92	RG:Z:CGAAGAGGTAGGTGCGAG-1	NM:i:5	MQ:i:60	AS:i:149	XS:i:20
A01382:376:H5HVGDRX3:2:2206:5638:6026	147	chr19	55910586	60	68M5D92M	=	55910535	-216	AAGGCCTCCGCACCCTCCAGATATCTCTCCATATTACCCGCTGTCGCCCGGCACCGTAGGACAAATCCCCGCTAGGATGGTTAGTACCACAGTAAGCAATTCCAGTTTTTAATTCCCTTTTTGTTTCTTGCATGAGCATGCTTATTAGTTTACGTGTACG	FFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFF:,,F:FF:F,FFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF,:FFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF	MC:Z:109M	MD:Z:68^CCCAT92	RG:Z:CGAAGAGGTAGGTGCGAG-1	NM:i:5	MQ:i:60	AS:i:149	XS:i:20
A01382:376:H5HVGDRX3:2:2227:24740:17628	147	chr19	55910586	60	68M5D92M	=	55910535	-216	AAGGCCTCCGCACCCTCCAGATATCTCTCCATATTACCCGCTGTCGCCCGGCACCGTAGGACAAATCCCCGCTAGGATGGTTAGTACCACAGTAAGCAATTCCAGTTTTTAATTCCCTTTTTGTTTCTTGCATGAGCATGCTTATTAGTTTACGTGTACG	FFFFFFFFFFFFFFFFFFFFFFFFF:FFFFF::F,FFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFF,FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:F,FFFFFFFF	MC:Z:109M	MD:Z:68^CCCAT92	RG:Z:CGAAGAGGTAGGTGCGAG-1	NM:i:5	MQ:i:60	AS:i:149	XS:i:20
A01382:376:H5HVGDRX3:2:2252:15700:25144	147	chr19	55910586	60	68M5D92M	=	55910535	-216	AAGGCCTCCGCACCCTCCAGATATCTCTCCATATTACCCGCTGTCGCCCGGCACCGTAGGACAAATCCCCGCTAGGATGGTTAGTACCACAGTAAGCAATTCCAGTTTTTAATTCCCTTTTTGTTTCTTGCATGAGCATGCTTATTAGTTTACGTGTACG	FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFF	MC:Z:109M	MD:Z:68^CCCAT92	RG:Z:CGAAGAGGTAGGTGCGAG-1	NM:i:5	MQ:i:60	AS:i:149	XS:i:20
A01382:376:H5HVGDRX3:2:2255:32018:34898	147	chr19	55910586	60	68M5D92M	=	55910535	-216	AAGGCCTCCGCACCCTCCAGATATCTCTCCATATTACCCGCTGTCGCCCGGCACCGTAGGACAAACCCCCGCTAGGATGGTTGGTACCACAGTAAGCAATTCCAGTTTTTAATTCCCTTTTTGTTTCTTGCATGAGCATGCTTATTAGTTTACGTGTACG	,:FFFFFFFFFFFFFFF:FFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFF:FFFFFFFFFFFFFFFFF:F,:FFFF:FF	MC:Z:109M	MD:Z:65T2^CCCAT14A77	RG:Z:CGAAGAGGTAGGTGCGAG-1	NM:i:7	MQ:i:60	AS:i:139	XS:i:19
A01382:376:H5HVGDRX3:2:2270:19180:5181	147	chr19	55910586	60	68M5D92M	=	55910535	-216	AAGGCCTCCGCACCCTCCAGATATCTCTCCATATTACCCGCTGTCGCCCGGCACCGTAGGACAAATCCCCGCTAGGATGGTTAGTACCACAGTAAGCAATTCCAGTTTTTAATTCCCTTTTTGTCTCTTGCATGAGCATGCTTATTAGTTTACGTGTACG	FF,FFFFF:FFFFFFFFFFFF,FF:FFFFFFF:F,FFFFFFFFFFFFFFF:FFFFFFFFFFFF:FF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF,FFFF:FFFFFFFFFFFFFFFF:FF,F:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF	MC:Z:109M	MD:Z:68^CCCAT56T35	RG:Z:CGAAGAGGTAGGTGCGAG-1	NM:i:6	MQ:i:60	AS:i:144	XS:i:19
A01382:376:H5HVGDRX3:1:2145:13051:6527	147	chr19	55910620	60	34S34M5D92M	=	55910535	-216	CCCGCCCCCCCACCCACCAGATCTCTCGCCACACTACCCGCCGTCGCCCGGCACCGTAGGACAACTCCCCGCTAGGATGGTTAGTACCACAGTAAGCAATTCCAGTTTTTAATTCCCTTTTTGTTTCTTGCATGAGCATGCTTATTAGTTTACGTGTACG	:,,,FF,FF,F,FFF,FFF,:,,,F,F,FF,,,,,FFFFFF,F:F:FFFFFF:FFF:FF:FF:F,FFFFF,FFFF,:FF:F:FF,FFFF:FF,F:F:FF:FFFFFFFF:F,::FFF::FF:FFF,FFFFFFFFFFFFFFFFFFFFFFFFF::FFFFFFFF	MC:Z:109M	MD:Z:7T22A3^CCCAT92	RG:Z:CGAAGAGGTAGGTGCGAG-1	NM:i:7	MQ:i:60	AS:i:105	XS:i:20
EOF

bind 'set disable-completion off'

samtools view reads.sam -b > reads.bam
samtools index reads.bam

gatk HaplotypeCaller -R chr19.fa -I reads.bam -O output.g.vcf -ERC GVCF       
bcftools view output.g.vcf -c1 | bcftools annotate -x INFO,FORMAT/SB,FORMAT/PL | tail  
gatk GenotypeGVCFs -R chr19.fa -V output.g.vcf -O output.vcf  
bcftools view output.vcf -c1 | bcftools annotate -x INFO,FORMAT/SB,FORMAT/PL | tail

output:

Using GATK jar /omics/groups/OE0540/internal/software/jvm/gatk/4.4.0.0/gatk-package-4.4.0.0-local.jar                                                                                                                                                                                                                       Running:                                                                                                                                                                                                                                                                                                                        java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /omics/groups/OE0540/internal/software/jvm/gatk/4.4.0.0/gatk-package-4.4.0.0-local.jar HaplotypeCaller -R chr19.fa -I reads.bam -O output.g.vcf -ERC GVCF                                                                                                                                                                                                                                                                                                                      Picked up JAVA_TOOL_OPTIONS:  -Djava.net.useSystemProxies=true                                                                                                                                                                                                                                                              13:39:56.569 INFO  NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/omics/groups/OE0540/internal/software/jvm/gatk/4.4.0.0/gatk-package-4.4.0.0-local.jar!/com/intel/gkl/native/libgkl_compression.so                                                                                                     13:39:56.647 INFO  HaplotypeCaller - ------------------------------------------------------------                                                                                                                                                                                                                           13:39:56.660 INFO  HaplotypeCaller - The Genome Analysis Toolkit (GATK) v4.4.0.0                                                                                                                                                                                                                                            13:39:56.666 INFO  HaplotypeCaller - For support and documentation go to https://software.broadinstitute.org/gatk/                                                                                                                                                                                                          13:39:56.666 INFO  HaplotypeCaller - Executing as gleixner@odcf-worker02 on Linux v3.10.0-1160.76.1.el7.x86_64 amd64                                                                                                                                                                                                        13:39:56.666 INFO  HaplotypeCaller - Java runtime: OpenJDK 64-Bit Server VM v17+35-2724                                                                                                                                                                                                                                     13:39:56.667 INFO  HaplotypeCaller - Start Date/Time: October 26, 2023 at 1:39:56 PM CEST                                                                                                                                                                                                                                   13:39:56.667 INFO  HaplotypeCaller - ------------------------------------------------------------                                                                                                                                                                                                                           13:39:56.667 INFO  HaplotypeCaller - ------------------------------------------------------------                                                                                                                                                                                                                           13:39:56.669 INFO  HaplotypeCaller - HTSJDK Version: 3.0.5                                                                                                                                                                                                                                                                  13:39:56.669 INFO  HaplotypeCaller - Picard Version: 3.0.0                                                                                                                                                                                                                                                                  13:39:56.669 INFO  HaplotypeCaller - Built for Spark Version: 3.3.1                                                                                                                                                                                                                                                         13:39:56.670 INFO  HaplotypeCaller - HTSJDK Defaults.COMPRESSION_LEVEL : 2                                                                                                                                                                                                                                                  13:39:56.671 INFO  HaplotypeCaller - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false                                                                                                                                                                                                                                 13:39:56.671 INFO  HaplotypeCaller - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true                                                                                                                                                                                                                                 13:39:56.672 INFO  HaplotypeCaller - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false                                                                                                                                                                                                                                 13:39:56.672 INFO  HaplotypeCaller - Deflater: IntelDeflater                                                                                                                                                                                                                                                                13:39:56.673 INFO  HaplotypeCaller - Inflater: IntelInflater                                                                                                                                                                                                                                                                13:39:56.674 INFO  HaplotypeCaller - GCS max retries/reopens: 20                                                                                                                                                                                                                                                            13:39:56.679 INFO  HaplotypeCaller - Requester pays: disabled                                                                                                                                                                                                                                                               13:39:56.680 INFO  HaplotypeCaller - Initializing engine                                                                                                                                                                                                                                                                    13:39:56.968 INFO  HaplotypeCaller - Done initializing engine                                                                                                                                                                                                                                                               13:39:56.971 INFO  HaplotypeCallerEngine - Tool is in reference confidence mode and the annotation, the following changes will be made to any specified annotations: 'StrandBiasBySample' will be enabled. 'ChromosomeCounts', 'FisherStrand', 'StrandOddsRatio' and 'QualByDepth' annotations have been disabled           13:39:57.000 INFO  HaplotypeCallerEngine - Standard Emitting and Calling confidence set to -0.0 for reference-model confidence output                                                                                                                                                                                       13:39:57.000 INFO  HaplotypeCallerEngine - All sites annotated with PLs forced to true for reference-model confidence output                                                                                                                                                                                                13:39:57.020 INFO  NativeLibraryLoader - Loading libgkl_utils.so from jar:file:/omics/groups/OE0540/internal/software/jvm/gatk/4.4.0.0/gatk-package-4.4.0.0-local.jar!/com/intel/gkl/native/libgkl_utils.so                                                                                                                 13:39:57.026 INFO  PairHMM - OpenMP multi-threaded AVX-accelerated native PairHMM implementation is not supported                                                                                                                                                                                                           13:39:57.026 WARN  PairHMM - ***WARNING: Machine does not have the AVX instruction set support needed for the accelerated AVX PairHmm. Falling back to the MUCH slower LOGLESS_CACHING implementation!                                                                                                                      13:39:57.108 INFO  ProgressMeter - Starting traversal                                                                                                                                                                                                                                                                       13:39:57.110 INFO  ProgressMeter -        Current Locus  Elapsed Minutes     Regions Processed   Regions/Minute                                                                                                                                                                                                             13:40:07.119 INFO  ProgressMeter -        chr19:8969701              0.2                 29900         179382.1                                                                                                                                                                                                             13:40:17.116 INFO  ProgressMeter -       chr19:20264701              0.3                 67550         202609.5                                                                                                                                                                                                             13:40:27.115 INFO  ProgressMeter -       chr19:31874701              0.5                106250         212471.7                                                                                                                                                                                                             13:40:37.116 INFO  ProgressMeter -       chr19:44792701              0.7                149310         223937.0                                                                                                                                                                                                             13:40:49.251 WARN  InbreedingCoeff - InbreedingCoeff will not be calculated at position chr19:55910646 and possibly subsequent; at least 10 samples must have called genotypes                                                                                                                                              13:40:49.413 INFO  ProgressMeter -       chr19:55910600              0.9                186370         213817.0
13:40:55.466 INFO  HaplotypeCaller - 0 read(s) filtered by: MappingQualityReadFilter
0 read(s) filtered by: MappingQualityAvailableReadFilter
0 read(s) filtered by: MappedReadFilter
0 read(s) filtered by: NotSecondaryAlignmentReadFilter
0 read(s) filtered by: NotDuplicateReadFilter
0 read(s) filtered by: PassesVendorQualityCheckReadFilter
0 read(s) filtered by: NonZeroReferenceLengthAlignmentReadFilter
0 read(s) filtered by: GoodCigarReadFilter
0 read(s) filtered by: WellformedReadFilter
0 total reads filtered out of 18 reads processed
13:40:55.466 INFO  ProgressMeter -       chr19:61430410              1.0                204773         210541.8
13:40:55.467 INFO  ProgressMeter - Traversal complete. Processed 204773 total regions in 1.0 minutes.
13:40:55.707 INFO  PairHMM - Total compute time in PairHMM computeLogLikelihoods() : 0.096771218
13:40:55.708 INFO  SmithWatermanAligner - Total compute time in java Smith-Waterman : 0.04 sec
13:40:55.709 INFO  HaplotypeCaller - Shutting down engine
[October 26, 2023 at 1:40:55 PM CEST] org.broadinstitute.hellbender.tools.walkers.haplotypecaller.HaplotypeCaller done. Elapsed time: 0.99 minutes.
Runtime.totalMemory()=2617245696
##source=HaplotypeCaller
##bcftools_viewVersion=1.16+htslib-1.16
##bcftools_viewCommand=view -c1 output.g.vcf; Date=Thu Oct 26 13:40:56 2023
##bcftools_annotateVersion=1.16+htslib-1.16
##bcftools_annotateCommand=annotate -x INFO,FORMAT/SB,FORMAT/PL; Date=Thu Oct 26 13:40:56 2023
#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  CGAAGAGGTAGGTGCGAG-1
chr19   55910646        .       AC      A,<NON_REF>     352.6   .       .       GT:AD:DP:GQ:PGT:PID:PS  0|1:6,9,0:15:99:0|1:55910646_AC_A:55910646
chr19   55910648        .       AAATCCCCC       A,<NON_REF>     352.6   .       .       GT:AD:DP:GQ:PGT:PID:PS  0|1:6,9,0:15:99:0|1:55910646_AC_A:55910646
chr19   55910653        .       CCCCAT  *,C,<NON_REF>   227.84  .       .       GT:AD:DP:GQ:PGT:PID:PS  2|1:0,9,6,0:15:99:1|0:55910646_AC_A:55910646
chr19   55910675        .       T       C,<NON_REF>     30.64   .       .       GT:AD:DP:GQ     0/1:13,2,0:15:38
Using GATK jar /omics/groups/OE0540/internal/software/jvm/gatk/4.4.0.0/gatk-package-4.4.0.0-local.jar
Running:
    java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /omics/groups/OE0540/internal/software/jvm/gatk/4.4.0.0/gatk-package-4.4.0.0-local.jar GenotypeGVCFs -R chr19.fa -V output.g.vcf -O output.vcf
Picked up JAVA_TOOL_OPTIONS:  -Djava.net.useSystemProxies=true
13:40:59.573 INFO  NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/omics/groups/OE0540/internal/software/jvm/gatk/4.4.0.0/gatk-package-4.4.0.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
13:40:59.636 INFO  GenotypeGVCFs - ------------------------------------------------------------
13:40:59.653 INFO  GenotypeGVCFs - The Genome Analysis Toolkit (GATK) v4.4.0.0
13:40:59.653 INFO  GenotypeGVCFs - For support and documentation go to https://software.broadinstitute.org/gatk/
13:40:59.654 INFO  GenotypeGVCFs - Executing as gleixner@odcf-worker02 on Linux v3.10.0-1160.76.1.el7.x86_64 amd64
13:40:59.654 INFO  GenotypeGVCFs - Java runtime: OpenJDK 64-Bit Server VM v17+35-2724
13:40:59.655 INFO  GenotypeGVCFs - Start Date/Time: October 26, 2023 at 1:40:59 PM CEST
13:40:59.655 INFO  GenotypeGVCFs - ------------------------------------------------------------
13:40:59.655 INFO  GenotypeGVCFs - ------------------------------------------------------------
13:40:59.658 INFO  GenotypeGVCFs - HTSJDK Version: 3.0.5
13:40:59.658 INFO  GenotypeGVCFs - Picard Version: 3.0.0
13:40:59.659 INFO  GenotypeGVCFs - Built for Spark Version: 3.3.1
13:40:59.659 INFO  GenotypeGVCFs - HTSJDK Defaults.COMPRESSION_LEVEL : 2
13:40:59.660 INFO  GenotypeGVCFs - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
13:40:59.660 INFO  GenotypeGVCFs - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
13:40:59.661 INFO  GenotypeGVCFs - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
13:40:59.661 INFO  GenotypeGVCFs - Deflater: IntelDeflater
13:40:59.661 INFO  GenotypeGVCFs - Inflater: IntelInflater
13:40:59.661 INFO  GenotypeGVCFs - GCS max retries/reopens: 20
13:40:59.662 INFO  GenotypeGVCFs - Requester pays: disabled
13:40:59.663 INFO  GenotypeGVCFs - Initializing engine
13:40:59.909 INFO  FeatureManager - Using codec VCFCodec to read file file:///omics/groups/OE0540/internal/users/gleixner/cropseq_uli/rep_ex/test3/output.g.vcf
13:40:59.951 INFO  GenotypeGVCFs - Done initializing engine
13:41:00.006 INFO  ProgressMeter - Starting traversal
13:41:00.007 INFO  ProgressMeter -        Current Locus  Elapsed Minutes    Variants Processed  Variants/Minute
13:41:00.406 WARN  ReferenceConfidenceVariantContextMerger - Detected invalid annotations: When trying to merge variant contexts at location chr19:55910646 the annotation MLEAC=[1, 0] was not a numerical value and was ignored
13:41:00.476 WARN  InbreedingCoeff - InbreedingCoeff will not be calculated at position chr19:55910646 and possibly subsequent; at least 10 samples must have called genotypes
13:41:00.528 INFO  ProgressMeter -             unmapped              0.0                    37           4277.5
13:41:00.528 INFO  ProgressMeter - Traversal complete. Processed 37 total variants in 0.0 minutes.
13:41:00.580 INFO  GenotypeGVCFs - Shutting down engine
[October 26, 2023 at 1:41:00 PM CEST] org.broadinstitute.hellbender.tools.walkers.GenotypeGVCFs done. Elapsed time: 0.02 minutes.
Runtime.totalMemory()=285212672
##source=HaplotypeCaller
##bcftools_viewVersion=1.16+htslib-1.16
##bcftools_viewCommand=view -c1 output.vcf; Date=Thu Oct 26 13:41:08 2023
##bcftools_annotateVersion=1.16+htslib-1.16
##bcftools_annotateCommand=annotate -x INFO,FORMAT/SB,FORMAT/PL; Date=Thu Oct 26 13:41:08 2023
#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  CGAAGAGGTAGGTGCGAG-1
chr19   55910646        .       AC      A       352.6   .       .       GT:AD:DP:GQ:PGT:PID:PS  0|1:6,9:15:99:0|1:55910646_AC_A:55910646
chr19   55910648        .       AAATCCCCC       A       352.6   .       .       GT:AD:DP:GQ:PGT:PID:PS  0|1:6,9:15:99:0|1:55910646_AC_A:55910646
chr19   55910653        .       CCCCAT  *,C     227.84  .       .       GT:AD:DP:GQ:PGT:PID:PS  1|2:0,9,6:15:99:1|0:55910646_AC_A:55910646
chr19   55910675        .       T       C       30.64   .       .       GT:AD:DP:GQ     0/1:13,2:15:38

The relevant parts of the output

HaplotypeCaller:

chr19   55910648  AAATCCCCC  A,<NON_REF>    GT:AD:DP:GQ:PGT:PID:PS  0|1:6,9,0  :15:99:0|1:55910646_AC_A:55910646
chr19   55910653  CCCCAT     *,C,<NON_REF>  GT:AD:DP:GQ:PGT:PID:PS  2|1:0,9,6,0:15:99:1|0:55910646_AC_A:55910646

GenotypeGVCFs:

chr19   55910648  AAATCCCCC  A              GT:AD:DP:GQ:PGT:PID:PS  0|1:6,9    :15:99:0|1:55910646_AC_A:55910646
chr19   55910653  CCCCAT     *,C            GT:AD:DP:GQ:PGT:PID:PS  1|2:0,9,6  :15:99:1|0:55910646_AC_A:55910646

I understand that

  • PID/PGT can only encode phasing for biallelic variants,
  • so PID/PGT + unphased GT can not be used to infer the correctly phased GT in general
  • but only (and only maybe) in this case PID/PGT + wrongly phased GT can be used to infer the correctly phased GT (because the mangeling of the GT might depend on PGT)

is this correct?

jan-glx avatar Oct 26 '23 12:10 jan-glx

Permalinks for the links in the OP: https://github.com/broadinstitute/gatk/blob/e5a6fed18ed7df112152461699de0f0d01c8b9c0/src/main/java/org/broadinstitute/hellbender/tools/walkers/GenotypeGVCFs.java#L271 https://github.com/broadinstitute/gatk/blob/e5a6fed18ed7df112152461699de0f0d01c8b9c0/src/main/java/org/broadinstitute/hellbender/tools/walkers/GenotypeGVCFs.java#L272-L273

This has since been moved to https://github.com/broadinstitute/gatk/blob/423d106612074aa3480b67b321dc66426b1c600a/src/main/java/org/broadinstitute/hellbender/tools/walkers/GenotypeGVCFs.java#L316 bzw. https://github.com/broadinstitute/gatk/blob/423d106612074aa3480b67b321dc66426b1c600a/src/main/java/org/broadinstitute/hellbender/tools/walkers/GenotypeGVCFsEngine.java#L136

However, I think the merged variant contexts are still called and phased correctly, but the phasing gets lost in the regenotyping step, because final_alleles is unphased : https://github.com/broadinstitute/gatk/blob/423d106612074aa3480b67b321dc66426b1c600a/src/main/java/org/broadinstitute/hellbender/utils/variant/GATKVariantContextUtils.java#L337-L337

jan-glx avatar Oct 28 '23 13:10 jan-glx