gatk icon indicating copy to clipboard operation
gatk copied to clipboard

GenotypeGVCFs seems to ignore `--genotype-assignment-method `

Open jan-glx opened this issue 1 year ago • 0 comments

It seems that the genotype assignment method used in GenotypeGVCFs is hard-coded to (mostly) PREFER_PLS : https://github.com/broadinstitute/gatk/blob/423d106612074aa3480b67b321dc66426b1c600a/src/main/java/org/broadinstitute/hellbender/tools/walkers/GenotypeGVCFsEngine.java#L377

and the --genotype-assignment-method thus has no effect: for example (same data as in #5727#issuecomment-1781017195) :

$ gatk GenotypeGVCFs -R chr19.fa -V output.g.vcf -O output.vcf  --genotype-assignment-method SET_TO_NO_CALL
Using GATK jar /omics/groups/OE0540/internal/software/jvm/gatk/4.4.0.0/gatk-package-4.4.0.0-local.jar
Running:
    java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /omics/groups/OE0540/internal/software/jvm/gatk/4.4.0.0/gatk-package-4.4.0.0-local.jar GenotypeGVCFs -R chr19.fa -V output.g.vcf -O output.vcf --genotype-assignment-method SET_TO_NO_CALL
Picked up JAVA_TOOL_OPTIONS:  -Djava.net.useSystemProxies=true
14:10:28.241 INFO  NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/omics/groups/OE0540/internal/software/jvm/gatk/4.4.0.0/gatk-package-4.4.0.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
14:10:28.300 INFO  GenotypeGVCFs - ------------------------------------------------------------
14:10:28.305 INFO  GenotypeGVCFs - The Genome Analysis Toolkit (GATK) v4.4.0.0
14:10:28.305 INFO  GenotypeGVCFs - For support and documentation go to https://software.broadinstitute.org/gatk/
14:10:28.305 INFO  GenotypeGVCFs - Executing as gleixner@odcf-worker02 on Linux v3.10.0-1160.76.1.el7.x86_64 amd64
14:10:28.305 INFO  GenotypeGVCFs - Java runtime: OpenJDK 64-Bit Server VM v17+35-2724
14:10:28.306 INFO  GenotypeGVCFs - Start Date/Time: October 29, 2023 at 2:10:28 PM CET
14:10:28.306 INFO  GenotypeGVCFs - ------------------------------------------------------------
14:10:28.306 INFO  GenotypeGVCFs - ------------------------------------------------------------
14:10:28.307 INFO  GenotypeGVCFs - HTSJDK Version: 3.0.5
14:10:28.307 INFO  GenotypeGVCFs - Picard Version: 3.0.0
14:10:28.308 INFO  GenotypeGVCFs - Built for Spark Version: 3.3.1
14:10:28.308 INFO  GenotypeGVCFs - HTSJDK Defaults.COMPRESSION_LEVEL : 2
14:10:28.309 INFO  GenotypeGVCFs - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
14:10:28.309 INFO  GenotypeGVCFs - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
14:10:28.309 INFO  GenotypeGVCFs - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
14:10:28.310 INFO  GenotypeGVCFs - Deflater: IntelDeflater
14:10:28.310 INFO  GenotypeGVCFs - Inflater: IntelInflater
14:10:28.310 INFO  GenotypeGVCFs - GCS max retries/reopens: 20
14:10:28.310 INFO  GenotypeGVCFs - Requester pays: disabled
14:10:28.311 INFO  GenotypeGVCFs - Initializing engine
14:10:28.532 INFO  FeatureManager - Using codec VCFCodec to read file file:///omics/groups/OE0540/internal/users/gleixner/cropseq_uli/rep_ex/test3/output.g.vcf
14:10:28.566 INFO  GenotypeGVCFs - Done initializing engine
14:10:28.620 INFO  ProgressMeter - Starting traversal
14:10:28.622 INFO  ProgressMeter -        Current Locus  Elapsed Minutes    Variants Processed  Variants/Minute
14:10:29.048 WARN  ReferenceConfidenceVariantContextMerger - Detected invalid annotations: When trying to merge variant contexts at location chr19:55910646 the annotation MLEAC=[1, 0] was not a numerical value and was ignored
14:10:29.118 WARN  InbreedingCoeff - InbreedingCoeff will not be calculated at position chr19:55910646 and possibly subsequent; at least 10 samples must have called genotypes
14:10:29.175 INFO  ProgressMeter -             unmapped              0.0                    37           4021.7
14:10:29.175 INFO  ProgressMeter - Traversal complete. Processed 37 total variants in 0.0 minutes.
14:10:29.236 INFO  GenotypeGVCFs - Shutting down engine
[October 29, 2023 at 2:10:29 PM CET] org.broadinstitute.hellbender.tools.walkers.GenotypeGVCFs done. Elapsed time: 0.02 minutes.
Runtime.totalMemory()=285212672

results in a vcf file that still has called genotypes:

$ bcftools view output.vcf -c1 | bcftools annotate -x INFO,FORMAT/SB | tail
##source=HaplotypeCaller
##bcftools_viewVersion=1.16+htslib-1.16
##bcftools_viewCommand=view -c1 output.vcf; Date=Sun Oct 29 14:09:42 2023
##bcftools_annotateVersion=1.16+htslib-1.16
##bcftools_annotateCommand=annotate -x INFO,FORMAT/SB; Date=Sun Oct 29 14:09:42 2023
#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  CGAAGAGGTAGGTGCGAG-1
chr19   55910646        .       AC      A       352.6   .       .       GT:AD:DP:GQ:PGT:PID:PL:PS       0|1:6,9:15:99:0|1:55910646_AC_A:360,0,225:55910646
chr19   55910648        .       AAATCCCCC       A       352.6   .       .       GT:AD:DP:GQ:PGT:PID:PL:PS       0|1:6,9:15:99:0|1:55910646_AC_A:360,0,225:55910646
chr19   55910653        .       CCCCAT  *,C     227.84  .       .       GT:AD:DP:GQ:PGT:PID:PL:PS       1|2:0,9,6:15:99:1|0:55910646_AC_A:630,252,225,378,0,360:55910646
chr19   55910675        .       T       C       30.64   .       .       GT:AD:DP:GQ:PL  0/1:13,2:15:38:38,0,505

GenotypeGVCFs should either make use of the parameter or not accept it.

jan-glx avatar Oct 29 '23 13:10 jan-glx