gatk
gatk copied to clipboard
GenotypeGVCFs seems to ignore `--genotype-assignment-method `
It seems that the genotype assignment method used in GenotypeGVCFs is hard-coded to (mostly) PREFER_PLS
:
https://github.com/broadinstitute/gatk/blob/423d106612074aa3480b67b321dc66426b1c600a/src/main/java/org/broadinstitute/hellbender/tools/walkers/GenotypeGVCFsEngine.java#L377
and the --genotype-assignment-method
thus has no effect:
for example (same data as in #5727#issuecomment-1781017195) :
$ gatk GenotypeGVCFs -R chr19.fa -V output.g.vcf -O output.vcf --genotype-assignment-method SET_TO_NO_CALL
Using GATK jar /omics/groups/OE0540/internal/software/jvm/gatk/4.4.0.0/gatk-package-4.4.0.0-local.jar
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /omics/groups/OE0540/internal/software/jvm/gatk/4.4.0.0/gatk-package-4.4.0.0-local.jar GenotypeGVCFs -R chr19.fa -V output.g.vcf -O output.vcf --genotype-assignment-method SET_TO_NO_CALL
Picked up JAVA_TOOL_OPTIONS: -Djava.net.useSystemProxies=true
14:10:28.241 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/omics/groups/OE0540/internal/software/jvm/gatk/4.4.0.0/gatk-package-4.4.0.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
14:10:28.300 INFO GenotypeGVCFs - ------------------------------------------------------------
14:10:28.305 INFO GenotypeGVCFs - The Genome Analysis Toolkit (GATK) v4.4.0.0
14:10:28.305 INFO GenotypeGVCFs - For support and documentation go to https://software.broadinstitute.org/gatk/
14:10:28.305 INFO GenotypeGVCFs - Executing as gleixner@odcf-worker02 on Linux v3.10.0-1160.76.1.el7.x86_64 amd64
14:10:28.305 INFO GenotypeGVCFs - Java runtime: OpenJDK 64-Bit Server VM v17+35-2724
14:10:28.306 INFO GenotypeGVCFs - Start Date/Time: October 29, 2023 at 2:10:28 PM CET
14:10:28.306 INFO GenotypeGVCFs - ------------------------------------------------------------
14:10:28.306 INFO GenotypeGVCFs - ------------------------------------------------------------
14:10:28.307 INFO GenotypeGVCFs - HTSJDK Version: 3.0.5
14:10:28.307 INFO GenotypeGVCFs - Picard Version: 3.0.0
14:10:28.308 INFO GenotypeGVCFs - Built for Spark Version: 3.3.1
14:10:28.308 INFO GenotypeGVCFs - HTSJDK Defaults.COMPRESSION_LEVEL : 2
14:10:28.309 INFO GenotypeGVCFs - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
14:10:28.309 INFO GenotypeGVCFs - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
14:10:28.309 INFO GenotypeGVCFs - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
14:10:28.310 INFO GenotypeGVCFs - Deflater: IntelDeflater
14:10:28.310 INFO GenotypeGVCFs - Inflater: IntelInflater
14:10:28.310 INFO GenotypeGVCFs - GCS max retries/reopens: 20
14:10:28.310 INFO GenotypeGVCFs - Requester pays: disabled
14:10:28.311 INFO GenotypeGVCFs - Initializing engine
14:10:28.532 INFO FeatureManager - Using codec VCFCodec to read file file:///omics/groups/OE0540/internal/users/gleixner/cropseq_uli/rep_ex/test3/output.g.vcf
14:10:28.566 INFO GenotypeGVCFs - Done initializing engine
14:10:28.620 INFO ProgressMeter - Starting traversal
14:10:28.622 INFO ProgressMeter - Current Locus Elapsed Minutes Variants Processed Variants/Minute
14:10:29.048 WARN ReferenceConfidenceVariantContextMerger - Detected invalid annotations: When trying to merge variant contexts at location chr19:55910646 the annotation MLEAC=[1, 0] was not a numerical value and was ignored
14:10:29.118 WARN InbreedingCoeff - InbreedingCoeff will not be calculated at position chr19:55910646 and possibly subsequent; at least 10 samples must have called genotypes
14:10:29.175 INFO ProgressMeter - unmapped 0.0 37 4021.7
14:10:29.175 INFO ProgressMeter - Traversal complete. Processed 37 total variants in 0.0 minutes.
14:10:29.236 INFO GenotypeGVCFs - Shutting down engine
[October 29, 2023 at 2:10:29 PM CET] org.broadinstitute.hellbender.tools.walkers.GenotypeGVCFs done. Elapsed time: 0.02 minutes.
Runtime.totalMemory()=285212672
results in a vcf file that still has called genotypes:
$ bcftools view output.vcf -c1 | bcftools annotate -x INFO,FORMAT/SB | tail
##source=HaplotypeCaller
##bcftools_viewVersion=1.16+htslib-1.16
##bcftools_viewCommand=view -c1 output.vcf; Date=Sun Oct 29 14:09:42 2023
##bcftools_annotateVersion=1.16+htslib-1.16
##bcftools_annotateCommand=annotate -x INFO,FORMAT/SB; Date=Sun Oct 29 14:09:42 2023
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT CGAAGAGGTAGGTGCGAG-1
chr19 55910646 . AC A 352.6 . . GT:AD:DP:GQ:PGT:PID:PL:PS 0|1:6,9:15:99:0|1:55910646_AC_A:360,0,225:55910646
chr19 55910648 . AAATCCCCC A 352.6 . . GT:AD:DP:GQ:PGT:PID:PL:PS 0|1:6,9:15:99:0|1:55910646_AC_A:360,0,225:55910646
chr19 55910653 . CCCCAT *,C 227.84 . . GT:AD:DP:GQ:PGT:PID:PL:PS 1|2:0,9,6:15:99:1|0:55910646_AC_A:630,252,225,378,0,360:55910646
chr19 55910675 . T C 30.64 . . GT:AD:DP:GQ:PL 0/1:13,2:15:38:38,0,505
GenotypeGVCFs should either make use of the parameter or not accept it.