gatk
gatk copied to clipboard
GATK 4.1.4 for joint calling of hifi and illumina reads
Hello,
I am trying to use gatk/4.1.4.1 and picard/2.22.0 to do joint variant calling of PacBio HiFi reads and Illumina short-reads. My pipeline is basically the recommended one (without base recalibration) where after HaplotypeCaller, I use GenomicsDBImport and GenotypeGVCFs and end up with the vcf file.
The HiFI reads have normal hifi phred scores up to 93 and the illumina are encoded with phred33 quality scores.
The output of the joint variant calling is strange and it causes issues when I try to use it with tools like plink and it makes me wonder if the joint variant calling went badly as well. This is even after variant filtration where I filter for QUAL<30 and QD<2
Here is an example of what the first few lines of my vcf calls look like. Notice the QUAL column that looks like Num,Num:
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT Illumina_sample0 sample1 sample2 sample_3 sample4 sample5 sample6 hifi_sample
Chr1 10608 . GCA G 174,34 PASS AC=2;AF=0.143;AN=14;BaseQRankSum=0.00;DP=55;ExcessHet=3.3579;FS=7.404;MLEAC=2;MLEAF=0.143;MQ=57.97;MQRankSum=0.00;QD=15.85;ReadPosRankSum=1.00;SOR=3.611 GT:AD:DP:GQ:PL ./.:0,0:0:.:0,0,0 0/1:3,3:6:99:117,0,117 0/0:5,0:5:15:0,15,141 0/1:3,2:5:71:71,0,120 0/0:3,0:3:9:0,9,113 0/0:7,0:7:21:0,21,190 0/0:5,0:5:15:0,15,175 0/0:20,0:20:60:0,60,900
Chr1 10616 . G A 903,42 PASS AC=8;AF=0.571;AN=14;BaseQRankSum=0.00;DP=66;ExcessHet=0.7136;FS=9.277;MLEAC=9;MLEAF=0.643;MQ=59.80;MQRankSum=0.00;QD=20.08;ReadPosRankSum=1.00;SOR=0.251 GT:AD:DP:GQ:PL ./.:0,0:0:.:0,0,0 0/0:10,0:10:30:0,30,367 1/1:0,5:5:15:141,15,0 0/1:2,3:5:75:110,0,75 0/0:3,0:3:9:0,9,113 1/1:0,8:8:24:232,24,0 1/1:0,7:7:21:224,21,0 0/1:14,6:20:99:210,0,570
I believe this can be reproduced with any illumina+hifi samples. Are you aware of this problem? Could the snp calling be wrong because of the different phred score encoding? What do you suggest to do in these cases?
I have read your response in this issue: https://gatk.broadinstitute.org/hc/en-us/community/posts/360072716972-Variant-calling-with-PacBio-HiFi-reads and have used minimap2 in a similar way. (minimap2 -acyYL --secondary=no --MD --eqx -x asm20 -k 20)
Hello, this is still an issue for me as I am still getting two values for the QUAL column even when not using hifi reads but using paired and unpaired illumina together, instead. Is there a proposed solution to this?
Here the error log
Using GATK jar /share/scientific_bin/gatk/4.1.4.1/gatk-package-4.1.4.1-local.jar
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /share/scientific_bin/gatk/4.1.4.1/gatk-package-4.1.4.1-local.jar IndexFeatureFile -I output/called/final/allsites.filtered.vcf
00:57:08.257 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/share/scientific_bin/gatk/4.1.4.1/gatk-package-4.1.4.1-local.jar!/com/intel/gkl/native/libgkl_compression.so
Sep 11, 2023 12:57:08 AM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
INFO: Failed to detect whether we are running on Google Compute Engine.
00:57:08.789 INFO IndexFeatureFile - ------------------------------------------------------------
00:57:08.789 INFO IndexFeatureFile - The Genome Analysis Toolkit (GATK) v4.1.4.1
00:57:08.789 INFO IndexFeatureFile - For support and documentation go to https://software.broadinstitute.org/gatk/
00:57:08.790 INFO IndexFeatureFile - Executing as [email protected] on Linux v3.10.0-1160.53.1.el7.x86_64 amd64
00:57:08.790 INFO IndexFeatureFile - Java runtime: Java HotSpot(TM) 64-Bit Server VM v1.8.0_231-b11
00:57:08.790 INFO IndexFeatureFile - Start Date/Time: 11. September 2023 00:57:07 MESZ
00:57:08.790 INFO IndexFeatureFile - ------------------------------------------------------------
00:57:08.790 INFO IndexFeatureFile - ------------------------------------------------------------
00:57:08.790 INFO IndexFeatureFile - HTSJDK Version: 2.21.0
00:57:08.790 INFO IndexFeatureFile - Picard Version: 2.21.2
00:57:08.790 INFO IndexFeatureFile - HTSJDK Defaults.COMPRESSION_LEVEL : 2
00:57:08.790 INFO IndexFeatureFile - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
00:57:08.790 INFO IndexFeatureFile - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
00:57:08.790 INFO IndexFeatureFile - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
00:57:08.790 INFO IndexFeatureFile - Deflater: IntelDeflater
00:57:08.790 INFO IndexFeatureFile - Inflater: IntelInflater
00:57:08.790 INFO IndexFeatureFile - GCS max retries/reopens: 20
00:57:08.790 INFO IndexFeatureFile - Requester pays: disabled
00:57:08.790 INFO IndexFeatureFile - Initializing engine
00:57:08.790 INFO IndexFeatureFile - Done initializing engine
00:57:09.166 INFO FeatureManager - Using codec VCFCodec to read file file:///share/pool/CompGenomVert/phoxy_snp_calling/VC_HIFI/output/called/final/allsites.filtered.vcf
00:57:09.250 INFO ProgressMeter - Starting traversal
00:57:09.250 INFO ProgressMeter - Current Locus Elapsed Minutes Records Processed Records/Minute
00:57:09.251 INFO IndexFeatureFile - Shutting down engine
[11. September 2023 00:57:09 MESZ] org.broadinstitute.hellbender.tools.IndexFeatureFile done. Elapsed time: 0.02 minutes.
Runtime.totalMemory()=2114453504
java.lang.NumberFormatException: For input string: "175,24"
at sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:2043)
at sun.misc.FloatingDecimal.parseDouble(FloatingDecimal.java:110)
at java.lang.Double.parseDouble(Double.java:538)
at htsjdk.variant.vcf.VCFUtils.parseVcfDouble(VCFUtils.java:262)
at htsjdk.variant.vcf.AbstractVCFCodec.parseQual(AbstractVCFCodec.java:620)
at htsjdk.variant.vcf.AbstractVCFCodec.parseVCFLine(AbstractVCFCodec.java:422)
at htsjdk.variant.vcf.AbstractVCFCodec.decodeLine(AbstractVCFCodec.java:384)
at htsjdk.variant.vcf.AbstractVCFCodec.decode(AbstractVCFCodec.java:328)
at htsjdk.variant.vcf.AbstractVCFCodec.decode(AbstractVCFCodec.java:48)
at htsjdk.tribble.AsciiFeatureCodec.decode(AsciiFeatureCodec.java:70)
at htsjdk.tribble.AsciiFeatureCodec.decode(AsciiFeatureCodec.java:37)
at htsjdk.tribble.AbstractFeatureCodec.decodeLoc(AbstractFeatureCodec.java:43)
at org.broadinstitute.hellbender.utils.codecs.ProgressReportingDelegatingCodec.decodeLoc(ProgressReportingDelegatingCodec.java:46)
at htsjdk.tribble.index.IndexFactory$FeatureIterator.readNextFeature(IndexFactory.java:689)
at htsjdk.tribble.index.IndexFactory$FeatureIterator.
module load gatk/4.1.4.1
gatk IndexFeatureFile -I output/called/final/allsites.filtered.vcf' returned non-zero exit status 3.