CNNScoreVariants crashes with java.lang.NullPointerException
Looks like this java.lang.NullPointerException is caused by an environment setup issue.
This request was created from a contribution made by Jordi Maggi on April 25, 2022 09:25 UTC.
--
Hi,
I created a conda environment and installed gatk4 through conda install -c bioconda gatk4. I have been using this environment to run all steps of the single sample germline variant calling best practices workflow (both GATK and Picard tools). However, I have never been able to run CNNScoreVariants with this setup, as it always fails with a java.lang.NullPointerException. The only way I can run this step is through the Docker image you provide. That, however, is not ideal for our setup.
Any idea as to what I may try to be able to run it directly?
GATK version:
Using GATK jar /home/analyst/anaconda3/envs/snakemake_env/share/gatk4-4.2.5.0-0/gatk-package-4.2.5.0-local.jar
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /home/analyst/anaconda3/envs/snakemake_env/share/gatk4-4.2.5.0-0/gatk-package-4.2.5.0-local.jar --version
The Genome Analysis Toolkit (GATK) v4.2.5.0
HTSJDK Version: 2.24.1
Picard Version: 2.25.4
Exact command:
gatk CNNScoreVariants -I 73318_WES_hg19_recalibrated.sorted.bam -V 73318_80_IDTv1.vcf.gz -R /media/analyst/Data/Reference_data/hg19.fa -O /media/analyst/Data/73318_CNNScore_test.vcf.gz -tensor-type read_tensor > /media/analyst/Data/CNNScoreVariants.log
Entire console output:
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /home/analyst/anaconda3/envs/snakemake_env/share/gatk4-4.2.5.0-0/gatk-package-4.2.5.0-local.jar CNNScoreVariants -I 73318_WES_hg19_recalibrated.sorted.bam -V 73318_80_IDTv1.vcf.gz -R /media/analyst/Data/Reference_data/hg19.fa -O /media/analyst/Data/73318_CNNScore_test.vcf.gz -tensor-type read_tensor
11:17:58.509 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/home/analyst/anaconda3/envs/snakemake_env/share/gatk4-4.2.5.0-0/gatk-package-4.2.5.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
Apr 25, 2022 11:17:58 AM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
INFO: Failed to detect whether we are running on Google Compute Engine.
11:17:58.668 INFO CNNScoreVariants - ------------------------------------------------------------
11:17:58.668 INFO CNNScoreVariants - The Genome Analysis Toolkit (GATK) v4.2.5.0
11:17:58.669 INFO CNNScoreVariants - For support and documentation go to https://software.broadinstitute.org/gatk/
11:17:58.669 INFO CNNScoreVariants - Executing as analyst@WGS on Linux v5.13.0-40-generic amd64
11:17:58.669 INFO CNNScoreVariants - Java runtime: OpenJDK 64-Bit Server VM v10.0.2+13
11:17:58.669 INFO CNNScoreVariants - Start Date/Time: April 25, 2022 at 11:17:58 AM CEST
11:17:58.669 INFO CNNScoreVariants - ------------------------------------------------------------
11:17:58.669 INFO CNNScoreVariants - ------------------------------------------------------------
11:17:58.670 INFO CNNScoreVariants - HTSJDK Version: 2.24.1
11:17:58.670 INFO CNNScoreVariants - Picard Version: 2.25.4
11:17:58.670 INFO CNNScoreVariants - Built for Spark Version: 2.4.5
11:17:58.670 INFO CNNScoreVariants - HTSJDK Defaults.COMPRESSION_LEVEL : 2
11:17:58.670 INFO CNNScoreVariants - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
11:17:58.670 INFO CNNScoreVariants - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
11:17:58.670 INFO CNNScoreVariants - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
11:17:58.670 INFO CNNScoreVariants - Deflater: IntelDeflater
11:17:58.670 INFO CNNScoreVariants - Inflater: IntelInflater
11:17:58.671 INFO CNNScoreVariants - GCS max retries/reopens: 20
11:17:58.671 INFO CNNScoreVariants - Requester pays: disabled
11:17:58.671 INFO CNNScoreVariants - Initializing engine
WARNING: BAM index file /media/analyst/Data/WES/73318/73318_WES_hg19_recalibrated.sorted.bai is older than BAM /media/analyst/Data/WES/73318/73318_WES_hg19_recalibrated.sorted.bam
11:17:58.969 INFO FeatureManager - Using codec VCFCodec to read file file:///media/analyst/Data/WES/73318/73318_80_IDTv1.vcf.gz
11:17:59.079 INFO CNNScoreVariants - Done initializing engine
11:17:59.081 INFO NativeLibraryLoader - Loading libgkl_utils.so from jar:file:/home/analyst/anaconda3/envs/snakemake_env/share/gatk4-4.2.5.0-0/gatk-package-4.2.5.0-local.jar!/com/intel/gkl/native/libgkl_utils.so
11:17:59.187 INFO CNNScoreVariants - Done scoring variants with CNN.
11:17:59.187 INFO CNNScoreVariants - Shutting down engine
[April 25, 2022 at 11:17:59 AM CEST] org.broadinstitute.hellbender.tools.walkers.vqsr.CNNScoreVariants done. Elapsed time: 0.01 minutes.
Runtime.totalMemory()=1895825408
java.lang.NullPointerException
at org.broadinstitute.hellbender.utils.runtime.ProcessControllerAckResult.hasMessage(ProcessControllerAckResult.java:49)
at org.broadinstitute.hellbender.utils.runtime.ProcessControllerAckResult.getDisplayMessage(ProcessControllerAckResult.java:69)
at org.broadinstitute.hellbender.utils.runtime.StreamingProcessController.waitForAck(StreamingProcessController.java:229)
at org.broadinstitute.hellbender.utils.python.StreamingPythonScriptExecutor.waitForAck(StreamingPythonScriptExecutor.java:216)
at org.broadinstitute.hellbender.utils.python.StreamingPythonScriptExecutor.sendSynchronousCommand(StreamingPythonScriptExecutor.java:183)
at org.broadinstitute.hellbender.tools.walkers.vqsr.CNNScoreVariants.onTraversalStart(CNNScoreVariants.java:313)
at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:1083)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:140)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:192)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:211)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
at org.broadinstitute.hellbender.Main.main(Main.java:289)
(created from Zendesk ticket #282399)
The underlying issue here is that the GATK conda environment isn't established, since bioconda doesn't appear to configure it. The NPE itself is fixed by #7816.
In this particular case, some of the requirements appear to be satisfied, since the code gets past the initial check for whether the GATK Python code is available, but the CNN code itself then can't be loaded.
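To see which piece is missing, one option is to try importing the relevant modules from the same Python interpreter GATK will launch. A minimal sketch, assuming the module names gatktool (GATK's Python support package), vqsr_cnn (the CNN code), tensorflow and keras; check_imports is a hypothetical helper, and the module list is an assumption to verify against the gatkcondaenv.yml shipped with your GATK release. It actually imports each module rather than only locating it, so errors raised during initialization (such as a TensorFlow/Keras version mismatch) are reported too.

```python
import importlib
import sys

# Assumed module names for a GATK 4.2.x Python environment; adjust to match
# the gatkcondaenv.yml that ships with your GATK release.
REQUIRED_MODULES = ["gatktool", "vqsr_cnn", "tensorflow", "keras"]


def check_imports(names):
    """Try to import each module.

    Returns a dict mapping each name to None on success, or to a short
    "ExceptionType: message" string if the import failed.
    """
    results = {}
    for name in names:
        try:
            importlib.import_module(name)
            results[name] = None
        except Exception as exc:  # ImportError, or errors raised while the module initializes
            results[name] = f"{type(exc).__name__}: {exc}"
    return results


if __name__ == "__main__":
    print("Python interpreter:", sys.executable)
    for name, error in check_imports(REQUIRED_MODULES).items():
        print(f"{name}: {'OK' if error is None else error}")
```

Run it with the same python that is on PATH in the shell where gatk is run. If gatktool imports but vqsr_cnn or tensorflow does not, that matches the partial setup described above.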
I'm getting the same issue, @cmnbroad @GATKSupportTeam.
Any recommendations on how to proceed, please?
Thanks in advance.
Using GATK jar /home/fmbuga/.conda/envs/gatk4/share/gatk4-4.2.6.1-1/gatk-package-4.2.6.1-local.jar
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /home/fmbuga/.conda/envs/gatk4/share/gatk4-4.2.6.1-1/gatk-package-4.2.6.1-local.jar CNNScoreVariants --version
Using GATK jar /home/fmbuga/.conda/envs/gatk4/share/gatk4-4.2.6.1-1/gatk-package-4.2.6.1-local.jar
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /home/fmbuga/.conda/envs/gatk4/share/gatk4-4.2.6.1-1/gatk-package-4.2.6.1-local.jar CNNScoreVariants -R /home/fmbuga/tools/hg38/hg38.fa -V /home/fmbuga/gatk4_gcp_wgs/06_vcf_raw/SRR16299720_dedup_AORRG_recal_raw.vcf -O ./08_vcf_1dCNN/SRR16299720_dedup_AORRG_recal_raw_1dCNN_scored.vcf
05:39:39.149 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/home/fmbuga/.conda/envs/gatk4/share/gatk4-4.2.6.1-1/gatk-package-4.2.6.1-local.jar!/com/intel/gkl/native/libgkl_compression.so
05:39:39.304 INFO CNNScoreVariants - ------------------------------------------------------------
05:39:39.305 INFO CNNScoreVariants - The Genome Analysis Toolkit (GATK) v4.2.6.1
05:39:39.305 INFO CNNScoreVariants - For support and documentation go to https://software.broadinstitute.org/gatk/
05:39:39.305 INFO CNNScoreVariants - Executing as [email protected] on Linux v3.10.0-1062.18.1.el7.x86_64 amd64
05:39:39.305 INFO CNNScoreVariants - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_332-b09
05:39:39.305 INFO CNNScoreVariants - Start Date/Time: October 9, 2022 5:39:39 AM PDT
05:39:39.305 INFO CNNScoreVariants - ------------------------------------------------------------
05:39:39.306 INFO CNNScoreVariants - ------------------------------------------------------------
05:39:39.306 INFO CNNScoreVariants - HTSJDK Version: 2.24.1
05:39:39.306 INFO CNNScoreVariants - Picard Version: 2.27.1
05:39:39.306 INFO CNNScoreVariants - Built for Spark Version: 2.4.5
05:39:39.307 INFO CNNScoreVariants - HTSJDK Defaults.COMPRESSION_LEVEL : 2
05:39:39.307 INFO CNNScoreVariants - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
05:39:39.307 INFO CNNScoreVariants - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
05:39:39.307 INFO CNNScoreVariants - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
05:39:39.307 INFO CNNScoreVariants - Deflater: IntelDeflater
05:39:39.307 INFO CNNScoreVariants - Inflater: IntelInflater
05:39:39.307 INFO CNNScoreVariants - GCS max retries/reopens: 20
05:39:39.307 INFO CNNScoreVariants - Requester pays: disabled
05:39:39.307 INFO CNNScoreVariants - Initializing engine
05:39:39.905 INFO FeatureManager - Using codec VCFCodec to read file file:///home/fmbuga/gatk4_gcp_wgs/06_vcf_raw/SRR16299720_dedup_AORRG_recal_raw.vcf
05:39:40.108 INFO CNNScoreVariants - Done initializing engine
05:39:40.109 INFO NativeLibraryLoader - Loading libgkl_utils.so from jar:file:/home/fmbuga/.conda/envs/gatk4/share/gatk4-4.2.6.1-1/gatk-package-4.2.6.1-local.jar!/com/intel/gkl/native/libgkl_utils.so
05:39:40.429 INFO CNNScoreVariants - Done scoring variants with CNN.
05:39:40.429 INFO CNNScoreVariants - Shutting down engine
[October 9, 2022 5:39:40 AM PDT] org.broadinstitute.hellbender.tools.walkers.vqsr.CNNScoreVariants done. Elapsed time: 0.02 minutes.
Runtime.totalMemory()=1903165440
java.lang.NullPointerException
at org.broadinstitute.hellbender.utils.runtime.ProcessControllerAckResult.hasMessage(ProcessControllerAckResult.java:49)
at org.broadinstitute.hellbender.utils.runtime.ProcessControllerAckResult.getDisplayMessage(ProcessControllerAckResult.java:69)
at org.broadinstitute.hellbender.utils.runtime.StreamingProcessController.waitForAck(StreamingProcessController.java:235)
at org.broadinstitute.hellbender.utils.python.StreamingPythonScriptExecutor.waitForAck(StreamingPythonScriptExecutor.java:216)
at org.broadinstitute.hellbender.utils.python.StreamingPythonScriptExecutor.sendSynchronousCommand(StreamingPythonScriptExecutor.java:183)
at org.broadinstitute.hellbender.tools.walkers.vqsr.CNNScoreVariants.onTraversalStart(CNNScoreVariants.java:313)
at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:1083)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:140)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:192)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:211)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
at org.broadinstitute.hellbender.Main.main(Main.java:289)
@felixm3 The bioconda package doesn't actually configure the gatk conda environment (it installs gatk, but not the Python dependencies required for CNNScoreVariants). You need to set up the gatk conda environment as described in the Python Dependencies section of the README.md file: https://github.com/broadinstitute/gatk#readme.
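After following those steps, a quick sanity check is to confirm which conda environment is active and which python comes first on PATH. A minimal sketch, assuming the README's default environment name gatk, that conda sets CONDA_DEFAULT_ENV on activation, and that GATK launches the first python found on PATH; describe_python_setup is a hypothetical helper, not part of GATK.

```python
import os
import shutil

# Setup steps from the GATK README, run in a shell (not in Python), assuming a
# zip/tar GATK distribution that contains gatkcondaenv.yml:
#   conda env create -f gatkcondaenv.yml
#   conda activate gatk        # older conda: source activate gatk
# The environment must be active in the same shell that runs gatk.

EXPECTED_ENV = "gatk"  # assumed default; check the `name:` field in gatkcondaenv.yml


def describe_python_setup(environ, expected_env=EXPECTED_ENV):
    """Summarize the active conda environment and the python found on PATH."""
    active = environ.get("CONDA_DEFAULT_ENV")
    return {
        "active_env": active,
        "is_expected_env": active == expected_env,
        # First `python` on PATH (None if not found); assumed to be the
        # interpreter GATK starts for CNNScoreVariants.
        "python_on_path": shutil.which("python", path=environ.get("PATH")),
    }


if __name__ == "__main__":
    for key, value in describe_python_setup(os.environ).items():
        print(f"{key}: {value}")
```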
I spent a long time struggling to install the environment, because it hasn't been updated for the newer TensorFlow and Keras versions, whose syntax changes cause a lot of the errors you see here. I managed to get it all working by pinning the versions in the YAML, but conda takes a loooooong time to solve the environment, so I would highly recommend using mamba or micromamba! I'm attaching the YAML I used to get CNNScoreVariants to work (renamed to .txt because it won't attach as a .yml).
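The attached YAML isn't reproduced here, but whether every dependency in an environment file is pinned can be checked mechanically. A minimal sketch, assuming the dependencies list has already been loaded from the YAML (for example with PyYAML) into a Python list; find_unpinned is a hypothetical helper, and the package versions in the example are placeholders, not a known-good set for CNNScoreVariants.

```python
def find_unpinned(dependencies):
    """Return conda dependency specs that carry no version constraint.

    Non-string entries (such as a nested `pip:` mapping) are skipped.
    A spec counts as pinned if it contains =, <, > (covers =, ==, >=, <=, !=)
    or a space-separated version such as "numpy 1.17.*".
    """
    unpinned = []
    for spec in dependencies:
        if not isinstance(spec, str):
            continue
        # Drop an optional channel prefix such as "conda-forge::".
        package = spec.split("::")[-1].strip()
        has_operator = any(op in package for op in ("=", "<", ">"))
        has_space_version = " " in package
        if not (has_operator or has_space_version):
            unpinned.append(spec)
    return unpinned


# Hypothetical dependency list; versions are placeholders for illustration only.
example = ["python=3.6", "tensorflow", "keras", "conda-forge::numpy>=1.17", {"pip": ["somepkg"]}]
print(find_unpinned(example))  # -> ['tensorflow', 'keras']
```

Once every entry is pinned, the same file can be passed to mamba env create -f in place of conda env create -f, which is usually much faster to solve.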
Would it be possible to update the gatktool bioconda repo to ensure that all Python dependencies needed to run CNNScoreVariants are installed? It would be really helpful and would make a GATK conda environment much easier to manage.