GtfToBed errors out with `because "gene" is null`
This bug report concerns a new tool, GtfToBed, which was introduced in PR #8942. The following steps reproduce the error:
Get the necessary files
Reference genome
if [ ! -f 'hg38.fa.gz' ]; then
echo 'Downloading the reference genome'
wget https://hgdownload.soe.ucsc.edu/goldenpath/hg38/bigZips/latest/hg38.fa.gz
fi
sha256sum 'hg38.fa.gz'
c1dd87068c254eb53d944f71e51d1311964fce8de24d6fc0effc9c61c01527d4 hg38.fa.gz
GTF file
if [ ! -f 'hg38.ncbiRefSeq.gtf.gz' ]; then
echo 'Downloading the GTF file'
wget https://hgdownload.soe.ucsc.edu/goldenpath/hg38/bigZips/genes/hg38.ncbiRefSeq.gtf.gz
fi
sha256sum 'hg38.ncbiRefSeq.gtf.gz'
856919cfc5854079e70dd016048045092fd79b782aa8da9dbbd1c51a9046d8a4 hg38.ncbiRefSeq.gtf.gz
Prepare files
Unpack the compressed files
gunzip --keep 'hg38.ncbiRefSeq.gtf.gz' 'hg38.fa.gz'
Create the dict file
./gatk-4.6.1.0/gatk CreateSequenceDictionary \
--REFERENCE 'hg38.fa' \
--VERBOSITY WARNING
[Thu Feb 27 12:20:49 EET 2025] CreateSequenceDictionary --VERBOSITY WARNING --REFERENCE hg38.fa --TRUNCATE_NAMES_AT_WHITESPACE true --NUM_SEQUENCES 2147483647 --QUIET false --VALIDATION_STRINGENCY STRICT --COMPRESSION_LEVEL 2 --MAX_RECORDS_IN_RAM 500000 --CREATE_INDEX false --CREATE_MD5_FILE false --help false --version false --showHidden false --USE_JDK_DEFLATER false --USE_JDK_INFLATER false
[Thu Feb 27 12:20:49 EET 2025] Executing as mehrad@pamp-precision-tower on Linux 6.12.16-1-lts amd64; OpenJDK 64-Bit Server VM 23.0.2; Deflater: Intel; Inflater: Intel; Provider GCS is available; Picard version: Version:4.6.1.0
[Thu Feb 27 12:21:00 EET 2025] picard.sam.CreateSequenceDictionary done. Elapsed time: 0.18 minutes.
Runtime.totalMemory()=3816816640
Convert GTF to BED
./gatk-4.6.1.0/gatk GtfToBed \
--gtf-path 'hg38.ncbiRefSeq.gtf' \
--sequence-dictionary 'hg38.dict' \
--output 'blah.bed' \
--verbosity WARNING
Using GATK jar /home/mehrad/tmp/gatk-4.6.1.0/gatk-package-4.6.1.0-local.jar
Running:
    java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /home/mehrad/tmp/gatk-4.6.1.0/gatk-package-4.6.1.0-local.jar GtfToBed --gtf-path hg38.ncbiRefSeq.gtf --sequence-dictionary hg38.dict --output blah.bed --verbosity WARNING
SLF4J(W): Class path contains multiple SLF4J providers.
SLF4J(W): Found provider [org.apache.logging.slf4j.SLF4JServiceProvider@4ee8051c]
SLF4J(W): Found provider [ch.qos.logback.classic.spi.LogbackServiceProvider@53125718]
SLF4J(W): See https://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J(I): Actual provider is of type [org.apache.logging.slf4j.SLF4JServiceProvider@4ee8051c]
[February 27, 2025, 12:26:04 PM EET] org.broadinstitute.hellbender.tools.walkers.conversion.GtfToBed done. Elapsed time: 0.00 minutes.
Runtime.totalMemory()=134217728
java.lang.NullPointerException: Cannot invoke "org.broadinstitute.hellbender.utils.codecs.gtf.GencodeGtfGeneFeature.addTranscript(org.broadinstitute.hellbender.utils.codecs.gtf.GencodeGtfTranscriptFeature)" because "gene" is null
    at org.broadinstitute.hellbender.utils.codecs.gtf.AbstractGtfCodec.aggregateRecordsIntoGeneFeature(AbstractGtfCodec.java:339)
    at org.broadinstitute.hellbender.utils.codecs.gtf.AbstractGtfCodec.decode(AbstractGtfCodec.java:170)
    at org.broadinstitute.hellbender.utils.codecs.gtf.AbstractGtfCodec.decode(AbstractGtfCodec.java:23)
    at htsjdk.tribble.TribbleIndexedFeatureReader$WFIterator.readNextRecord(TribbleIndexedFeatureReader.java:377)
    at htsjdk.tribble.TribbleIndexedFeatureReader$WFIterator.<init>(TribbleIndexedFeatureReader.java:344)
    at htsjdk.tribble.TribbleIndexedFeatureReader.iterator(TribbleIndexedFeatureReader.java:311)
    at org.broadinstitute.hellbender.engine.FeatureDataSource.iterator(FeatureDataSource.java:531)
    at java.base/java.lang.Iterable.spliterator(Unknown Source)
    at org.broadinstitute.hellbender.utils.Utils.stream(Utils.java:1182)
    at org.broadinstitute.hellbender.engine.FeatureWalker.traverse(FeatureWalker.java:97)
    at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:1119)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:150)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:203)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:222)
    at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:166)
    at org.broadinstitute.hellbender.Main.mainEntry(Main.java:209)
    at org.broadinstitute.hellbender.Main.main(Main.java:306)
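A possible diagnostic (my assumption, not a confirmed root cause)
The stack trace shows the codec calling addTranscript on a gene object that was never set. One way this could happen is if the GTF contains transcript records with no preceding `gene` record. The sketch below tallies the feature types in column 3 of the GTF, so it is easy to see whether any `gene` rows exist. The function name `count_gtf_features` is a hypothetical helper and is not part of GATK.

```shell
# Hypothetical helper (not part of GATK): count how many records of each
# feature type (column 3) a GTF file contains, ignoring '#' comment lines.
count_gtf_features() {
    awk -F '\t' '!/^#/ { n[$3]++ } END { for (t in n) print n[t], t }' "$1" | sort
}

# Run against the file from this report only if it is present.
if [ -f 'hg38.ncbiRefSeq.gtf' ]; then
    count_gtf_features 'hg38.ncbiRefSeq.gtf'
fi
```

If the output lists `transcript`, `exon`, and similar types but no `gene` line, that would be consistent with the exception above, although it would not by itself prove the cause.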