
GtfToBed errors out with `because "gene" is null`

Open mmahmoudian opened this issue 7 months ago • 2 comments

This bug report concerns a new tool, GtfToBed, which was introduced in PR #8942. The following steps reproduce the error:

Get the necessary files

Reference genome

if [ ! -f 'hg38.fa.gz' ]; then
    echo 'Downloading the reference genome'
    wget https://hgdownload.soe.ucsc.edu/goldenpath/hg38/bigZips/latest/hg38.fa.gz
fi

sha256sum 'hg38.fa.gz'
c1dd87068c254eb53d944f71e51d1311964fce8de24d6fc0effc9c61c01527d4  hg38.fa.gz

GTF file

if [ ! -f 'hg38.ncbiRefSeq.gtf.gz' ]; then
    echo 'Downloading the GTF file'
    wget https://hgdownload.soe.ucsc.edu/goldenpath/hg38/bigZips/genes/hg38.ncbiRefSeq.gtf.gz
fi

sha256sum 'hg38.ncbiRefSeq.gtf.gz'
856919cfc5854079e70dd016048045092fd79b782aa8da9dbbd1c51a9046d8a4  hg38.ncbiRefSeq.gtf.gz

Prepare files

Unpack the compressed files

gunzip --keep 'hg38.ncbiRefSeq.gtf.gz' 'hg38.fa.gz'

Create the dict file

./gatk-4.6.1.0/gatk CreateSequenceDictionary \
                    --REFERENCE 'hg38.fa' \
                    --VERBOSITY WARNING
[Thu Feb 27 12:20:49 EET 2025] CreateSequenceDictionary --VERBOSITY WARNING --REFERENCE hg38.fa --TRUNCATE_NAMES_AT_WHITESPACE true --NUM_SEQUENCES 2147483647 --QUIET false --VALIDATION_STRINGENCY STRICT --COMPRESSION_LEVEL 2 --MAX_RECORDS_IN_RAM 500000 --CREATE_INDEX false --CREATE_MD5_FILE false --help false --version false --showHidden false --USE_JDK_DEFLATER false --USE_JDK_INFLATER false
[Thu Feb 27 12:20:49 EET 2025] Executing as mehrad@pamp-precision-tower on Linux 6.12.16-1-lts amd64; OpenJDK 64-Bit Server VM 23.0.2; Deflater: Intel; Inflater: Intel; Provider GCS is available; Picard version: Version:4.6.1.0
[Thu Feb 27 12:21:00 EET 2025] picard.sam.CreateSequenceDictionary done. Elapsed time: 0.18 minutes.
Runtime.totalMemory()=3816816640

Convert GTF to BED

./gatk-4.6.1.0/gatk GtfToBed \
                    --gtf-path 'hg38.ncbiRefSeq.gtf' \
                    --sequence-dictionary 'hg38.dict' \
                    --output 'blah.bed' \
                    --verbosity WARNING
Using GATK jar /home/mehrad/tmp/gatk-4.6.1.0/gatk-package-4.6.1.0-local.jar
Running:
    java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /home/mehrad/tmp/gatk-4.6.1.0/gatk-package-4.6.1.0-local.jar GtfToBed --gtf-path hg38.ncbiRefSeq.gtf --sequence-dictionary hg38.dict --output blah.bed --verbosity WARNING
SLF4J(W): Class path contains multiple SLF4J providers.
SLF4J(W): Found provider [org.apache.logging.slf4j.SLF4JServiceProvider@4ee8051c]
SLF4J(W): Found provider [ch.qos.logback.classic.spi.LogbackServiceProvider@53125718]
SLF4J(W): See https://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J(I): Actual provider is of type [org.apache.logging.slf4j.SLF4JServiceProvider@4ee8051c]
[February 27, 2025, 12:26:04 PM EET] org.broadinstitute.hellbender.tools.walkers.conversion.GtfToBed done. Elapsed time: 0.00 minutes.
Runtime.totalMemory()=134217728
java.lang.NullPointerException: Cannot invoke "org.broadinstitute.hellbender.utils.codecs.gtf.GencodeGtfGeneFeature.addTranscript(org.broadinstitute.hellbender.utils.codecs.gtf.GencodeGtfTranscriptFeature)" because "gene" is null
	at org.broadinstitute.hellbender.utils.codecs.gtf.AbstractGtfCodec.aggregateRecordsIntoGeneFeature(AbstractGtfCodec.java:339)
	at org.broadinstitute.hellbender.utils.codecs.gtf.AbstractGtfCodec.decode(AbstractGtfCodec.java:170)
	at org.broadinstitute.hellbender.utils.codecs.gtf.AbstractGtfCodec.decode(AbstractGtfCodec.java:23)
	at htsjdk.tribble.TribbleIndexedFeatureReader$WFIterator.readNextRecord(TribbleIndexedFeatureReader.java:377)
	at htsjdk.tribble.TribbleIndexedFeatureReader$WFIterator.<init>(TribbleIndexedFeatureReader.java:344)
	at htsjdk.tribble.TribbleIndexedFeatureReader.iterator(TribbleIndexedFeatureReader.java:311)
	at org.broadinstitute.hellbender.engine.FeatureDataSource.iterator(FeatureDataSource.java:531)
	at java.base/java.lang.Iterable.spliterator(Unknown Source)
	at org.broadinstitute.hellbender.utils.Utils.stream(Utils.java:1182)
	at org.broadinstitute.hellbender.engine.FeatureWalker.traverse(FeatureWalker.java:97)
	at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:1119)
	at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:150)
	at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:203)
	at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:222)
	at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:166)
	at org.broadinstitute.hellbender.Main.mainEntry(Main.java:209)
	at org.broadinstitute.hellbender.Main.main(Main.java:306)
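
The trace suggests that the codec tried to attach a transcript to a gene record that was never created. One possible quick check, assuming the cause is that the GTF contains no rows with feature type `gene` in column 3, is to tally the feature types in the file. This is only a diagnostic sketch: `count_gtf_features` is a hypothetical helper name, not part of GATK, and the assumption about the cause has not been confirmed against the codec source.

```shell
#!/usr/bin/env bash
# Count how often each feature type (column 3) occurs in a GTF file.
# count_gtf_features is a hypothetical helper, not a GATK command.
count_gtf_features() {
    # Skip comment lines, then tally the third tab-separated column.
    awk -F'\t' '!/^#/ && NF >= 3 { n[$3]++ } END { for (f in n) print f "\t" n[f] }' "$1" | sort
}

# Run against the unpacked GTF from the steps above, if it is present.
if [ -f 'hg38.ncbiRefSeq.gtf' ]; then
    count_gtf_features 'hg38.ncbiRefSeq.gtf'
fi
```

If the output lists `transcript` and `exon` rows but no `gene` rows, that would be consistent with the `"gene" is null` message above.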

mmahmoudian, Feb 27 '25 10:02