
Parser::parseGTF => Data Structure returns an empty hash

Open joelnitta opened this issue 3 years ago • 3 comments

Hello,

I am trying to run FEELnc_filter.pl and I encounter the Parser::parseGTF => Data Structure returns an empty hash error as follows:

bash-4.2# FEELnc_filter.pl -i d_magna.filtered.gtf -a daphnia_genome.gtf -b transcript_biotype=protein_coding > candidate_lncRNA.gtf
Possible precedence issue with control flow operator at /usr/local/lib/site_perl/5.26.2/Bio/DB/IndexedBase.pm line 805.
Filtered transcripts will be available in file: 'd_magna.filtered.feelncfilter.log'
Parsing file 'd_magna.filtered.gtf'...
Parse input file:             [----------------------------------------------------------------------------------------------------]
> Filter size (200): 0
> Filter monoexonic (0): 170
> Filter biexonicsize (25): 0
>> Transcripts left after fitler(s): 36582
Parsing file 'daphnia_genome.gtf'...
Parse input file:             [----------------------------------------------------------------------------------------------------]
Parser::parseGTF => Data Structure returns an empty hash
Possible reasons:
        *Feature level 'exon' is not present in 3rd field of 'daphnia_genome.gtf'
        *chromosome/seqname (chr) or patch chromosome...
        *Filtering tag/Attributes (--filter|-f) option returns no results
Try --help for help

daphnia_genome.gtf does contain exon annotations in the 3rd field.

I am not sure what the other two possible reasons refer to, or how to check those.

The input files can be downloaded from these dropbox links:

FEELnc v0.2-0 run in docker image quay.io/biocontainers/feelnc:0.2--pl526_0

I would greatly appreciate it if you can help me troubleshoot this.

Thanks!

joelnitta avatar Sep 01 '22 00:09 joelnitta

PS: the Possible precedence issue with control flow operator warning shows up even with the test data, so I don't think that has anything to do with the above error.

bash-4.2# FEELnc_filter.pl -i transcript_chr38.gtf -a annotation_chr38.gtf -b transcript_biotype=protein_coding > test.gtf
Possible precedence issue with control flow operator at /usr/local/lib/site_perl/5.26.2/Bio/DB/IndexedBase.pm line 805.
Filtered transcripts will be available in file: 'transcript_chr38.feelncfilter.log'
Parsing file 'transcript_chr38.gtf'...
Parse input file:             [----------------------------------------------------------------------------------------------------]
> Filter size (200): 36
> Filter monoexonic (0): 1265
> Filter biexonicsize (25): 15
>> Transcripts left after fitler(s): 2146
Parsing file 'annotation_chr38.gtf'...
Parse input file:             [----------------------------------------------------------------------------------------------------]
38
Intersect fileA:              [----------------------------------------------------------------------------------------------------]

joelnitta avatar Sep 01 '22 01:09 joelnitta

Hi @joelnitta ,

First, thanks for all the info and the files (it helps with debugging)! In fact, the "error" is in the file daphnia_genome.gtf: the biotypes are not as expected, and instead of protein_coding the value is mRNA. So you just need to replace protein_coding with mRNA in the command line and it will work (at least it does for me).

Tell us if you have other issues! Bye, Valentin

vwucher avatar Sep 01 '22 09:09 vwucher

Thanks @vwucher for the prompt reply! Can you please let me know the code you used to fix daphnia_genome.gtf? I have tried changing protein_coding to mRNA but I am still getting the same error.

joelnitta avatar Sep 03 '22 00:09 joelnitta

Hi,

I didn't fix the file. I just changed your command line, replacing protein_coding with mRNA. Did you try that?

Bye

vwucher avatar Sep 05 '22 09:09 vwucher

Ah, now I see what you mean! Yes that fixes it, thanks!

(for anybody else who comes across this, it means using -b transcript_biotype=mRNA in the FEELnc_filter.pl command)
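
If you are unsure which value to pass to -b for your own annotation, it can help to list the values that the attribute actually takes in the GTF before running the filter. Below is a minimal Python sketch of that check. It is not part of FEELnc; the function name count_attribute_values is made up for illustration, and it assumes the usual GTF layout of nine tab-separated columns with key "value"; pairs in the last one.

```python
import re
from collections import Counter

# Assumed GTF attribute syntax in column 9: key "value"; key "value"; ...
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def count_attribute_values(lines, key, feature=None):
    """Tally the values of one attribute key across GTF records.

    lines   -- any iterable of GTF lines (an open file works)
    key     -- attribute name to tally, e.g. "transcript_biotype"
    feature -- optional value of the 3rd column to restrict to, e.g. "exon"
    """
    counts = Counter()
    for line in lines:
        # Skip blank lines and comment/header lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        # Skip anything that does not have the nine GTF columns.
        if len(fields) < 9:
            continue
        if feature is not None and fields[2] != feature:
            continue
        for name, value in ATTR_RE.findall(fields[8]):
            if name == key:
                counts[value] += 1
    return counts


# Example use (file name taken from this thread):
# with open("daphnia_genome.gtf") as handle:
#     print(count_attribute_values(handle, "transcript_biotype", feature="exon"))
```

If the value you planned to give to -b (here protein_coding) does not show up in the tally, a mismatch like the one in this thread is a likely cause of the empty hash error.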

joelnitta avatar Sep 05 '22 11:09 joelnitta