Parser::parseGTF => Data Structure returns an empty hash
Hello,
I am trying to run FEELnc_filter.pl and I encounter the Parser::parseGTF => Data Structure returns an empty hash error as follows:
bash-4.2# FEELnc_filter.pl -i d_magna.filtered.gtf -a daphnia_genome.gtf -b transcript_biotype=protein_coding > candidate_lncRNA.gtf
Possible precedence issue with control flow operator at /usr/local/lib/site_perl/5.26.2/Bio/DB/IndexedBase.pm line 805.
Filtered transcripts will be available in file: 'd_magna.filtered.feelncfilter.log'
Parsing file 'd_magna.filtered.gtf'...
Parse input file: [----------------------------------------------------------------------------------------------------]
> Filter size (200): 0
> Filter monoexonic (0): 170
> Filter biexonicsize (25): 0
>> Transcripts left after fitler(s): 36582
Parsing file 'daphnia_genome.gtf'...
Parse input file: [----------------------------------------------------------------------------------------------------]
Parser::parseGTF => Data Structure returns an empty hash
Possible reasons:
*Feature level 'exon' is not present in 3rd field of 'daphnia_genome.gtf'
*chromosome/seqname (chr) or patch chromosome...
*Filtering tag/Attributes (--filter|-f) option returns no results
Try --help for help
daphnia_genome.gtf does contain exon annotations in the 3rd field.
I am not sure what the other two possible reasons refer to, or how to check those.
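For reference, a minimal sketch of how the first reason (and the seqnames in column 1) could be inspected, assuming a standard tab-separated GTF. The helper name `tally_column` and the two example lines are made up for illustration; they are not part of FEELnc and not taken from daphnia_genome.gtf.

```python
from collections import Counter


def tally_column(lines, column):
    """Count the values found in one tab-separated column (0-based).

    Blank lines and '#' comment lines are skipped.
    """
    counts = Counter()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) > column:
            counts[fields[column]] += 1
    return counts


# Hypothetical GTF lines, for illustration only.
example = [
    'scaffold_1\tsrc\texon\t1\t100\t.\t+\t.\tgene_id "g1"; transcript_id "t1";\n',
    'scaffold_1\tsrc\tCDS\t1\t100\t.\t+\t.\tgene_id "g1"; transcript_id "t1";\n',
]

feature_counts = tally_column(example, 2)  # column 3 of the GTF: feature type
seqname_counts = tally_column(example, 0)  # column 1 of the GTF: seqname
print(feature_counts)
print(seqname_counts)

# On a real file (path is whatever you pass to -a):
# with open("daphnia_genome.gtf") as fh:
#     print(tally_column(fh, 2))
```

If 'exon' shows up in the feature tally, the first reason is probably not the cause.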
The input files can be downloaded from these dropbox links:
FEELnc v0.2-0 run in docker image quay.io/biocontainers/feelnc:0.2--pl526_0
I would greatly appreciate it if you can help me troubleshoot this.
Thanks!
PS: the Possible precedence issue with control flow operator warning shows up even with the test data, so I don't think that has anything to do with the above error.
bash-4.2# FEELnc_filter.pl -i transcript_chr38.gtf -a annotation_chr38.gtf -b transcript_biotype=protein_coding > test.gtf
Possible precedence issue with control flow operator at /usr/local/lib/site_perl/5.26.2/Bio/DB/IndexedBase.pm line 805.
Filtered transcripts will be available in file: 'transcript_chr38.feelncfilter.log'
Parsing file 'transcript_chr38.gtf'...
Parse input file: [----------------------------------------------------------------------------------------------------]
> Filter size (200): 36
> Filter monoexonic (0): 1265
> Filter biexonicsize (25): 15
>> Transcripts left after fitler(s): 2146
Parsing file 'annotation_chr38.gtf'...
Parse input file: [----------------------------------------------------------------------------------------------------]
38
Intersect fileA: [----------------------------------------------------------------------------------------------------]
Hi @joelnitta ,
First, thanks for all the info and the files (they help with debugging)!
In fact, the "error" is in the file daphnia_genome.gtf: the biotype is not the expected one. Instead of protein_coding, it is mRNA.
So you just need to replace protein_coding with mRNA in the command line and it will work (at least it does for me).
Tell us if you have other issues! Bye, Valentin
Thanks @vwucher for the prompt reply! Can you please let me know the code you used to fix daphnia_genome.gtf? I have tried changing protein_coding to mRNA but I am still getting the same error.
Hi,
I didn't fix the file. I just changed your command line, replacing protein_coding with mRNA.
Did you try that?
Bye
Ah, now I see what you mean! Yes, that fixes it, thanks!
(for anybody else who comes across this, it means using -b transcript_biotype=mRNA in the FEELnc_filter.pl command)
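To find out which value to pass to -b for your own annotation, it can help to list the values that an attribute such as transcript_biotype actually takes in the GTF. Below is a rough sketch, assuming the usual `key "value";` attribute syntax in column 9. The function name `attribute_values` and the example lines are hypothetical, and this is not how FEELnc itself parses the file.

```python
import re
from collections import Counter

# Matches pairs such as: transcript_biotype "mRNA"
ATTR_RE = re.compile(r'([^\s;]+)\s+"([^"]*)"')


def attribute_values(lines, key, feature=None):
    """Count the values of one attribute key in column 9 of a GTF.

    If `feature` is given, only lines whose 3rd column equals it are counted.
    """
    counts = Counter()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue
        if feature is not None and fields[2] != feature:
            continue
        for name, value in ATTR_RE.findall(fields[8]):
            if name == key:
                counts[value] += 1
    return counts


# Hypothetical GTF lines, for illustration only.
example = [
    'scaffold_1\tsrc\texon\t1\t100\t.\t+\t.\tgene_id "g1"; transcript_id "t1"; transcript_biotype "mRNA";\n',
    'scaffold_1\tsrc\texon\t200\t300\t.\t+\t.\tgene_id "g1"; transcript_id "t1"; transcript_biotype "mRNA";\n',
    'scaffold_2\tsrc\texon\t1\t50\t.\t-\t.\tgene_id "g2"; transcript_id "t2";\n',
]

biotypes = attribute_values(example, "transcript_biotype", feature="exon")
print(biotypes)

# On a real file:
# with open("daphnia_genome.gtf") as fh:
#     print(attribute_values(fh, "transcript_biotype", feature="exon"))
```

If the value you gave to -b (for example protein_coding) does not appear in the tally, the filter would match nothing, which is consistent with the third "possible reason" in the error message.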