biocode
Incorrect parent features from convert_tRNAScanSE_to_gff3.pl
Hi there,
First off, thank you so much for making this script! I'm trying to incorporate filtered tRNAScanSE results into my genome annotation, but the output from this script is giving me some issues. It looks like the only Parent attributes that exist do not point to the correct feature. The biggest problem is with multi-exon tRNAs. Here's an example of a few lines from the resulting GFF file:
WCK01_AAF20200214_F8-ctg250 tRNAScan-SE gene 905646 905751 75.6 + . ID=tRNA-Leu39_gene
WCK01_AAF20200214_F8-ctg250 tRNAScan-SE tRNA 905646 905751 75.6 + . ID=tRNA-Leu39_tRNA;Name=tRNA-Leu;anticodon=CAA
WCK01_AAF20200214_F8-ctg250 tRNAScan-SE exon 905646 905683 75.6 + . ID=tRNA-Leu39_exon;Note=contains predicted Intron
WCK01_AAF20200214_F8-ctg250 tRNAScan-SE exon 905706 905751 75.6 + . ID=tRNA-Leu39_exon;Parent=tRNA-Leu39_exon
As you can see, the only Parent attribute belongs to the second exon, and it points to the exon ID, which is identical for both exons. Do you think you might be able to modify the script so that exon features have unique IDs and their Parent attributes point to the tRNA?
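To make the request concrete, here is a rough Python sketch of the hierarchy I'd expect (gene, then tRNA, then exons). It is not how the Perl script works internally; the function name `trna_to_gff3` and the `_exon1`/`_exon2` ID suffixes are my own assumptions, and the intron Note from the first exon is left out for brevity.

```python
def trna_to_gff3(seqid, base_id, name, anticodon, strand, score, exons):
    """Build GFF3 lines for one tRNA as gene -> tRNA -> exon(s).

    `exons` is a list of (start, end) tuples. The ID naming scheme
    (base_id + "_exonN") is an illustrative assumption.
    """
    start = min(s for s, _ in exons)
    end = max(e for _, e in exons)
    gene_id = f"{base_id}_gene"
    trna_id = f"{base_id}_tRNA"

    def row(feature_type, s, e, attributes):
        # Nine tab-separated GFF3 columns; phase is "." for non-CDS features.
        return "\t".join([seqid, "tRNAScan-SE", feature_type, str(s), str(e),
                          str(score), strand, ".", attributes])

    lines = [
        row("gene", start, end, f"ID={gene_id}"),
        # The tRNA points up to the gene.
        row("tRNA", start, end,
            f"ID={trna_id};Parent={gene_id};Name={name};anticodon={anticodon}"),
    ]
    # Each exon gets its own ID and points up to the tRNA.
    for i, (s, e) in enumerate(sorted(exons), start=1):
        lines.append(row("exon", s, e, f"ID={base_id}_exon{i};Parent={trna_id}"))
    return lines


for line in trna_to_gff3("WCK01_AAF20200214_F8-ctg250", "tRNA-Leu39", "tRNA-Leu",
                         "CAA", "+", 75.6,
                         [(905646, 905683), (905706, 905751)]):
    print(line)
```

With the coordinates from my example, both exons would carry `Parent=tRNA-Leu39_tRNA`, and the tRNA would carry `Parent=tRNA-Leu39_gene`.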
Thanks so much!
Thanks for the report - do you think you can attach at least a partial test input file?
Of course! My tRNA file is very small, as it's the output of EukHighConfidenceFilter, the internal script of tRNAScanSE that filters for high-confidence RNAs. I'm thinking now that perhaps it's the extra columns in my input that are messing up the resulting GFF file, as EukHighConfidenceFilter requires that certain extra columns are included in the regular tRNAScanSE output. I've attached the input and the resulting GFF; they had to have the .txt suffix to attach properly: take2_filtered.txt, take2_filtered_gff.txt
Thanks again!
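In case it helps with testing, below is a small Python sketch for spotting the two symptoms described above in a GFF file: repeated IDs and Parent values that are self-referential or match no ID. The function name `check_gff3_parents` is hypothetical and not part of biocode. Note that GFF3 does allow several lines to share an ID when they describe one discontinuous feature, so a repeated ID is only flagged as suspect here, not as strictly invalid.

```python
def check_gff3_parents(lines):
    """Return a list of human-readable problems found in GFF3 lines."""
    first_seen = {}   # ID -> line number where it first appeared
    records = []      # (line number, ID, Parent) for every feature line
    problems = []

    for n, line in enumerate(lines, start=1):
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, comments, and directives
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            problems.append(f"line {n}: expected 9 tab-separated columns, got {len(cols)}")
            continue
        # Column 9 holds semicolon-separated key=value attributes.
        attrs = dict(kv.split("=", 1) for kv in cols[8].split(";") if "=" in kv)
        fid = attrs.get("ID")
        if fid is not None:
            if fid in first_seen:
                problems.append(
                    f"line {n}: duplicate ID {fid} (first seen on line {first_seen[fid]})")
            else:
                first_seen[fid] = n
        records.append((n, fid, attrs.get("Parent")))

    # Second pass, so a Parent may legally refer to an ID defined later in the file.
    for n, fid, parent in records:
        if parent is None:
            continue
        for p in parent.split(","):
            if p == fid:
                problems.append(f"line {n}: Parent {p} is the feature's own ID")
            elif p not in first_seen:
                problems.append(f"line {n}: Parent {p} does not match any ID")
    return problems


# Usage (file name taken from the attachment above):
# with open("take2_filtered_gff.txt") as handle:
#     for problem in check_gff3_parents(handle):
#         print(problem)
```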