rnaseqc
collapse_annotation.py cannot process the gtf file generated by gffread
Dear all,
To prepare the GTF file for RNA-SeQC, I first converted my GFF file to GTF with the following command:
gffread-0.12.7.Linux_x86_64/gffread -T -o out.gtf input.gff
However, running collapse_annotation.py out.gtf collapse.gtf gave me this error:
Traceback (most recent call last):
  File "collapse_annotation.py", line 294, in <module>
    annotation = Annotation(args.transcript_gtf)
  File "collapse_annotation.py", line 89, in __init__
    attributes.pop('transcript_type'), g, start_pos, end_pos)
KeyError: 'transcript_type'
Based on the error message above, I appended gene_biotype and transcript_biotype attributes to the end of each line:
perl -e 'while(<>){chomp; print $_," gene_biotype \"protein_coding\"; transcript_biotype \"protein_coding\";\n"}' out.gtf >processed.gtf
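The Perl one-liner above appends the same attributes to every line, including any header comment lines, and appends them again if it is run twice. Below is a minimal Python sketch of the same step that skips comment lines and leaves a line unchanged when it already carries a biotype. `add_biotypes` is a hypothetical helper, not part of RNA-SeQC or gffread. "protein_coding" is only the placeholder value from the original command and is not the true biotype of every gene.

```python
import sys


def add_biotypes(line: str,
                 gene_biotype: str = "protein_coding",
                 transcript_biotype: str = "protein_coding") -> str:
    """Hypothetical helper: append biotype attributes to one GTF line if missing.

    Comment lines (starting with '#') and lines without 9 tab-separated
    columns are returned unchanged. "protein_coding" is only a placeholder.
    """
    line = line.rstrip("\n")
    if line.startswith("#"):
        return line
    fields = line.split("\t")
    if len(fields) != 9:
        return line
    attrs = fields[8].rstrip()
    # Make sure the existing attribute list is terminated before appending.
    if attrs and not attrs.endswith(";"):
        attrs += ";"
    # Skip keys that are already present, so re-running is harmless.
    if "gene_biotype" not in attrs and "gene_type" not in attrs:
        attrs += f' gene_biotype "{gene_biotype}";'
    if "transcript_biotype" not in attrs and "transcript_type" not in attrs:
        attrs += f' transcript_biotype "{transcript_biotype}";'
    fields[8] = attrs.strip()
    return "\t".join(fields)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical script name): python add_biotypes.py out.gtf > processed.gtf
    with open(sys.argv[1]) as gtf:
        for raw in gtf:
            print(add_biotypes(raw))
```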
However, running collapse_annotation.py processed.gtf collapse.gtf then produced another error:
Traceback (most recent call last):
  File "collapse_annotation.py", line 294, in <module>
    annotation = Annotation(args.transcript_gtf)
  File "collapse_annotation.py", line 89, in __init__
    attributes.pop('transcript_type'), g, start_pos, end_pos)
UnboundLocalError: local variable 'g' referenced before assignment
I have attached processed.gtf (processed.zip). How should this be handled?
Thank you in advance. Best wishes, Zheng zhuqing
RNA-SeQC requires a GTF in the format specified at https://www.gencodegenes.org/pages/data_format.html, with a gene > transcript > exon hierarchy in the feature-type column (additional features such as CDS are also supported). Your GTF is missing gene features; it only contains transcript and exon features.
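One possible workaround is a minimal Python sketch that synthesizes the missing gene rows. It assumes every row carries gene_id and transcript_id attributes (as gffread -T output normally does) and that all rows of a gene share one chromosome and strand. It adds one gene row per gene_id spanning all of that gene's rows, then writes the rows in gene > transcript > exon order. The UnboundLocalError above suggests the script expects each gene row to appear before its transcripts, hence the ordering. `add_gene_features` is a hypothetical helper, not part of RNA-SeQC or gffread. It only adds gene rows, so any other attributes collapse_annotation.py expects must still be present.

```python
import re
import sys
from collections import OrderedDict

# Matches key "value" pairs in GTF column 9, e.g. gene_id "g1";
ATTR_RE = re.compile(r'(\w+)\s+"([^"]*)"')


def parse_attrs(col9: str) -> dict:
    """Return the key/value attributes of a GTF row (first value per key wins)."""
    attrs = {}
    for key, value in ATTR_RE.findall(col9):
        attrs.setdefault(key, value)
    return attrs


def add_gene_features(lines):
    """Hypothetical helper: insert one 'gene' row before each gene's transcripts.

    Assumes every feature row has gene_id and transcript_id, and that a
    gene's rows share chromosome and strand. Gene coordinates span all rows.
    """
    genes = OrderedDict()  # gene_id -> OrderedDict(transcript_id -> [rows])
    header = []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            header.append(line)
            continue
        fields = line.split("\t")
        attrs = parse_attrs(fields[8])
        transcripts = genes.setdefault(attrs["gene_id"], OrderedDict())
        transcripts.setdefault(attrs["transcript_id"], []).append(fields)

    out = list(header)
    for gene_id, transcripts in genes.items():
        rows = [f for tx_rows in transcripts.values() for f in tx_rows]
        first = rows[0]
        start = min(int(f[3]) for f in rows)
        end = max(int(f[4]) for f in rows)
        attrs = parse_attrs(first[8])
        gene_attrs = f'gene_id "{gene_id}";'
        # Carry over gene-level attributes if the input already has them.
        for key in ("gene_name", "gene_type", "gene_biotype"):
            if key in attrs:
                gene_attrs += f' {key} "{attrs[key]}";'
        out.append("\t".join([first[0], first[1], "gene", str(start), str(end),
                              ".", first[6], ".", gene_attrs]))
        for tx_rows in transcripts.values():
            # Stable sort: the 'transcript' row first, other rows keep their order.
            tx_rows.sort(key=lambda f: f[2] != "transcript")
            out.extend("\t".join(f) for f in tx_rows)
    return out


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical names): python add_gene_features.py processed.gtf > with_genes.gtf
    with open(sys.argv[1]) as gtf:
        for row in add_gene_features(gtf):
            print(row)
```

The gene row's score and frame columns are set to "." because they have no meaning for a synthesized gene span.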
Is there a tool to convert a GTF to the required format? I am having the same issue. Thank you in advance.