genomation
readTranscriptFeatures with GTF
readTranscriptFeatures() reads a BED file. Gene features are commonly stored as a GTF file. Is there a way to import a GTF file in the proper format for annotateWithGeneParts()? There is a convenient gffToGRanges() function, but you still need to convert the resulting GRanges object to a GRangesList. Is there already a function for that?
Hi @igordot,
Yes, you are right, readTranscriptFeatures() doesn't support GFF/GTF files. You can either convert your GTF file to a BED file or, which may suit you better, first use gffToGRanges(), then manipulate the output GRanges object a bit to get the features you are interested in (promoters, exons, introns, etc.), and then use annotateWithFeatures(), e.g.:
library(genomation)
data(cage)
gff.file = system.file('extdata/chr21.refseq.hg19.gtf', package='genomation')
gr21=gffToGRanges(gff.file)
# here you can manipulate the GRanges object, e.g.:
grl21=as(split(gr21, gr21$type), "GRangesList")
> annotateWithFeatures(cage, grl21)
summary of target set annotation with feature annotation:
Rows in target set: 2326
----------------------------
percentage of target elements overlapping with features:
       exon  stop_codon         CDS start_codon
      29.45        0.47       13.54        1.46
percentage of feature elements overlapping with target:
       exon  stop_codon         CDS start_codon
      18.55        4.41       14.31       12.76
Hope it helps, Kasia
My concern with that approach is that it gives you different results than the BED file. I assume "chr21.refseq.hg19.bed" and "chr21.refseq.hg19.gtf" should be comparable.
I am not sure what exactly you are asking. Are you asking why a GTF and a BED file look different? They are two different file formats, but they should be comparable; I don't know the details. If you are interested in annotating your regions of interest with exons, introns, and promoters (the output regions of readTranscriptFeatures()) and you are not sure how to get their coordinates from a GTF file, then check out e.g. GenomicFeatures::makeTxDbFromGFF, GenomicFeatures::exonsBy and GenomicFeatures::intronsByTranscript. Promoters are just 1 kb (by default in genomation) flanking regions around the TSS.
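To illustrate the last point, here is a minimal sketch of deriving TSS positions and 1 kb flanking promoters from transcript coordinates. The two transcripts are made-up toy ranges, and the sketch uses GenomicRanges::promoters() rather than genomation's internal code, so genomation's own promoter coordinates may differ by a base at the boundaries.

```r
library(GenomicRanges)

# Toy transcripts (hypothetical coordinates), one per strand
tx <- GRanges("chr21",
              IRanges(start = c(10000, 50000), end = c(20000, 60000)),
              strand = c("+", "-"))

# TSS: the strand-aware first base of each transcript
# (the start on "+", the end on "-")
tss <- resize(tx, width = 1, fix = "start")

# Promoters: 1 kb upstream and 1 kb downstream of the TSS, strand-aware
prom <- promoters(tx, upstream = 1000, downstream = 1000)

start(tss)
width(prom)
```

Note that the minus-strand TSS is taken from the transcript end, which is easy to get wrong when computing flanks by hand.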
I understand that the BED and GTF files are different. I wanted to see if there was a way to achieve the same annotation results from both.
It sounds like it is possible, but requires a few extra steps, such as GenomicFeatures::exonsBy and GenomicFeatures::intronsByTranscript.
I think following Kasia's suggestion would work. I personally do not have the code to do that. But the idea is that you need to re-create the GRangesList object that readTranscriptFeatures() returns from a BED file. You need to parse the GTF with functions from the rtracklayer and/or GenomicFeatures packages, extract the exon, intron and promoter coordinates, and arrange them in a GRangesList object similar to the one returned by our own function.
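A rough sketch of that idea follows. It is not code from the genomation authors: gtfToTranscriptFeatures() is a hypothetical helper, and it assumes that the list elements should be named exons, introns, promoters and TSSes and carry score and name columns, as the BED reader's output does. It also assumes the example GTF parses cleanly into a TxDb. Depending on the Bioconductor release, makeTxDbFromGFF() lives in either txdbmaker or GenomicFeatures, so the sketch checks for both.

```r
library(genomation)
library(GenomicFeatures)

# Hypothetical helper (not part of genomation): build a GRangesList shaped
# like the output of readTranscriptFeatures(), starting from a GTF file.
gtfToTranscriptFeatures <- function(gtf.file, up.flank = 1000, down.flank = 1000) {
  # Assumption: use txdbmaker if it is installed, else GenomicFeatures
  makeTxDb <- if (requireNamespace("txdbmaker", quietly = TRUE)) {
    txdbmaker::makeTxDbFromGFF
  } else {
    GenomicFeatures::makeTxDbFromGFF
  }
  txdb <- makeTxDb(gtf.file, format = "gtf")
  tx <- transcripts(txdb)  # carries tx_id and tx_name columns

  # Give every element the same metadata columns (score, name) so they can
  # be combined into one GRangesList; this layout is an assumption.
  bedLike <- function(gr, name) {
    gr <- unname(granges(gr))
    mcols(gr) <- DataFrame(score = rep(0, length(gr)), name = as.character(name))
    gr
  }
  # Look up the transcript name for every range of a per-transcript list
  txName <- function(grl) {
    ids <- rep(names(grl), elementNROWS(grl))
    tx$tx_name[match(ids, as.character(tx$tx_id))]
  }

  ex.by.tx <- exonsBy(txdb, by = "tx")
  in.by.tx <- intronsByTranscript(txdb)

  exons   <- bedLike(unlist(ex.by.tx, use.names = FALSE), txName(ex.by.tx))
  introns <- bedLike(unlist(in.by.tx, use.names = FALSE), txName(in.by.tx))
  tss     <- bedLike(resize(tx, width = 1, fix = "start"), tx$tx_name)
  proms   <- bedLike(promoters(tx, upstream = up.flank, downstream = down.flank),
                     tx$tx_name)

  GRangesList(exons = exons, introns = introns,
              promoters = unique(proms), TSSes = unique(tss))
}

# Usage with the example files shipped with genomation
gtf.file <- system.file("extdata/chr21.refseq.hg19.gtf", package = "genomation")
features <- gtfToTranscriptFeatures(gtf.file)
data(cage)
annotateWithGeneParts(cage, features)
```

The percentages may still not match the BED-based annotation exactly, since the promoter boundaries and the filtering of transcripts can differ between the two routes.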
Best, Altuna
Thanks for clarifying. I was hoping all or some parts were already included in the package. It would be a nice feature to have.