
annotatePeak annotation and transcriptId not matching

Open jdfreden opened this issue 5 years ago • 3 comments

Hello, I am annotating a set of ATAC-seq peaks with ChIPseeker. I noticed that for some of the peaks annotated as introns, the geneId and transcriptId columns do not match the transcript ID given in the annotation column. Did I do something wrong?

Thanks for your time.

Here is example code that reproduces the problem:

library(ChIPseeker)
library(TxDb.Mmusculus.UCSC.mm10.knownGene)

#create example data
data = GRanges(Rle(c("chr1", "chr1", "chr7")),
               IRanges(start = c(15608104, 15610296, 81367843), end = c(15608790, 15610993, 81368695)))

#read transcript database for annotation
txdb = TxDb.Mmusculus.UCSC.mm10.knownGene

#annotate the peak data
peakAnno = annotatePeak(data, tssRegion = c(-3000, 3000), TxDb = txdb)

df = as.data.frame(peakAnno)

df

jdfreden avatar Jun 18 '19 17:06 jdfreden

I have the same problem as you. And I am not sure why this is happening.

EsperanzaDai avatar Jul 30 '20 20:07 EsperanzaDai

Seeing the same issue, and the case you provided is a perfect summary.

If you extend your command a bit to include annoDb = "org.Mm.eg.db", flankDistance = 1000, and addFlankGeneInfo = TRUE in the annotatePeak call, you can see a bit more information in df.
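For reference, here is roughly what that extended call looks like. This is a sketch that reuses the example peaks from the original post; it assumes org.Mm.eg.db is installed, and the column names selected at the end are the ones I expect in the output, not a guaranteed list.

```r
library(ChIPseeker)
library(GenomicRanges)
library(TxDb.Mmusculus.UCSC.mm10.knownGene)

# Same three example peaks as in the original post
data <- GRanges(
  seqnames = c("chr1", "chr1", "chr7"),
  ranges = IRanges(start = c(15608104, 15610296, 81367843),
                   end   = c(15608790, 15610993, 81368695))
)

# Original call plus gene symbol mapping and flanking gene information
peakAnno <- annotatePeak(
  data,
  tssRegion = c(-3000, 3000),
  TxDb = TxDb.Mmusculus.UCSC.mm10.knownGene,
  annoDb = "org.Mm.eg.db",
  addFlankGeneInfo = TRUE,
  flankDistance = 1000
)
df <- as.data.frame(peakAnno)

# Column names below are assumed; intersect() keeps only those that exist
cols <- c("annotation", "geneId", "transcriptId", "distanceToTSS",
          "ENSEMBL", "SYMBOL", "flank_geneIds", "flank_gene_distances")
df[, intersect(cols, names(df))]
```

Putting these columns side by side makes it easier to compare the transcript named in the annotation string with the one reported in transcriptId.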

In the annotation of the first region:

  • Intron ENSMUST00000175681.2 is part of a transcript of Kcnb2, which is the gene in which your example peak 1 is located.

  • However, the transcriptId field gives ENSMUST00000188371.6, which is a transcript of Terf1. Furthermore, the gene reported in the ENSEMBL field is ENSMUSG00000025925, which is also Terf1.

  • Finally, if you look at the flank_geneIds and flank_gene_distances fields, you will see Entrez ID 98741, which is Kcnb2.

Can anyone explain how the annotation works here to define the transcript ID and gene ID? The documentation suggests that transcript/gene IDs are based on the nearest downstream gene. This confuses me: shouldn't being within an intron of a gene supersede that? Perhaps I am not thinking of the biology correctly.

MikeWLloyd avatar Oct 14 '20 18:10 MikeWLloyd

Actually, this is a common issue here. See #2, #12, #31, #109, and #113, and some explanation here: http://guangchuangyu.github.io/2014/10/multiple-annotation-in-chipseeker/
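
As I read the linked post, the annotation column and the geneId/transcriptId columns come from two separate lookups: one asks which feature the peak overlaps, the other asks which gene is nearest. Here is a toy base R sketch of how those two answers can disagree. The coordinates and gene names are invented for illustration, and the nearest-TSS rule is my simplification, not ChIPseeker's actual code.

```r
# Toy gene models; coordinates are made up, not real mm10 positions
genes <- data.frame(
  gene   = c("geneA", "geneB"),
  start  = c(1000, 52000),
  end    = c(50000, 60000),
  strand = c("+", "+")
)

# TSS is the start for "+" strand genes and the end for "-" strand genes
genes$tss <- ifelse(genes$strand == "+", genes$start, genes$end)

# A peak inside the long geneA, but far from its TSS
peak_mid <- 48000

# Lookup 1: which gene body contains the peak?
host <- genes$gene[genes$start <= peak_mid & peak_mid <= genes$end]

# Lookup 2: which gene has the closest TSS?
nearest <- genes$gene[which.min(abs(genes$tss - peak_mid))]

host
nearest
```

In this toy case the peak sits inside geneA, but geneB has the closer TSS, which is the same kind of mismatch as Kcnb2 versus Terf1 above.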

MikeWLloyd avatar Oct 14 '20 18:10 MikeWLloyd