Question about multiple annotation in ChIPseeker

Open cyruinous opened this issue 4 years ago • 1 comments

Hello GuangchuangYu,

I have a question about annotatePeak.

I know similar questions have been asked many times, but neither the answers in earlier issues nor the documentation posted so far have resolved mine.

The positions of the peaks I want to annotate are as follows.

Mouse, mm10, chr1: 5062832-5063080

Peak File: sample_peak.txt TxDb_Information: TxDb_Information.txt

I ran annotatePeak with the commands below.

    require(ChIPseeker)
    library(GenomicFeatures)
    peak <- readPeakFile("sample_peak.txt")
    peakAnno_def <- annotatePeak(peak, tssRegion = c(-3000, 3000), TxDb = txdb)


As a result I got [annotation], [transcriptId], and [distanceToTSS] as below.

annotation: Exon (ENSMUST00000169520.1 / ENSMUST00000169520.1, exon 1 of 1)
transcriptId: ENSMUST00000192847.5
distanceToTSS: -6938

The location of each transcript obtained above is shown below.

ENSMUST00000169520.1 is located at chr1: 5063060-5064647 (+)
ENSMUST00000192847.5 is located at chr1: 5070018-5162340 (+)

Therefore, the distance between the input peak and the TSS of these two transcripts is expected to be calculated as follows.

ENSMUST00000169520.1: 0 bp (overlap)
ENSMUST00000192847.5: -6938 bp
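To make the arithmetic behind these two values explicit, here is a minimal base R sketch. The helper `distance_to_tss` is hypothetical and written only for this illustration; it is not ChIPseeker's internal code, and it assumes the distance is measured from the peak edge nearest the TSS, with negative values meaning the peak is upstream.

```r
# Hypothetical helper (NOT part of ChIPseeker): signed distance from a peak
# to a transcript's TSS. Assumption: 0 if the peak overlaps the TSS,
# otherwise measured from the nearest peak edge, negative when upstream.
distance_to_tss <- function(peak_start, peak_end, tx_start, tx_end, strand = "+") {
  # The TSS is the transcript start on "+" and the transcript end on "-"
  tss <- if (strand == "+") tx_start else tx_end
  if (peak_start <= tss && tss <= peak_end) {
    return(0)  # the peak overlaps the TSS
  }
  # Pick the peak edge closest to the TSS
  edge <- if (peak_end < tss) peak_end else peak_start
  d <- edge - tss
  # Flip the sign on the minus strand so that upstream stays negative
  if (strand == "+") d else -d
}

# Peak from this issue: chr1: 5062832-5063080
peak_start <- 5062832
peak_end <- 5063080

d1 <- distance_to_tss(peak_start, peak_end, 5063060, 5064647, "+")
d2 <- distance_to_tss(peak_start, peak_end, 5070018, 5162340, "+")

print(c("ENSMUST00000169520.1" = d1, "ENSMUST00000192847.5" = d2))
```

Under these assumptions the sketch reproduces both numbers: the peak spans the TSS of ENSMUST00000169520.1 (5063060), and the peak end is 5070018 - 5063080 = 6938 bp upstream of the TSS of ENSMUST00000192847.5.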

As explained in earlier posts (https://github.com/YuLab-SMU/ChIPseeker/issues/2 and http://guangchuangyu.github.io/2014/10/multiple-annotation-in-chipseeker/), ChIPseeker reports two kinds of annotation: i) the TSS closest to the peak (transcriptId and distanceToTSS columns), and ii) the genomic position of the peak (annotation column).

I expected the input peak to be assigned to [Promoter (<=1kb)] even though it is located in an exon, because its distance to the TSS of [ENSMUST00000169520.1] falls within tssRegion = c(-3000, 3000). But I am confused about why the annotation is [Exon] and why [distanceToTSS] is reported relative to [ENSMUST00000192847.5], which is farther away than [ENSMUST00000169520.1]. (In the future, I will use the [tssRegion = c(-2000, 500)] option to take the gene direction into account.)

In addition, another peak used in the analysis (chr1: 172342749-172343535) is located in an intron of ENSMUST00000056136.3 (chr1: 172341210-172374085), but because it lies 1539 bp from the TSS, which is within the promoter region, it has been annotated as [Promoter (1-2kb)].
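The second case can be checked the same way. The base R sketch below is only an illustration: `promoter_label` is a hypothetical helper, not ChIPseeker's implementation, and it assumes the transcript is on the + strand (the strand is not stated above, but 1539 bp matches the distance from the transcript start) and that promoter bins are 1 kb wide within tssRegion.

```r
# Hypothetical helper (NOT part of ChIPseeker): map a distance to TSS onto
# a promoter bin label, or NA if it lies outside the TSS region.
promoter_label <- function(distance, tss_region = c(-3000, 3000)) {
  if (distance < tss_region[1] || distance > tss_region[2]) {
    return(NA_character_)  # outside the promoter window
  }
  kb <- abs(distance) / 1000
  if (kb <= 1) {
    "Promoter (<=1kb)"
  } else if (kb <= 2) {
    "Promoter (1-2kb)"
  } else {
    "Promoter (2-3kb)"
  }
}

# Peak chr1: 172342749-172343535 and ENSMUST00000056136.3 starting at
# chr1: 172341210. Assumption: + strand, so the TSS is the transcript start.
d <- 172342749 - 172341210

print(d)
print(promoter_label(d))
# With the narrower, direction-aware window mentioned earlier:
print(promoter_label(d, tss_region = c(-2000, 500)))
```

With the default window the peak falls in the 1-2 kb bin even though it sits inside an intron. With c(-2000, 500) the same distance lies outside the window, so under these assumptions the promoter label would no longer apply.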

I have illustrated the situation described earlier.

example

If possible, I would appreciate it if you could tell me which values should appear in the [annotation], [transcriptId], and [distanceToTSS] columns for these peaks.

It would be really helpful if you could answer me. I look forward to your reply.

Thank you!

cyruinous, Feb 04 '20 11:02

@cyruinous did you ever work out a solution to the above issue you laid out?

MikeWLloyd, Oct 14 '20 23:10