Bismark icon indicating copy to clipboard operation
Bismark copied to clipboard

CpG count query

Open MarkOEze opened this issue 9 months ago • 4 comments

Hello,

I have some DNA methylation data, which I prepared using EM-Seq and got around 236 million paired-end reads after sequencing on the Illumina NextSeq platform. I have some concerns regarding the number of CpGs I obtained. It is known that there are 28 million CpGs in the human genome; however, the number of CpGs I got, especially at 1X, 3X, and 5x coverage, suggests that the CpG counts may have been overestimated.

I trimmed my reads using Trimgalore, aligned them and extracted the methylation information using Bismark. Please look at the snippets of my codes and the corresponding results below.

Alignment: bismark --bowtie2 --gzip -N 1 --un --ambiguous --phred33-quals -p 4 --multicore 20 --genome /mnt/hcs/dsm-pathology-chatterjee-genomics/Euan_Rodger/Ref_Genomes/GRCh37 -1 AAFJNNMM5-8861-04-00-01_S1_L001_R1_001_val_1.fq.gz -2 AAFJNNMM5-8861-04-00-01_S1_L001_R2_001_val_2.fq.gz

Methylation extraction: bismark_methylation_extractor --gzip --bedGraph --comprehensive --s --multicore 10 --merge_non_CpG --cutoff 1 -o ./bedgraph_10_2/ AAFJNNMM5-8861-06-00-01_S3_L001_R1_001_val_1_bismark_bt2_pe.bam

N/B: I repeated this for all the samples at 3X, 5X, and 10X coverage (codes not shown).

Sample 1x coverage 3x coverage 5x coverage 10x coverage
S1 48,841,280 29,043,575 12,581,307 650,604
S2 50,761,706 34,860,543 18,166,369 1,472,475
S3 44,959,784 21,303,364 7,247,631 265,220

I also counted the number of CpGs with methylkit using the bam files obtained from the Bismark alignment. Result:

Sample 1x coverage 3x coverage 5x coverage 10x coverage
S1 49,783,470 31,603,367 16,688,990 1,992,364
S2 52,039,475 37,017,454 22,484,852 3,749,399
S3 45,501,338 26,431,339 13,237,981 1,623,263

Although the second approach seems to give more reasonable results, I still have serious concerns about the accuracy of the CpG count at 1x, 3x, and 5x coverage.

Sorry about the long text; I wanted to provide as much relevant information as possible. Please let me know if you require more information.

I look forward to hearing from you soon.

Thank you.

Kind regards, Mark.

MarkOEze avatar May 05 '24 08:05 MarkOEze