Bismark
Bismark copied to clipboard
CpG count query
Hello,
I have some DNA methylation data, which I prepared using EM-Seq and got around 236 million paired-end reads after sequencing on the Illumina NextSeq platform. I have some concerns regarding the number of CpGs I obtained. It is known that there are 28 million CpGs in the human genome; however, the number of CpGs I got, especially at 1X, 3X, and 5x coverage, suggests that the CpG counts may have been overestimated.
I trimmed my reads using Trimgalore, aligned them and extracted the methylation information using Bismark. Please look at the snippets of my codes and the corresponding results below.
Alignment: bismark --bowtie2 --gzip -N 1 --un --ambiguous --phred33-quals -p 4 --multicore 20 --genome /mnt/hcs/dsm-pathology-chatterjee-genomics/Euan_Rodger/Ref_Genomes/GRCh37 -1 AAFJNNMM5-8861-04-00-01_S1_L001_R1_001_val_1.fq.gz -2 AAFJNNMM5-8861-04-00-01_S1_L001_R2_001_val_2.fq.gz
Methylation extraction: bismark_methylation_extractor --gzip --bedGraph --comprehensive --s --multicore 10 --merge_non_CpG --cutoff 1 -o ./bedgraph_10_2/ AAFJNNMM5-8861-06-00-01_S3_L001_R1_001_val_1_bismark_bt2_pe.bam
N/B: I repeated this for all the samples at 3X, 5X, and 10X coverage (codes not shown).
Sample | 1x coverage | 3x coverage | 5x coverage | 10x coverage |
---|---|---|---|---|
S1 | 48,841,280 | 29,043,575 | 12,581,307 | 650,604 |
S2 | 50,761,706 | 34,860,543 | 18,166,369 | 1,472,475 |
S3 | 44,959,784 | 21,303,364 | 7,247,631 | 265,220 |
I also counted the number of CpGs with methylkit using the bam files obtained from the Bismark alignment. Result:
Sample | 1x coverage | 3x coverage | 5x coverage | 10x coverage |
---|---|---|---|---|
S1 | 49,783,470 | 31,603,367 | 16,688,990 | 1,992,364 |
S2 | 52,039,475 | 37,017,454 | 22,484,852 | 3,749,399 |
S3 | 45,501,338 | 26,431,339 | 13,237,981 | 1,623,263 |
Although the second approach seems to give more reasonable results, I still have serious concerns about the accuracy of the CpG count at 1x, 3x, and 5x coverage.
Sorry about the long text; I wanted to provide as much relevant information as possible. Please let me know if you require more information.
I look forward to hearing from you soon.
Thank you.
Kind regards, Mark.