
Unexpected combination of read and genome conversion: /

Open ccmeth opened this issue 2 years ago • 5 comments

Hi,

Thanks for providing such a useful tool for DNA methylation analysis. Recently, I used bismark_methylation_extractor to extract methylation information from a sorted BAM file (generated with samtools sort -n):
~/Bismark-0.23.1/bismark_methylation_extractor -p --gzip --bedGraph e14_wt_chr19.sorted.bam

However, an error occurred: [screenshot of the error message]

So could you please tell me how to solve this problem?

ccmeth, May 11 '22 10:05

Maybe a naive question, but was your BAM file aligned with Bismark at all? I am only asking because I cannot see a Bismark mapping line in the SAM header (starting with @PG). I am afraid this would be required as the first step.
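The header check described above can be scripted. Below is a minimal Python sketch, assuming `samtools` is on the PATH and that a Bismark-produced header carries an `@PG` line whose text contains "Bismark". `has_bismark_pg` and `read_header` are hypothetical names. It is a crude substring test: a file name containing "bismark" inside another tool's `@PG` command line would also match.

```python
from __future__ import annotations

import subprocess
import sys


def has_bismark_pg(header_text: str) -> bool:
    """Return True if any @PG line in a SAM header mentions Bismark."""
    for line in header_text.splitlines():
        # Case-insensitive match, since the exact ID/PN spelling is an assumption.
        if line.startswith("@PG") and "bismark" in line.lower():
            return True
    return False


def read_header(bam_path: str) -> str:
    """Return the SAM header of a BAM file via `samtools view -H`."""
    result = subprocess.run(
        ["samtools", "view", "-H", bam_path],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


if __name__ == "__main__" and len(sys.argv) > 1:
    found = has_bismark_pg(read_header(sys.argv[1]))
    print("Bismark @PG line found" if found else "no Bismark @PG line in header")
```

Run it as `python check_header.py e14_wt_chr19.sorted.bam` (the script name is hypothetical). For a header with no `@PG` lines at all, like the one in this issue, it reports that no Bismark line was found.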

FelixKrueger, May 11 '22 11:05

Thanks for your quick reply. The SAM header is shown below, and there is no @PG line in it. So I will start from the FastQ files instead... [screenshot of the SAM header]

ccmeth, May 11 '22 11:05

Hmm, what do you mean you start with FastQ files? You will first need to take the data, trim it appropriately (e.g. using Trim Galore), and then align the trimmed FastQ files to the genome. (As an aside, is there a specific reason why you are using a 12-year-old mouse genome and not the current one?)
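The trim-then-align order can be written down as a minimal Python sketch that only builds the command lines and runs nothing. It assumes gzipped paired-end input, Trim Galore's default `_val_1.fq.gz`/`_val_2.fq.gz` output names, and Bismark's default `_bismark_bt2_pe.bam` output name, all written to the current directory; `bismark_commands` and `_stem` are hypothetical helpers, not part of either tool.

```python
from __future__ import annotations


def _stem(path: str) -> str:
    """Strip the directory and a gzipped FastQ extension from a path (hypothetical helper)."""
    name = path.rsplit("/", 1)[-1]
    for suffix in (".fastq.gz", ".fq.gz"):
        if name.endswith(suffix):
            return name[: -len(suffix)]
    raise ValueError(f"expected a .fastq.gz or .fq.gz file, got {path!r}")


def bismark_commands(r1: str, r2: str, genome_dir: str) -> list[list[str]]:
    """Return argv lists for a paired-end trim -> align -> extract run."""
    # Assumed default Trim Galore paired-end output names (gzipped input).
    t1 = f"{_stem(r1)}_val_1.fq.gz"
    t2 = f"{_stem(r2)}_val_2.fq.gz"
    # Assumed default Bismark (Bowtie 2) paired-end BAM name, derived from read 1.
    bam = f"{_stem(t1)}_bismark_bt2_pe.bam"
    return [
        ["bismark_genome_preparation", genome_dir],  # one-off genome indexing
        ["trim_galore", "--paired", r1, r2],
        ["bismark", "--genome", genome_dir, "-1", t1, "-2", t2],
        ["bismark_methylation_extractor", "-p", "--gzip", "--bedGraph", bam],
    ]
```

Each list can be passed to `subprocess.run(cmd, check=True)` in order. Keeping the commands as plain lists makes it easy to print and review them before anything is executed.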

Can you also show the first few sequence lines of the BAM file, e.g. using:

samtools view e14_wt_chr19.sorted.bam | head -4
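To check those lines for the tags the methylation extractor needs, here is a minimal Python sketch that parses SAM text records (as printed by `samtools view`) and reports which of `XM`, `XR` and `XG` are missing. `parse_sam_record`, `missing_bismark_tags` and `report` are hypothetical names; tag values are kept as strings and the SAM type code is ignored.

```python
from __future__ import annotations

# Tags that bismark_methylation_extractor reads from every record.
BISMARK_TAGS = ("XM", "XR", "XG")


def parse_sam_record(line: str) -> tuple[list[str], dict[str, str]]:
    """Split one SAM text line into its 11 mandatory fields and a tag dict."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 11:
        raise ValueError(f"expected >= 11 tab-separated fields, got {len(fields)}")
    tags = {}
    for item in fields[11:]:
        # "XM:Z:..z.." -> tag "XM", type "Z", value "..z.."
        tag, _type, value = item.split(":", 2)
        tags[tag] = value
    return fields[:11], tags


def missing_bismark_tags(line: str) -> list[str]:
    """Return the Bismark tags absent from one SAM record."""
    _, tags = parse_sam_record(line)
    return [t for t in BISMARK_TAGS if t not in tags]


def report(lines) -> None:
    """Print one verdict per alignment line; header lines (@...) are skipped."""
    for n, line in enumerate(lines, 1):
        if line.startswith("@") or not line.strip():
            continue
        missing = missing_bismark_tags(line)
        verdict = "XM/XR/XG present" if not missing else "missing " + ", ".join(missing)
        print(f"line {n}: {verdict}")
```

Saved as a module (hypothetically `check_tags.py`), it could be fed the same output: `samtools view e14_wt_chr19.sorted.bam | head -4 | python -c "import sys, check_tags; check_tags.report(sys.stdin)"`.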

FelixKrueger, May 11 '22 11:05

Sorry for the misleading information. Yes, I decided to redo the alignment with Bismark (Bowtie 2), since the BAM file I used before was generated with BWA. The sample I used was 14 days old. The first 6 lines of the sorted BAM file are shown below.

[screenshot of the first records of the sorted BAM file]

ccmeth, May 11 '22 12:05

Hmm, I am afraid this file either isn't a Bismark file, or someone or some process has interfered with the file so much that it has become unusable. A typical Bismark paired-end BAM entry would look like this:

NB501547:236:H5GJWAFXY:1:11101:2482:1126_1:N:0:0_AAGAGGCA_AAGGAGTA	99	16	48219108	12	76M	=	48219273	241	ACATGGTGAAGTGTGTCCAGCTGGCTGGAAACCTGGCAGTGATACCATCAAGCCTGATGTCAATAAGAGCAAAGAG	AAAAAEEE6EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE/EEEEEEEEEEAE	NM:i:2	MD:Z:15C18A41	XM:Z:...z..z....z.z.......z...z.......z.....z..z....z......z..z.z...z............	XR:Z:CT	XG:Z:CT
NB501547:236:H5GJWAFXY:1:11101:2482:1126_1:N:0:0_AAGAGGCA_AAGGAGTA	147	16	48219273	12	76M	=	48219108	-241	CATGATGTGGTGTGATTCCAGATAAGCCTTTCCTACAGGGCTGGGGATGGATAGCCTTTCTTCCACTATTGGTAAT	EEEEAAE/EE6EEEEE/EEEEEEEEEEEEEEEEEEEEEEEE/EEEEEAEEEEEEEEEEAEEEEEEEEEAEEAAAAA	NM:i:0	MD:Z:76	XM:Z:..z..z.z..z.z..zz.....z.....zzz..z.......z.....z...z....zzz.zz....z.zz..z..z	XR:Z:GA	XG:Z:CT

It is absolutely essential that the XR:Z: and XG:Z: tags are present to determine the read and genome conversion state, as well as the XM:Z: field to extract the methylation data. Also, your data contains AS, MQ and a few other tags, which leads me to believe that we are not looking at Bismark data here in the first place...
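How these tags are used can be illustrated with a minimal Python sketch. It is not Bismark's code. It assumes the four `(XR, XG)` combinations below correspond to Bismark's OT/OB/CTOT/CTOB strands, that the lookup applies to a single read (for paired-end data we assume the strand is taken from read 1), and the usual XM alphabet: `z`/`x`/`h`/`u` for CpG/CHG/CHH/unknown context, upper case meaning methylated. `strand_of`, `summarise_xm`, `STRANDS` and `CONTEXTS` are hypothetical names. When both tags are absent, the lookup sees two empty strings, which appears to be the situation behind the bare "/" in the issue title.

```python
from __future__ import annotations

from collections import Counter

# (XR, XG) -> strand of origin; assumed to follow Bismark's four combinations.
STRANDS = {
    ("CT", "CT"): "OT",    # original top
    ("CT", "GA"): "OB",    # original bottom
    ("GA", "CT"): "CTOT",  # complementary to original top
    ("GA", "GA"): "CTOB",  # complementary to original bottom
}

# XM letters by context: upper case = methylated C, lower case = unmethylated C.
CONTEXTS = {"z": "CpG", "x": "CHG", "h": "CHH", "u": "unknown"}


def strand_of(xr: str, xg: str) -> str:
    """Map the XR/XG conversion states of one read to a strand name."""
    try:
        return STRANDS[(xr, xg)]
    except KeyError:
        # With both tags missing, xr and xg are empty strings.
        raise ValueError(
            f"Unexpected combination of read and genome conversion: '{xr}' / '{xg}'"
        ) from None


def summarise_xm(xm: str) -> dict[str, dict[str, int]]:
    """Count methylated/unmethylated calls per context in an XM string."""
    counts = Counter(xm)  # '.' (not a cytosine) is counted but never looked up
    return {
        context: {"methylated": counts[letter.upper()], "unmethylated": counts[letter]}
        for letter, context in CONTEXTS.items()
    }
```

For the first example record (flag 99), `XR:Z:CT` and `XG:Z:CT` map to OT, and its XM string contains only lower-case `z` calls, i.e. unmethylated CpGs.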

FelixKrueger, May 11 '22 13:05