Bismark
Unexpected combination of read and genome conversion: /
Hi,
Thanks for providing such a useful tool for DNA methylation analysis. Recently, I used the bismark_methylation_extractor to extract methylation information from a sorted bam file (which was generated by samtools sort -n):
~/Bismark-0.23.1/bismark_methylation_extractor -p --gzip --bedGraph e14_wt_chr19.sorted.bam
However, an error occurred:
So could you please tell me how to solve this problem?
Maybe a naive question, but was your BAM file aligned with Bismark at all? I am only asking because I cannot see a Bismark mapping line in the SAM header (starting with @PG). I am afraid this would be required as the first step.
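A quick way to run this header check yourself is sketched below. The function name `has_bismark_pg` is hypothetical, and the sketch assumes that Bismark identifies itself by name in the `ID` or `PN` field of its `@PG` record; it only inspects header text piped in on stdin.

```shell
# Reads SAM header text on stdin.
# Exits 0 if an @PG record names Bismark in its ID or PN field, 1 otherwise.
# Assumption: Bismark writes its own name into one of those two fields.
has_bismark_pg() {
    awk -F'\t' '
        /^@PG/ {
            for (i = 2; i <= NF; i++)
                if (tolower($i) ~ /^(id|pn):bismark/) found = 1
        }
        END { exit(found ? 0 : 1) }
    '
}
```

For example: `samtools view -H e14_wt_chr19.sorted.bam | has_bismark_pg && echo "Bismark @PG line found"`.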
Thanks for your quick reply. The SAM header is shown below; it contains no @PG line. So I start with FastQ files...
Hmm, what do you mean you start with FastQ files? You will first need to take the data, trim it appropriately (e.g. using Trim Galore), and then align the trimmed FastQ files to the genome (as an aside, is there a specific reason why you are using a 12-year-old mouse genome and not the current one?)
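The order of steps described here (trim, then align with Bismark, then extract) could look roughly like the sketch below for paired-end data. The helper `print_bismark_plan` is hypothetical and only prints the commands for review rather than running them. The upper-case file names are placeholders, not real output names, and the genome preparation step is only needed once per genome folder.

```shell
# Prints (does not run) the commands for a paired-end Bismark workflow.
# $1 = genome folder, $2 = read 1 FastQ, $3 = read 2 FastQ (hypothetical paths).
# TRIMMED_R1/TRIMMED_R2/BISMARK_OUTPUT are placeholders for the files that
# the previous step actually produces.
print_bismark_plan() {
    printf 'trim_galore --paired %s %s\n' "$2" "$3"
    printf 'bismark_genome_preparation %s\n' "$1"
    printf 'bismark --genome %s -1 TRIMMED_R1.fq.gz -2 TRIMMED_R2.fq.gz\n' "$1"
    printf 'bismark_methylation_extractor -p --gzip --bedGraph BISMARK_OUTPUT.bam\n'
}
```

Calling `print_bismark_plan genome_dir sample_R1.fastq.gz sample_R2.fastq.gz` lists the four commands in order. The extractor options in the last line are the ones from the original question.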
Can you also show the first few sequence lines of the BAM files, e.g. using:
samtools view e14_wt_chr19.sorted.bam | head -4
Sorry for the misleading information. Yes, I decided to perform the alignment using Bismark (Bowtie 2), since the BAM file I previously used was generated with bwa. The sample I used was 14 days old. The first 6 lines of the sorted BAM file are shown below.
Hmm, I am afraid this file either isn't a Bismark file, or someone or some process has interfered with the file so much that it has become unusable. A typical Bismark paired-end BAM entry would look like this:
NB501547:236:H5GJWAFXY:1:11101:2482:1126_1:N:0:0_AAGAGGCA_AAGGAGTA 99 16 48219108 12 76M = 48219273 241 ACATGGTGAAGTGTGTCCAGCTGGCTGGAAACCTGGCAGTGATACCATCAAGCCTGATGTCAATAAGAGCAAAGAG AAAAAEEE6EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE/EEEEEEEEEEAE NM:i:2 MD:Z:15C18A41 XM:Z:...z..z....z.z.......z...z.......z.....z..z....z......z..z.z...z............ XR:Z:CT XG:Z:CT
NB501547:236:H5GJWAFXY:1:11101:2482:1126_1:N:0:0_AAGAGGCA_AAGGAGTA 147 16 48219273 12 76M = 48219108 -241 CATGATGTGGTGTGATTCCAGATAAGCCTTTCCTACAGGGCTGGGGATGGATAGCCTTTCTTCCACTATTGGTAAT EEEEAAE/EE6EEEEE/EEEEEEEEEEEEEEEEEEEEEEEE/EEEEEAEEEEEEEEEEAEEEEEEEEEAEEAAAAA NM:i:0 MD:Z:76 XM:Z:..z..z.z..z.z..zz.....z.....zzz..z.......z.....z...z....zzz.zz....z.zz..z..z XR:Z:GA XG:Z:CT
It is absolutely essential that you have the tags XR:Z: and XG:Z: to determine the read and genome conversion state, as well as the XM:Z: field to extract the methylation data. Also, your data contains AS, MQ and a few other tags, which leads me to believe that we are not looking at Bismark data here in the first place...
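To check a file for these tags before running the methylation extractor, something like the sketch below may help. The function name `count_missing_bismark_tags` is hypothetical. It assumes tab-separated SAM records on stdin, with optional tags starting at column 12, and reports how many records lack at least one of `XR:Z:`, `XG:Z:` or `XM:Z:`. If the conversion tags are absent, the extractor presumably ends up with an empty read/genome combination, which would match the `: /` at the end of the error message.

```shell
# Reads SAM records on stdin (header lines are skipped) and prints the number
# of records that lack at least one of the XR:Z:, XG:Z: or XM:Z: tags.
count_missing_bismark_tags() {
    awk -F'\t' '
        /^@/ { next }                       # skip header lines
        {
            xr = xg = xm = 0
            for (i = 12; i <= NF; i++) {    # optional tags start at column 12
                if ($i ~ /^XR:Z:/) xr = 1
                if ($i ~ /^XG:Z:/) xg = 1
                if ($i ~ /^XM:Z:/) xm = 1
            }
            if (!(xr && xg && xm)) missing++
        }
        END { print missing + 0 }
    '
}
```

For example: `samtools view e14_wt_chr19.sorted.bam | head -1000 | count_missing_bismark_tags`. Any result other than 0 suggests the file did not come straight from a Bismark alignment.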