
Methylation extractor

keroguynes opened this issue 2 years ago • 5 comments

Dear @FelixKrueger,

I have been using this pipeline for a while now, but I have not seen the output below from the methylation extractor before. I am pretty sure that I have paired-end data, but I am not sure why it is treating the files as both paired-end and single-end. I didn't do anything differently this time around.

Output will be written into the directory: /mydirectory/
Trying to determine the type of mapping from the SAM header line of file mydeduplicated.bam
Treating file(s) as paired-end data (as extracted from @PG line)
Treating file(s) as single-end data (as extracted from @PG line)
Treating file(s) as single-end data (as extracted from @PG line)
Treating file(s) as single-end data (as extracted from @PG line)
Treating file(s) as single-end data (as extracted from @PG line)
Treating file(s) as single-end data (as extracted from @PG line)
Treating file(s) as single-end data (as extracted from @PG line)

Also, is it normal for the extractor to skip the SAM header lines? I apologise for these rudimentary questions. Many thanks in advance for your help.
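The log prints one "Treating file(s) as ..." line after another, which is consistent with the detection running once per @PG header line, and a merged, deduplicated BAM presumably carries several @PG lines (one from Bismark, more from merging or deduplication). A minimal sketch of how that could end up on "single-end", under the loudly stated assumption that a Bismark paired-end run leaves both -1 and -2 in its @PG command line while later tools do not. The function names and the example header are hypothetical; this is not Bismark's code or the actual 0.23.0 fix.

```python
# Hypothetical sketch, not Bismark's implementation.
# Assumption: a paired-end Bismark @PG line contains both "-1" and "-2"
# in its command line; @PG lines from later tools (merge, dedup) do not.


def layout_from_pg(pg_line: str) -> str:
    """Classify one @PG line as 'paired-end' or 'single-end'."""
    args = pg_line.split()  # splits on tabs and spaces alike
    return "paired-end" if "-1" in args and "-2" in args else "single-end"


def detect_naive(header_lines: list[str]) -> str:
    """Re-evaluate on every @PG line, so the LAST @PG line wins."""
    layout = "unknown"
    for line in header_lines:
        if line.startswith("@PG"):
            layout = layout_from_pg(line)
            print(f"Treating file(s) as {layout} data (as extracted from @PG line)")
    return layout


def detect_bismark_only(header_lines: list[str]) -> str:
    """Trust only the first @PG line whose ID starts with 'Bismark'."""
    for line in header_lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0] == "@PG" and any(f.startswith("ID:Bismark") for f in fields[1:]):
            return layout_from_pg(line)
    return "unknown"


# Hypothetical header of a merged paired-end BAM.
EXAMPLE_HEADER = [
    "@HD\tVN:1.0\tSO:none",
    "@SQ\tSN:Scaffold06\tLN:26046836",
    '@PG\tID:Bismark\tPN:Bismark\tCL:"bismark --genome path/genome '
    '-1 merged_R1.fastq.gz -2 merged_R2.fastq.gz"',
    "@PG\tID:samtools\tPN:samtools\tCL:samtools merge out.bam a.bam b.bam",
]

print(detect_naive(EXAMPLE_HEADER))         # last @PG wins: single-end
print(detect_bismark_only(EXAMPLE_HEADER))  # paired-end
```

Restricting the check to the aligner's own @PG entry is one plausible design; overriding detection explicitly on the command line sidesteps the question entirely.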

keroguynes, Oct 31 '22

This should have been addressed a while back (https://github.com/FelixKrueger/Bismark/releases/tag/0.23.0). Can you try updating to the latest version (0.24.0) and see if the issue has gone away?

Cheers, Felix

FelixKrueger, Oct 31 '22

Dear @FelixKrueger,

Apologies for the delayed response; I have been working with the new release, as you suggested. I apologise in advance for the multiple questions I am about to ask about this.

Also, I should have specified in my initial post that I have used the v0.22 release for all my samples to date without a hitch, until I ran the pipeline on merged paired-end FASTQ files; that is when I noticed the problem I included above. I would like to keep the same release to call methylation on all the samples, so if you have any idea how to fix it, I am all ears.

I am currently using the latest release (v0.24) to analyse the same merged paired-end data, and I noticed the following two things; I am not sure whether I should be concerned:

a) During alignment, I got the following error in the output:

Failed to close filehandle AMBIG_1: Bad file descriptor at /data/bin/bismark line 2641, <IN2> line 34189156.
Failed to close filehandle AMBIG_2: Bad file descriptor at /data/bin/bismark line 2642, <IN2> line 34189156.
Failed to close filehandle UNMAPPED_1: Bad file descriptor at /data/bin/bismark line 2643, <IN2> line 34189156.
Failed to close filehandle UNMAPPED_2: Bad file descriptor at /data/bin/bismark line 2644, <IN2> line 34189156.

b) For another sample, I got the same error, plus many of the following warnings for almost all of the scaffolds:

Chromosomal sequence could not be extracted for A00551:521:H3NWTDSX5:4:2171:18367:19586_1:N:0:TGCGTAAC+GATAGGCT scaffold_9942      1
Chromosomal sequence could not be extracted for A00551:521:H3NWTDSX5:4:2355:1470:17628_1:N:0:TGCGTAAC+GATAGGCT  scaffold_8716      4812
Chromosomal sequence could not be extracted for A00551:521:H3NWTDSX5:4:2171:18747:19648_1:N:0:TGCGTAAC+GATAGGCT scaffold_9942      1
Chromosomal sequence could not be extracted for A00551:521:H3NWTDSX5:4:2651:21856:20932_1:N:0:TGCGTAAC+GATAGGCT scaffold_272       284286

I also got the "Chromosomal sequence could not be extracted for" warning with the v0.22 release, but I didn't address it and continued with the analysis. Now I am wondering whether I should be concerned: will it affect my earlier analyses? Apologies for the very long message.

keroguynes, Nov 24 '22

I don't think that anything has changed in the extraction as such; the release really only fixed the single-end/paired-end detection. I wouldn't recommend reprocessing all of your previous data (unless it has been processed the wrong way...).

To a) I think this is just a warning message about closing filehandles; you can simply ignore it (did you even report ambiguous or unmapped reads?).

To b) Sometimes, when reads cover the extreme edge of a chromosome (or contig), you may see these warning messages. Again, this is nothing to worry about; it typically affects only a negligible fraction of reads. (The warning appears because if an alignment extends to the very edge of a chromosome, Bismark cannot extract a further 2 bp downstream to determine the sequence context.)
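To make the 2 bp explanation concrete, here is a minimal sketch. It assumes, following the reply above, that 2 extra genomic bases are needed beyond the read's end in the read's own direction; for a reverse-strand read that direction points to lower genomic coordinates, which is why an alignment starting at position 1 (as for scaffold_9942 in the log) can also trigger the warning. The function names are hypothetical and this is not Bismark's implementation.

```python
# Hypothetical sketch, not Bismark's code.
# Assumption: classifying each C needs the 2 bases that follow it on the
# read's strand, so 2 extra genomic bases are fetched past the read's end.
from typing import Optional


def extract_with_flank(chrom: str, start: int, end: int, reverse: bool,
                       flank: int = 2) -> Optional[str]:
    """Return the aligned region plus `flank` context bases, or None.

    `start`/`end` are 0-based, half-open + strand coordinates. For a
    reverse-strand read the extra bases lie before `start`.
    """
    lo, hi = (start - flank, end) if reverse else (start, end + flank)
    if lo < 0 or hi > len(chrom):
        return None  # runs off the contig edge: context cannot be determined
    return chrom[lo:hi]


def context(trinuc: str) -> str:
    """Classify a C from itself plus the next two bases on the same strand."""
    if trinuc[1] == "G":
        return "CG"
    return "CHG" if trinuc[2] == "G" else "CHH"


contig = "ACGTTCAGCA"  # hypothetical 10 bp contig
print(extract_with_flank(contig, 0, 5, reverse=False))   # ACGTTCA
print(extract_with_flank(contig, 5, 10, reverse=False))  # None: hits the end
print(extract_with_flank(contig, 0, 5, reverse=True))    # None: hits the start
```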

FelixKrueger, Nov 24 '22

Thank you for the quick response and the explanations for (a) and (b); they make perfect sense! This release did indeed fix the single-/paired-end detection.

However, if I were to continue using the v0.22 release so that all samples are processed the same way, how should I deal with the paired-end data that I need to merge for more coverage? I tried both merging the FASTQ files before running the pipeline, and running the pipeline on the reads separately and merging them after the deduplication step; both resulted in the single-/paired-end detection error. How could I have done this differently? If there isn't a way, I will stick with the v0.24 release for just these samples.

Re: ambiguous or unmapped reads: I did not specify these. I used the following command: bismark --genome_folder path/genome -1 merged_R1.fastq.gz -2 merged_R2.fastq.gz --bowtie2 -multicore 8

I apologise if you have answered this elsewhere, but is the warning "skipping header line: @SQ SN:Scaffold06 LN:26046836" negligible?

Thank you so much for your time and help in advance!

keroguynes, Nov 24 '22

I suppose you could bypass the autodetection by specifying -p, which tells the methylation extractor explicitly that the data is paired-end.

Thanks for clarifying the command; I might take a look at why these filehandles are being touched in the first place...

And yes, the methylation extractor is skipping header lines as they don't contain any methylation information - all good!
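As a small illustration of why skipping header lines is harmless: in Bismark's SAM/BAM output the per-read methylation calls are stored in the XM:Z: tag of each alignment record (z/Z for CpG, x/X for CHG, h/H for CHH, u/U for unknown context, uppercase meaning methylated, "." for non-C positions), whereas header lines begin with "@" and carry no such tag. A minimal sketch; the function name and the records are hypothetical and this is not the extractor's implementation.

```python
# Hypothetical sketch: header lines ('@HD', '@SQ', '@PG', ...) hold no
# methylation calls; only alignment records carry an XM:Z: call string.
from collections import Counter


def count_calls(sam_lines):
    """Return (number of skipped header lines, Counter of XM call letters)."""
    calls = Counter()
    skipped = 0
    for line in sam_lines:
        if line.startswith("@"):
            skipped += 1  # header line: nothing to extract
            continue
        # Optional tags start at the 12th tab-separated column (index 11).
        for field in line.rstrip("\n").split("\t")[11:]:
            if field.startswith("XM:Z:"):
                calls.update(c for c in field[5:] if c != ".")
    return skipped, calls


records = [
    "@SQ\tSN:Scaffold06\tLN:26046836",
    "read1\t0\tScaffold06\t100\t42\t10M\t*\t0\t0\tACGTTCAGCA\tIIIIIIIIII"
    "\tNM:i:0\tXM:Z:.Z...x..h.",
]
print(count_calls(records))
```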

FelixKrueger, Nov 24 '22

Hi Felix, I've managed to fix this issue - adding -p did the trick! Many thanks for your help. I will now close this query.

keroguynes, Mar 22 '23