Bismark
Bismark silently outputs incorrect results when UMIs are added using Illumina's bcl-convert
This is not really a bug, since the documentation clearly states how deduplicate_bismark expects UMIs to be handled, but it is an easy mistake to make. As documented for deduplicate_bismark, Bismark expects UMIs of the form:

@A00001:001:HN2F7DRX1:1:1101:1452:1000 1:N:0:AATGACGC:CAAGAG

But if Illumina's bcl-convert is used with OverrideCycles to handle UMIs, the read ID looks like this:

@A00001:001:HN2F7DRX1:1:1101:1452:1000:CAAGAG 1:N:0:AATGACGC

In both examples the UMI is CAAGAG. With the second form, the sample index (AATGACGC) ends up being used as the UMI, and no warning or error is emitted.
I propose running a pre-flight check to detect this scenario, and potentially adding support for the UMI location chosen by Illumina.
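As a rough illustration of what such a pre-flight check could look like, here is a minimal Python sketch that classifies a FASTQ header by where the UMI appears to sit. This is not part of Bismark; the function name classify_umi_location and the field-count heuristic (7 colon-separated fields in a plain read name, 4 in a plain comment) are my own assumptions based on the two example headers above.

```python
import re

# Hypothetical helper, not part of Bismark. A field counts as UMI-like if it
# consists only of nucleotide letters (optionally two parts joined by '+').
UMI_LIKE = re.compile(r"^[ACGTN]+(\+[ACGTN]+)?$")


def classify_umi_location(header: str) -> str:
    """Guess where the UMI sits in a FASTQ header line.

    Returns "comment" if the UMI is the last field after the space
    (the form shown first above), "read_name" if it is appended to the
    read name (the bcl-convert form shown above), and "none" otherwise.
    Assumes a plain read name has 7 colon-separated fields and a plain
    comment has 4.
    """
    header = header.strip().lstrip("@")
    name, _, comment = header.partition(" ")
    name_fields = name.split(":")
    comment_fields = comment.split(":") if comment else []

    # An 8th, nucleotide-only field on the read name suggests bcl-convert.
    if len(name_fields) == 8 and UMI_LIKE.match(name_fields[-1]):
        return "read_name"
    # A 5th, nucleotide-only field in the comment is the documented form.
    if len(comment_fields) == 5 and UMI_LIKE.match(comment_fields[-1]):
        return "comment"
    return "none"


if __name__ == "__main__":
    expected = "@A00001:001:HN2F7DRX1:1:1101:1452:1000 1:N:0:AATGACGC:CAAGAG"
    bcl_convert = "@A00001:001:HN2F7DRX1:1:1101:1452:1000:CAAGAG 1:N:0:AATGACGC"
    print(classify_umi_location(expected))
    print(classify_umi_location(bcl_convert))
```

A check like this could inspect the first few reads and warn when the result is "read_name" or "none" while UMI-based deduplication was requested. It only looks at FASTQ-style headers; read IDs that have already been rewritten during alignment would need a different pattern.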
EDIT: I might have been completely off. I'll close it for now.