Feature: mismatch ignore
Hello, is it possible to add an option to ignore mismatches at CpG positions and just use whatever evidence there is for methylation at each position?
Hi, can you elaborate a bit on why you would like this feature and how exactly you would like the calculation to work compared with the current implementation? I would like to get a better feeling for what you are looking for and how specifically the implementation can be improved. Please also let me know which scores this is particularly important for.
Best, Sara
I would like to skip the filtering of reads with mismatches at CpG positions so that I can compare the reads to the output of bismark_methylation_extractor. More specifically, I noticed that Bismark doesn't seem to filter reads with these mismatches, so I'm curious whether the filtering could be made optional. I'm particularly interested in the single_read mode.
I think that is doable. The main reason we excluded them is that they harm the computation of some scores, entropy in particular: we cannot compute entropy if 1 out of 4 consecutive CpGs is a mismatch, because the result is no longer comparable to samples that did not have a mismatch there. And because we wanted all scores to be based on the same input data, we excluded those reads for everything.

We could add a mode where such reads are not removed immediately but handled per score on a case-by-case basis, and the single read mode could output all reads. This will probably take a bit, but I will see if we can manage it in the next few weeks. We could then calculate the methylation of a read across all CpGs that are present. In the column where we output the methylation pattern, e.g., gGGG, we could flag a mismatch with a special character, e.g., gGGX if the last CpG is a mismatch. Would that meet your request, or is there anything else that is important for your application/use case?
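To make the idea concrete, here is a minimal Python sketch of how such a flagged pattern could be handled. This is only an illustration and not the current RLM implementation: the function names are hypothetical, and it assumes that an uppercase letter in the pattern marks a methylated CpG, a lowercase letter an unmethylated CpG, and X a mismatch.

```python
from typing import Optional

# Hypothetical flag character for a mismatch at a CpG position (assumption).
MISMATCH = "X"


def read_methylation(pattern: str) -> Optional[float]:
    """Fraction of methylated CpGs among the CpGs that are not mismatches.

    Assumes uppercase = methylated, lowercase = unmethylated, 'X' = mismatch.
    Returns None if every CpG on the read is a mismatch.
    """
    called = [c for c in pattern if c != MISMATCH]
    if not called:
        return None
    methylated = sum(1 for c in called if c.isupper())
    return methylated / len(called)


def usable_for_entropy(pattern: str, window: int = 4) -> bool:
    """A window-based score such as entropy needs a complete window,
    so a pattern containing any mismatch would be skipped for that score."""
    return len(pattern) == window and MISMATCH not in pattern


if __name__ == "__main__":
    for p in ("gGGG", "gGGX", "XXXX"):
        print(p, read_methylation(p), usable_for_entropy(p))
```

With this kind of handling, a read such as gGGX would still be reported in single read mode and contribute a methylation value from its three usable CpGs, while being left out of scores that need all four positions.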
Thanks for the detailed explanation! I think the simplest request (in theory) would be just to output all reads in single_read mode so that I have control over whatever happens downstream. Thank you for your help with this; RLM has been a very useful tool.