Feature: mismatch ignore

Open Apompetti-Cori opened this issue 1 month ago • 3 comments

Hello, is it possible to add an option to ignore mismatches at CpG positions and just use whatever evidence of methylation there is at each position?

Apompetti-Cori avatar Nov 17 '25 21:11 Apompetti-Cori

Hi - can you elaborate a bit on why you would like to have this feature and how exactly you would like the calculation to work compared to the current implementation? I would like to get a better feeling for what you are looking for and in what ways specifically the implementation can be improved. Please also let me know which scores this is particularly important for.

Best, Sara

sarahet avatar Nov 24 '25 09:11 sarahet

I would like to skip the filtering of mismatches at CpG positions so that I can compare the reads to the output of bismark_methylation_extractor. More specifically, I noticed that Bismark doesn't seem to filter out reads with these mismatches, so I'm curious whether the filtering could be made optional. I'm particularly interested in the single_read mode.

Apompetti-Cori avatar Nov 24 '25 21:11 Apompetti-Cori

I think that is doable. The main reason we excluded them is that, especially for scores like entropy, they harm the computation: we cannot compute entropy if 1 out of 4 consecutive CpGs is a mismatch, because the result is no longer comparable to other samples that did not have a mismatch there. And because we wanted all scores to be based on the same input data, we excluded them for everything.

But we could add a mode where these reads are not removed immediately; instead, the decision would be made case by case for each score, and the single read mode could output all reads. This will probably take a bit, but I will see if we can manage it in the next weeks. We could then calculate the methylation of that read across all CpGs that are present, and in the column where we output the methylation pattern, e.g., gGGG, we could flag a mismatch with a special character, such as gGGX if the last CpG is a mismatch. Would that meet your request, or is there anything else that is important to your application/use case?
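
To make the proposal concrete, here is a minimal sketch of how a flagged pattern such as gGGX might be summarized downstream. It is not RLM code: the function name `summarize_pattern` and the `ReadSummary` type are hypothetical, and the case convention (uppercase = methylated, lowercase = unmethylated) is an assumption made for illustration only. Only the `X` flag comes from the example above.

```python
from typing import NamedTuple, Optional

# Flag character taken from the example in this thread.
MISMATCH_FLAG = "X"


class ReadSummary(NamedTuple):
    """Hypothetical per-read summary; not part of RLM's output format."""
    n_cpgs: int                   # CpGs that could be evaluated
    n_mismatch: int               # CpGs flagged as mismatches
    methylation: Optional[float]  # None if no CpG could be evaluated


def summarize_pattern(pattern: str) -> ReadSummary:
    """Compute read methylation over the CpGs that are present.

    ASSUMPTION: uppercase letters mark methylated CpGs and lowercase
    letters mark unmethylated CpGs. Check the tool's documentation for
    the actual encoding before relying on this.
    """
    # Drop flagged positions so they do not count toward the denominator.
    usable = [c for c in pattern if c != MISMATCH_FLAG]
    n_mismatch = len(pattern) - len(usable)
    if not usable:
        return ReadSummary(0, n_mismatch, None)
    n_methylated = sum(1 for c in usable if c.isupper())
    return ReadSummary(len(usable), n_mismatch, n_methylated / len(usable))


if __name__ == "__main__":
    for pattern in ("gGGG", "gGGX"):
        print(pattern, summarize_pattern(pattern))
```

With the mismatch count kept next to the methylation value, a user could still drop or keep such reads later, for example by filtering on `n_mismatch == 0` to mimic the current behaviour.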

sarahet avatar Nov 25 '25 09:11 sarahet

Thanks for the detailed explanation! I think the simplest request (in theory) would just be to output all reads in single_read mode so that I can control whatever happens downstream. Thank you for your help with this; RLM has been a very useful tool.

Apompetti-Cori avatar Dec 01 '25 15:12 Apompetti-Cori