Potential bug in signal-to-sequence refinement
Hello,
I've been digging into the signal-to-sequence refinement process and noticed a potential bug in how the traceback path is constructed: for some reads, steps in the path fall outside the band defined by the banding logic. I would expect the traceback path to remain entirely within the band used during DP.
Here’s an example, where the path breaks outside the band:
The data shown here comes from a read in the GIAB dataset you provide. The read in question is 0cb6593a-8cdd-4ea6-93c3-8d6376805a7e, but I’ve observed this in other reads as well.
From what I've seen, this only occurs when using the dwell penalty algorithm. Also, if I run dwell penalty with rough rescaling enabled beforehand, the behavior doesn't occur and the path stays within the band:
This makes me think that the issue lies in how the traceback array is filled in the banded_forward_dwell_penalty_step function.
I forked Remora and added some rough debugging to the refine_signal_map_core.pyx script, along with scripts to parse the output generated from it. All my steps and code are available here: https://github.com/dietvin/remora_DP_debugging
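To make the expectation concrete, the invariant can be written as a small standalone check. This is a minimal sketch and is not taken from Remora or from the repository linked above: the function name `steps_outside_band` and the data layout are assumptions for illustration. It assumes the path is a list of signal indices with one more entry than there are bases (base i covers signal positions path[i] up to but not including path[i+1]) and that the band gives, per base, a half-open range of allowed signal positions. Remora may store the band differently internally.

```python
def steps_outside_band(path, band_start, band_end):
    """Return the indices of bases whose signal interval leaves the band.

    Hypothetical layout (an assumption, not Remora's internal format):
      path:       n_bases + 1 signal indices; base i covers [path[i], path[i+1])
      band_start: n_bases lower bounds (inclusive) on signal position
      band_end:   n_bases upper bounds (exclusive) on signal position
    """
    n_bases = len(band_start)
    if len(band_end) != n_bases or len(path) != n_bases + 1:
        raise ValueError("path must have exactly one more entry than the band")

    violations = []
    for i in range(n_bases):
        starts_too_early = path[i] < band_start[i]
        ends_too_late = path[i + 1] > band_end[i]
        if starts_too_early or ends_too_late:
            violations.append(i)
    return violations


# Toy example: base 1 covers signal [3, 9) but its band only allows [2, 7).
print(steps_outside_band([0, 3, 9, 10], [0, 2, 5], [4, 7, 10]))  # [1]
```

An empty result means every step stays inside the band, which is what I would expect for every read regardless of the refinement algorithm.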
My environment:
- Remora 3.3.0
- Python 3.13.1
- Ubuntu 24.04.2 LTS
Please let me know if I’m missing something or if you need any additional info to reproduce.
Cheers, Vincent