Potential bug in signal-to-sequence refinement

Open dietvin opened this issue 5 months ago • 1 comments

Hello,

I've been digging into the signal-to-sequence refinement process and noticed a potential bug in how the traceback path is constructed: for some reads, steps in the path fall outside the band defined by the banding logic. I would expect the traceback path to remain entirely within the band defined during DP.
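To make that expectation concrete, here is a minimal sketch of the kind of check I mean. It is not Remora code: the function name, the argument names, and the band layout (one half-open signal interval `[start, end)` per base, with the path holding one signal index per base) are all assumptions made for illustration.

```python
def find_out_of_band_steps(path, band_starts, band_ends):
    """Return the base indices whose path position lies outside the band.

    Hypothetical layout (an assumption, not Remora's API):
      path[i]        signal index the traceback assigns to base i
      band_starts[i] first signal index allowed for base i (inclusive)
      band_ends[i]   end of the allowed range for base i (exclusive)
    """
    if not (len(path) == len(band_starts) == len(band_ends)):
        raise ValueError("path and band arrays must have the same length")
    return [
        base_idx
        for base_idx, (pos, start, end) in enumerate(
            zip(path, band_starts, band_ends)
        )
        if not (start <= pos < end)
    ]


# Toy data: base 2 sits at signal index 5, but its band is [2, 5).
violations = find_out_of_band_steps(
    path=[0, 2, 5, 9],
    band_starts=[0, 1, 2, 4],
    band_ends=[3, 4, 5, 10],
)
print(violations)
```

An empty list would mean the path respects the band at every base. Whether the band end is inclusive or exclusive in the real implementation would need to be checked against the source before applying this to actual output.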

Here’s an example where the path breaks out of the band:

Image

The data shown here comes from a read in the GIAB dataset you provide. The read in question is 0cb6593a-8cdd-4ea6-93c3-8d6376805a7e, but I’ve observed this in other reads as well.

From what I've seen, this only occurs when using the dwell penalty algorithm. Also, if I run dwell penalty with rough rescaling enabled beforehand, this behavior doesn't occur and the path stays within the band:

Image Image

This makes me think that the issue lies in how the traceback array is filled in the banded_forward_dwell_penalty_step function.

I forked Remora and added some rough debugging to the refine_signal_map_core.pyx script, along with scripts to parse the output generated from it. All my steps and code are available here: https://github.com/dietvin/remora_DP_debugging

My environment:

  • Remora 3.3.0
  • Python 3.13.1
  • Ubuntu 24.04.2 LTS

Please let me know if I’m missing something or if you need any additional info to reproduce.

Cheers, Vincent

dietvin, Jul 07 '25 09:07