remora
remora copied to clipboard
```get_io_reads()``` fails with ```Move table discordant with signal``` on some reads
Hi @marcus1487,
I am running into an issue that I was hoping you could provide some insight into:
I have a dataset from a P2 sequencer that was basecalled and aligned using dorado
v0.5.1. When working with these data using remora
v3.1.0, a small portion of reads (about 1 in 5000 reads) fail when calling get_io_reads
with the following error
import pysam
import pod5
from remora import io
# prepare the sequence data
print("Loading Pod5 and BAM files")
can_pod5_fh = pod5.Reader("read.pod5")
can_bam_fh = pysam.AlignmentFile("read.bam")
bam_read = [ read for read in can_bam_fh.fetch() if io.read_is_primary(read)][0]
io_read = io.get_io_reads([bam_read], can_pod5_fh, reverse_signal=False)[0]
RemoraError Traceback (most recent call last)
File [...]/remora/io.py:756, in get_io_reads(bam_reads, pod5_dr, reverse_signal, missing_ok)
755 try:
--> 756 io_read = Read.from_pod5_and_alignment(
757 pod5_read_record=pod5_reads[get_parent_id(bam_read)],
758 alignment_record=bam_read,
759 reverse_signal=reverse_signal,
760 )
761 except Exception:
File [...]/remora/io.py:1992, in Read.from_pod5_and_alignment(cls, pod5_read_record, alignment_record, reverse_signal)
1986 read = Read(
1987 read_id=str(pod5_read_record.read_id),
1988 dacs=dacs,
1989 shift_dacs_to_pa=pod5_read_record.calibration.offset,
1990 scale_dacs_to_pa=pod5_read_record.calibration.scale,
1991 )
-> 1992 read.add_alignment(alignment_record, reverse_signal=reverse_signal)
1993 return read
File [...]/remora/io.py:1912, in Read.add_alignment(self, alignment_record, parse_ref_align, reverse_signal)
1911 try:
-> 1912 self.query_to_signal, self.mv_table, self.stride = parse_move_tag(
1913 tags["mv"],
1914 sig_len=self.dacs.size,
1915 seq_len=len(self.seq),
1916 reverse_signal=reverse_signal,
1917 )
1918 except KeyError:
File [...]/remora/io.py:401, in parse_move_tag(mv_tag, sig_len, seq_len, check, reverse_signal)
397 LOGGER.debug(
398 f"Move table (len: {mv_table.size}) discordant with "
399 f"signal (sig len // stride: {sig_len // stride})"
400 )
--> 401 raise RemoraError("Move table discordant with signal")
402 return query_to_signal, mv_table, stride
RemoraError: Move table discordant with signal
During handling of the above exception, another exception occurred:
RemoraError Traceback (most recent call last)
Cell In[22], line 11
8 can_bam_fh = pysam.AlignmentFile("read.bam")
10 bam_read = [ read for read in can_bam_fh.fetch() if io.read_is_primary(read)][0]
---> 11 io_read = io.get_io_reads([bam_read], can_pod5_fh, reverse_signal=False)[0]
File [...]/remora/io.py:765, in get_io_reads(bam_reads, pod5_dr, reverse_signal, missing_ok)
763 continue
764 else:
--> 765 raise RemoraError("BAM record not found in POD5")
766 io_reads.append(io_read)
767 return io_reads
RemoraError: BAM record not found in POD5
I have attached the corresponding read to replicate the issue. I am curious as to what could cause these cases and if this is anything that can be fixed from a programmatic standpoint or whether these reads should simply be filtered out.
In the the only other issue raising this problem you hinted this could be related to read splitting. In this instance however, the read does not stem from a split read:
bam_read.get_tag("pi")
KeyError: "tag 'pi' not present"
Any insight into this matter would be greatly appreciated.
Thank you in advance!
Hi. I also played a lot with move tables. Check if you don't have duplicated read IDs in your pod5 files that you use for basecalling. or basecalled bam They may cause this kind of issues according to my experience
I can reproduce the bug with the BAM file you sent, but I'm not able to replicate the issue with dorado 0.5.3 and without reference mapping. It is possible that the issues is due to a bug in dorado concerning adapter trimming (related to the splitting bug). Could you upgrade to dorado 0.5.3 and let me know if this resolves your issue?
Hi marcus, I'm getting a similar issue with the newest dorado version (0.6.1) while trying to run remora analyze plot ref_region
as written in the readme:
remora analyze plot ref_region --pod5-and-bam can_pod5/barcode02.pod5 can_pod5/calls_fast.bam --pod5-and-bam mod_pod5/barcode02.pod5 mod_pod5/calls_fast.bam --ref-regions analyze_region_signal.bed --refine-kmer-level-table /home/v313/kmer_models/dna_r10.4.1_e8.2_400bps/9mer_levels_v1.txt --refine-rough-rescale --log-filename log.txt
Getting a similar output:
Traceback (most recent call last):
File "/home/v313/.local/lib/python3.8/site-packages/remora/io.py", line 756, in get_io_reads
io_read = Read.from_pod5_and_alignment(
File "/home/v313/.local/lib/python3.8/site-packages/remora/io.py", line 1992, in from_pod5_and_alignment
read.add_alignment(alignment_record, reverse_signal=reverse_signal)
File "/home/v313/.local/lib/python3.8/site-packages/remora/io.py", line 1912, in add_alignment
self.query_to_signal, self.mv_table, self.stride = parse_move_tag(
File "/home/v313/.local/lib/python3.8/site-packages/remora/io.py", line 401, in parse_move_tag
raise RemoraError("Move table discordant with signal")
remora.RemoraError: Move table discordant with signal
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/v313/.local/bin/remora", line 8, in <module>
sys.exit(run())
File "/home/v313/.local/lib/python3.8/site-packages/remora/main.py", line 71, in run
cmd_func(args)
File "/home/v313/.local/lib/python3.8/site-packages/remora/parsers.py", line 1874, in run_plot_ref_region
io.plot_signal_at_ref_region(
File "/home/v313/.local/lib/python3.8/site-packages/remora/io.py", line 1521, in plot_signal_at_ref_region
samples_read_ref_regs, reg_bam_reads = get_reads_reference_regions(
File "/home/v313/.local/lib/python3.8/site-packages/remora/io.py", line 787, in get_reads_reference_regions
io_reads = get_io_reads(sample_bam_reads, pod5_dr, reverse_signal)
File "/home/v313/.local/lib/python3.8/site-packages/remora/io.py", line 765, in get_io_reads
raise RemoraError("BAM record not found in POD5")
remora.RemoraError: BAM record not found in POD5
I'm unsure about the first error. However I did check the bams and pod5s and it does seem like there are some read ids that are not present in the pod5 and vice versa, unsure how this happens. Trying this with unmapped reads gives me the same outcome. It might be a read splitting issue, however I'm not able to resolve it.
This should be resolved with the v3.2 release yesterday. Please reopen this issue if you have further issues.
@marcus1487 Could you please link to the changes in the code for this issue and if you have the time, elaborate on what caused this and how it was solved? I have been trying to debug multiple instances of this issue over time and I am curious what changes were made to the logic.
Thanks!
The changes are buried in this rather large merge request. Specifically note the changes in io.py regarding the ts, sp, and ns tags.
https://github.com/nanoporetech/remora/commit/db7ff94ae1de83f8a71add421db87692abbffdff
I can confirm that this issue is solved with v3.2. On a dataset of 60000 reads mapping onto a plasmid, prior versions failed on 50000 of them. Now, all are processed correctly. Thank you very much for this fix @marcus1487 !