tombo
.tombo.index file ReadData object order is different than in documentation
Hello, I wanted to investigate the signal matching scores of my reads after resquiggling, and if I understand correctly, the most straightforward way to do this is to look into the .tombo.index files (see quote below). I could open the file in Python; however, the order of the parameters in the tuple appears to differ from the description of the readData class in tombo_helper.py, line 127 ff. (as linked below). These are the values returned from .tombo.index for one file, annotated with my guess (which I would like confirmed) at the parameter each one corresponds to:
('single_fast5/0/001852d6-3ef8-4d0f-86d1-07a9df70082f.fast5', #filename
9786, #start (?)
10330, #read_start_rel_to_raw (?)
44301, #end
'RawGenomeCorrected_000', #corr_group
'BaseCalled_template', #subgroup
False, #filtered
True, #rna
1.3597175839479632, #sig_match_score
10.411585365853659, #mean_q_score
'001852d6-3ef8-4d0f-86d1-07a9df70082f') #read_id
Is this assignment correct?
Thank you, and best wishes!
If I had to do that project again, I would hack Tombo's .tombo.index files instead. Tombo index files are really just Python pickles. Inside one you'll find a list of Tombo readData objects. Try unpickling an index, deleting some reads, and repickling the file. You can use symlinks to trick Tombo into thinking a directory contains a fast5 base-directory and index-file of your choosing.
Originally posted by @Chris-Kimmel in https://github.com/nanoporetech/tombo/issues/349#issuecomment-860972807
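The unpickle, delete, and repickle round trip suggested in the quote can be sketched as follows. This is a minimal illustration and not Tombo's own code: it assumes the index is a dictionary keyed by (reference name, strand) whose values are lists of per-read tuples, which is the structure reported elsewhere in this thread. The helper `filter_index` and the toy data are hypothetical, and a real index may need Tombo to be importable if it stores class instances.

```python
import os
import pickle
import tempfile


def filter_index(index, keep):
    """Return a new index holding only the reads for which keep(read) is true.

    Assumes (unconfirmed) that `index` maps (reference, strand) keys to
    lists of per-read tuples.
    """
    filtered = {}
    for key, reads in index.items():
        kept = [read for read in reads if keep(read)]
        if kept:  # drop keys that end up with no reads
            filtered[key] = kept
    return filtered


# Toy stand-in for a real index; the tuples are shortened for illustration.
toy_index = {
    ("chr1", "+"): [("a.fast5", 10, 20), ("b.fast5", 30, 40)],
    ("chr2", "-"): [("c.fast5", 5, 15)],
}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "toy.tombo.index")

    # Write the toy index so that there is something to unpickle.
    with open(path, "wb") as handle:
        pickle.dump(toy_index, handle)

    # Unpickle, delete one read, and repickle in place.
    with open(path, "rb") as handle:
        loaded = pickle.load(handle)
    trimmed = filter_index(loaded, lambda read: read[0] != "b.fast5")
    with open(path, "wb") as handle:
        pickle.dump(trimmed, handle)

    # Read the file back to confirm the round trip.
    with open(path, "rb") as handle:
        reloaded = pickle.load(handle)
```

When working on a real index, it is safer to write the filtered result to a new file and keep the original untouched.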
I'm not sure exactly how this tuple was extracted from the index file, so it is hard to say how each attribute is defined. I would point out that RNA being read in the 3' to 5' direction can flip the order of some parameters (so an end might be smaller than the start). This is unfortunately not well documented.
We are working on tombo2 at the moment, which should have clearer documentation and better handling of RNA data.
Ah, thank you for the quick response. I extracted the tuple by loading the tombo.index pickle (a dictionary), looking up the values under my reference_name key, and taking the first tuple in the resulting list of tuples.
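import pickle

# tombo_index_file is the path to the .tombo.index file
with open(tombo_index_file, "rb") as infile:
    tombo_index = pickle.load(infile)

# first read tuple stored under the (reference name, strand) key
tombo_index[('reference_name', '+')][0]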
Very interested in tombo2 and excited to try it as soon as it releases!
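To make the tuple from the original post easier to inspect, its positions can be given names. The sketch below uses the field order guessed in the question, which has not been confirmed in this thread. `GuessedRead` is a hypothetical helper and not part of Tombo.

```python
from collections import namedtuple

# Field order is the unconfirmed guess from the question above,
# not a documented Tombo layout.
GuessedRead = namedtuple(
    "GuessedRead",
    [
        "filename",
        "start",
        "read_start_rel_to_raw",
        "end",
        "corr_group",
        "subgroup",
        "filtered",
        "rna",
        "sig_match_score",
        "mean_q_score",
        "read_id",
    ],
)

# The tuple quoted in the original post.
example = (
    "single_fast5/0/001852d6-3ef8-4d0f-86d1-07a9df70082f.fast5",
    9786,
    10330,
    44301,
    "RawGenomeCorrected_000",
    "BaseCalled_template",
    False,
    True,
    1.3597175839479632,
    10.411585365853659,
    "001852d6-3ef8-4d0f-86d1-07a9df70082f",
)

read = GuessedRead(*example)

# Print each value next to its guessed name.
for name, value in read._asdict().items():
    print(f"{name}: {value}")
```

If the guessed order is wrong, only the list of field names needs to change. The constructor raises a TypeError when the tuple length does not match the number of names, which gives an early warning that the layout differs.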