
.tombo.index file ReadData object order is different than in documentation

Open patbohn opened this issue 3 years ago • 2 comments

Hello, I wanted to investigate the signal matching scores of my reads after resquiggling, and if I understand correctly, the most straightforward way is to look into the .tombo.index files (see quote below). I could open the file in Python; however, the order of the values in each tuple appears to differ from the description of the readData class in tombo_helper.py, line 127 ff. (as linked below). These are the values returned from .tombo.index for one file, annotated with the parameters I am guessing they correspond to (I would like confirmation):

('single_fast5/0/001852d6-3ef8-4d0f-86d1-07a9df70082f.fast5',  #filename
 9786,                                                         #start (?)
 10330,                                                        #read_start_rel_to_raw (?)
 44301,                                                        #end
 'RawGenomeCorrected_000',                                     #corr_group
 'BaseCalled_template',                                        #subgroup
 False,                                                        #filtered
 True,                                                         #rna
 1.3597175839479632,                                           #sig_match_score
 10.411585365853659,                                           #mean_q_score
 '001852d6-3ef8-4d0f-86d1-07a9df70082f')                       #read_id

Is this assignment correct?

Thank you, and best wishes!

If I had to do that project again, I would hack Tombo's .tombo.index files instead. Tombo index files are really just Python pickles. Inside one you'll find a list of Tombo readData objects. Try unpickling an index, deleting some reads, and repickling the file. You can use symlinks to trick Tombo into thinking a directory contains a fast5 base-directory and index-file of your choosing.

Originally posted by @Chris-Kimmel in https://github.com/nanoporetech/tombo/issues/349#issuecomment-860972807
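
The unpickle, delete, repickle idea in the quote could be sketched roughly as below. This is only a sketch under assumptions: it treats the index as a dictionary that maps keys to lists of read records (which is what the index looks like later in this thread), and `filter_index`, `src_path`, `dst_path` and `keep` are hypothetical names, not part of Tombo. It writes to a separate output path so the original index stays untouched.

```python
import pickle


def filter_index(src_path, dst_path, keep):
    """Copy a pickled index, keeping only records for which keep(record) is true.

    Assumes the pickle holds a dict of {key: [record, ...]}; this layout is
    an assumption based on this thread, not on Tombo's documentation.
    """
    with open(src_path, "rb") as infile:
        index = pickle.load(infile)

    filtered = {}
    for key, records in index.items():
        kept = [rec for rec in records if keep(rec)]
        if kept:  # drop keys that end up with no reads
            filtered[key] = kept

    with open(dst_path, "wb") as outfile:
        pickle.dump(filtered, outfile)
    return filtered
```

For example, `filter_index(src, dst, lambda rec: rec[0] in wanted_files)` would keep only the reads whose first field (the filename in the tuple above) is in a set you chose. Only unpickle index files you trust, since loading a pickle can execute arbitrary code.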

patbohn avatar Jul 01 '21 11:07 patbohn

I'm not sure exactly how this tuple was extracted from the index file, so it is hard to say which attribute each value corresponds to. I would point out that RNA being read in the 3' to 5' direction can flip the order of some parameters (so an end might be smaller than the start). This is unfortunately not well documented.
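
One rough way to see whether such flipping shows up in a given index is to list the records where one coordinate field is smaller than another. The sketch below assumes the records are plain tuples; `find_descending_pairs` is a hypothetical helper, and which tuple positions hold the start and the end is exactly the open question here, so the positions are passed in explicitly instead of being hard-coded.

```python
def find_descending_pairs(records, first_idx, second_idx):
    """Return the records whose value at second_idx is smaller than at first_idx.

    first_idx and second_idx are the tuple positions ASSUMED to hold the
    start and end coordinates; this helper does not know the real layout.
    """
    return [rec for rec in records if rec[second_idx] < rec[first_idx]]
```

If this returns a non-empty list for a candidate (start, end) pair of positions, either the reads are flipped as described above or the guessed positions are wrong; the check alone cannot tell the two apart.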

We are working on tombo2 at the moment, which should have clearer documentation and handling for RNA data.

marcus1487 avatar Jul 01 '21 13:07 marcus1487

Ah, thank you for the quick response. I extracted the tuple by loading the pickled tombo.index dictionary, looking up the entry for my reference_name key, and taking the first tuple from the resulting list of tuples.

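import pickle

# tombo_index_file is the path to the .tombo.index file
with open(tombo_index_file, "rb") as infile:
    tombo_index = pickle.load(infile)

# Look up one key, then take the first tuple in its list
tombo_index[('reference_name', '+')][0]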
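
To make the values easier to read, the tuples could be wrapped in a namedtuple, as in the sketch below. The field order is the one guessed earlier in this thread and is not confirmed; in particular, the positions of start, read_start_rel_to_raw and end are uncertain. `GUESSED_FIELDS`, `IndexRecord` and `label_records` are hypothetical names, not part of Tombo.

```python
from collections import namedtuple

# Field order as guessed in this thread; NOT confirmed against Tombo's source.
GUESSED_FIELDS = (
    "filename", "start", "read_start_rel_to_raw", "end",
    "corr_group", "subgroup", "filtered", "rna",
    "sig_match_score", "mean_q_score", "read_id",
)
IndexRecord = namedtuple("IndexRecord", GUESSED_FIELDS)


def label_records(tombo_index, key):
    """Wrap every tuple stored under key in an IndexRecord.

    Raises TypeError if a tuple does not have exactly 11 values, which
    would be a sign that the index layout differs from the guess.
    """
    return [IndexRecord(*rec) for rec in tombo_index[key]]
```

With this, `label_records(tombo_index, ('reference_name', '+'))[0].sig_match_score` reads the score by name instead of by position. The labels are only as reliable as the guessed order.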

Very interested in tombo2 and excited to try it as soon as it releases!

patbohn avatar Jul 01 '21 13:07 patbohn