Documentation for tombo build_model estimate_alt_reference could be clearer

Open: patbohn opened this issue 3 years ago • 1 comment

Hello, I have been trying to train an alternative model to get more accurate RNA mod detection at higher modification densities, and was trying to run

tombo build_model estimate_alt_reference ...

I noticed that when supplying --rna I received an error:

File "/home/patrick/anaconda3/envs/ont-tools/lib/python3.7/site-packages/tombo/tombo_stats.py", line 800, in __init__
    seq_samp_type.name]))
AttributeError: 'str' object has no attribute 'name'

Without the option, however, the autodetection of RNA works, so this is not a big issue.

However, for a proof of concept, I was trying to train the model on a specific reference sequence that does not contain all 1024 5-mers, and it seems that this is not possible: a minimal-kmer-observations of 0 leads to FloatingPointError: invalid value encountered in true_divide. If that is correct, it seems that I need RNA sequences that contain all possible 5-mers to continue, right?
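To make the coverage question concrete, here is a minimal sketch of how one might count which 5-mers a reference sequence lacks. It is plain Python and does not use Tombo; the function name `missing_kmers` and the toy sequence are hypothetical. It only checks whether each k-mer occurs in the reference at all, not how many reads cover it or whether the modified base is seen at every position within the k-mer.

```python
from itertools import product


def missing_kmers(seq, k=5, alphabet="ACGT"):
    """Return the sorted k-mers over `alphabet` that never occur in `seq`."""
    # Normalise case and treat RNA "U" as "T" so one alphabet covers both.
    seq = seq.upper().replace("U", "T")
    # Every window of length k that appears in the sequence.
    observed = {seq[i:i + k] for i in range(len(seq) - k + 1)}
    # All 4**k possible k-mers (1024 for k=5).
    all_kmers = {"".join(p) for p in product(alphabet, repeat=k)}
    return sorted(all_kmers - observed)


# Hypothetical toy reference; replace with the real reference sequence.
reference = "ACGTACGTTGCA"
missing = missing_kmers(reference)
print(f"{4 ** 5 - len(missing)} of {4 ** 5} 5-mers present")
```

An empty list would mean every 5-mer appears at least once in the reference, which is necessary but not sufficient for training.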

Thanks in advance!

patbohn, Nov 13 '20 14:11

The --rna option certainly looks like a bug. I'll have a look at that.

For the model training, yes: Tombo needs to see the modified base in all k-mer contexts (in all relative positions) in order to train a modified base model. The option value should be checked explicitly at the option-parsing stage to avoid this error. The other computational option here is to reduce the k-mer size. This requires a canonical model of the same k-mer size as well, but that could be derived from the larger 5-mer model provided with Tombo. I hope this helps in your research.
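As a rough illustration of deriving a smaller canonical model from a 5-mer one, the sketch below averages the expected levels of all 5-mers that share the same central 3-mer. This is an assumption about how such a reduction might be done, not Tombo's own procedure: the `collapse_model` function, the dict-based model layout, and the toy level values are all hypothetical, and a real Tombo model file would first have to be loaded into this form.

```python
from collections import defaultdict
from itertools import product
from statistics import mean


def collapse_model(levels, trim_left=1, trim_right=1):
    """Average k-mer levels over the trimmed flanking bases.

    `levels` maps each k-mer string to its expected signal level
    (hypothetical layout, not Tombo's on-disk format).
    """
    grouped = defaultdict(list)
    for kmer, level in levels.items():
        # Keep only the core of the k-mer, e.g. the central 3 of 5 bases.
        core = kmer[trim_left:len(kmer) - trim_right]
        grouped[core].append(level)
    return {core: mean(vals) for core, vals in grouped.items()}


# Toy 5-mer model with made-up levels, for illustration only.
toy_5mer = {"".join(p): float(i)
            for i, p in enumerate(product("ACGT", repeat=5))}
toy_3mer = collapse_model(toy_5mer)
print(len(toy_3mer), "3-mers derived from", len(toy_5mer), "5-mers")
```

A plain mean treats every flanking context as equally likely; weighting by observed k-mer frequencies may be more appropriate for a real data set.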

marcus1487, Nov 23 '20 14:11