tombo
Documentation for tombo build_model estimate_alt_reference could be clearer
Hello, I have been trying to train an alternative model to get more accurate RNA modification detection at higher modification densities, and was trying to run:
tombo build_model estimate_alt_reference ...
I noticed that when supplying --rna, I received an error:
File "/home/patrick/anaconda3/envs/ont-tools/lib/python3.7/site-packages/tombo/tombo_stats.py", line 800, in __init__
seq_samp_type.name]))
AttributeError: 'str' object has no attribute 'name'
Without the option, however, RNA autodetection works, so this is not a big issue.
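The traceback shows `seq_samp_type.name` being evaluated on a plain `str`, which suggests that `--rna` hands a raw string to code expecting an object with a `.name` attribute. The sketch below only reproduces that failure mode and one generic way to guard against it. `SeqSampType`, `describe` and `normalize_samp_type` are hypothetical names for illustration, not Tombo's actual code.

```python
from collections import namedtuple

# Hypothetical stand-in for the object the failing code expects;
# Tombo's real type may differ.
SeqSampType = namedtuple("SeqSampType", ["name"])
DNA = SeqSampType("dna")
RNA = SeqSampType("rna")
KNOWN_TYPES = {t.name: t for t in (DNA, RNA)}


def describe(seq_samp_type):
    # Mirrors the failing line: attribute access on whatever was passed in.
    return "using {} model".format(seq_samp_type.name)


def normalize_samp_type(value):
    """Convert a command-line string such as 'rna' into the expected object."""
    if isinstance(value, str):
        try:
            return KNOWN_TYPES[value.lower()]
        except KeyError:
            raise ValueError("unknown sample type: {!r}".format(value)) from None
    return value


if __name__ == "__main__":
    try:
        describe("rna")  # a plain string has no .name attribute
    except AttributeError as err:
        print("reproduced:", err)
    print(describe(normalize_samp_type("rna")))
```

Normalizing once, where the option is parsed, keeps every downstream caller working with the same object type.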
However, for a proof of concept, I was trying to train the model on a specific reference sequence that does not contain all 1024 (4^5) 5-mers, and it seems that this is not possible: setting minimal-kmer-observations to 0 leads to FloatingPointError: invalid value encountered in true_divide. If that is correct, it seems that I need RNA sequences containing all possible 5-mers to continue, right?
Thanks in advance!
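To see how far a given reference is from covering all 1024 5-mers, a quick check is to enumerate every k-mer over the alphabet and compare against the k-mers actually present in the sequence. This is a minimal standalone sketch, not part of Tombo; the `missing_kmers` name and the `ACGU` default alphabet are assumptions (pass `alphabet="ACGT"` if the reference is written with T).

```python
from itertools import product


def missing_kmers(seq, k=5, alphabet="ACGU"):
    """Return the sorted list of k-mers over `alphabet` that never occur in `seq`."""
    seq = seq.upper()
    # Every overlapping window of length k in the reference.
    observed = {seq[i:i + k] for i in range(len(seq) - k + 1)}
    all_kmers = ("".join(p) for p in product(alphabet, repeat=k))
    return sorted(km for km in all_kmers if km not in observed)


if __name__ == "__main__":
    ref = "ACGUACGGUUCAGAUCCGAU"  # toy reference, not real data
    missing = missing_kmers(ref)
    total = len("ACGU") ** 5
    print("{} of {} 5-mers observed".format(total - len(missing), total))
```

A reference of length L has at most L - 4 distinct 5-mers, so full coverage needs a sequence of at least 1028 nt (and in practice far more, since windows overlap).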
The --rna option certainly looks like a bug. I'll have a look at that.
For model training: yes, Tombo needs to see the modified base in all k-mer contexts (in all relative positions) in order to train a modified base model. This requirement should be checked explicitly when the options are processed, so that users get a clear message instead of this error. The other computational option here is to reduce the k-mer size. This also requires a canonical model of the same (smaller) k-mer size, but that could be derived from the larger 5-mer model provided with Tombo. I hope this helps in your research.
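Two points in the reply can be made concrete: the modified base must be seen with every possible flanking context at every relative position, and a smaller canonical model could be derived from a 5-mer one. The sketch below is illustrative only. `unseen_contexts_by_position` counts missing contexts per position (the modified site is written as its canonical letter), and `collapse_levels` averages a hypothetical `{k-mer: mean level}` table down to its centred sub-k-mers. Both are hypothetical helpers, and unweighted averaging is one simple choice, not Tombo's model format or its training method.

```python
from collections import defaultdict
from itertools import product
from statistics import mean


def unseen_contexts_by_position(seq, mod_base, k=5, alphabet="ACGU"):
    """For each relative position 0..k-1, count the k-mers that carry `mod_base`
    at that position but never occur in `seq` (4**(k-1) contexts per position)."""
    seq = seq.upper()
    observed = {seq[i:i + k] for i in range(len(seq) - k + 1)}
    counts = []
    for pos in range(k):
        missing = 0
        # Build every k-mer with mod_base inserted at position `pos`.
        for flank in product(alphabet, repeat=k - 1):
            kmer = "".join(flank[:pos]) + mod_base + "".join(flank[pos:])
            if kmer not in observed:
                missing += 1
        counts.append(missing)
    return counts


def collapse_levels(levels, k_small):
    """Average a {k-mer: mean level} table down to its centred k_small-mers.

    Assumes all keys share one length and that the reduced k-mer is the
    centred substring; an illustration, not Tombo's model format."""
    k_big = len(next(iter(levels)))
    if k_small > k_big or (k_big - k_small) % 2:
        raise ValueError("k_small must be <= k_big with an even difference")
    trim = (k_big - k_small) // 2
    groups = defaultdict(list)
    for kmer, level in levels.items():
        groups[kmer[trim:trim + k_small]].append(level)
    return {sub: mean(vals) for sub, vals in groups.items()}
```

Reducing k from 5 to 3 drops the number of contexts the modified base must be observed in from 256 to 16 per relative position, which is why a smaller k-mer size can make a limited reference usable.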