STRique
STRique model version R10.3
I found in an earlier issue that R10.3 would not work with the current version of STRique: https://github.com/giesselmann/STRique/issues/6#issuecomment-600324010
I have a couple of questions and would appreciate your feedback:
- Are there any updates regarding the model version?
- We couldn't find a description of the model file contents in the STRique documentation. For example:
AAAAAA 87.31411831337803 0.7271229290351257 2556
AAAAAC 83.7620420260019 1.0166215079284922 3802
AAAAAG 84.87997176980885 0.6816026090898406 1660
What do the four columns refer to?
- Is this an issue that can be solved by creating a model file for R10.3?
Hi, there's no development on our side regarding R10 support at the moment. ONT is no longer providing kmer models (https://github.com/nanoporetech/kmer_models), so one would need to derive a model from R10 data, and I'm not sure how to do that. The columns of the model file are kmer, mean, std, and number of observations during training.
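To make the column layout concrete, here is a minimal Python sketch that reads such a file into a dictionary. It is not part of STRique; the names `KmerStats` and `parse_model` are made up for illustration, and it assumes whitespace-separated fields with no header or comment lines.

```python
import io
from typing import Dict, Iterable, NamedTuple


class KmerStats(NamedTuple):
    """Per-kmer statistics, following the column order described above."""
    mean: float   # column 2: mean
    std: float    # column 3: std
    count: int    # column 4: number of observations during training


def parse_model(lines: Iterable[str]) -> Dict[str, KmerStats]:
    """Parse 'kmer mean std count' lines into a dict keyed by kmer.

    Assumption: fields are separated by whitespace and every non-empty
    line has exactly four fields.
    """
    model = {}
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        kmer, mean, std, count = fields
        model[kmer] = KmerStats(float(mean), float(std), int(count))
    return model


# The three example lines quoted in the question.
sample = """AAAAAA 87.31411831337803 0.7271229290351257 2556
AAAAAC 83.7620420260019 1.0166215079284922 3802
AAAAAG 84.87997176980885 0.6816026090898406 1660
"""
model = parse_model(io.StringIO(sample))
print(model["AAAAAA"])
```

For a real file, pass an open file handle instead of the `io.StringIO` wrapper.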
I think STRique could work with R10, but in addition to a new model file, one would need to resequence the synthetic repeat samples to prove the accuracy again and perhaps tune the HMM parameters (config/STRique.json).
Pay
Hi Pay, thank you for your prompt response.
Do you know of any alternatives for STR identification on Nanopore data that work with R10?
I would check how good basecalling has become and try a sequence-based classification, either RepeatHMM or a simple decoy alignment. With R9, we saw strand-specific errors. If you plot the R10 results per strand and do not see a bias, that is of course no guarantee, but it is a good indicator that counting works in sequence space.
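A rough numeric companion to the per-strand plot could look like the sketch below. It only groups per-read repeat counts by strand and compares the medians; it does not plot, and it is not a statistical test. The function name `per_strand_summary` and the example counts are invented for illustration, and how you obtain `(strand, repeat_count)` pairs depends on the tool you use.

```python
from collections import defaultdict
from statistics import mean, median


def per_strand_summary(records):
    """Summarize repeat counts per strand.

    records: iterable of (strand, repeat_count) pairs, with strand
    given as '+' or '-'.
    """
    by_strand = defaultdict(list)
    for strand, count in records:
        by_strand[strand].append(count)
    return {
        strand: {"n": len(counts), "mean": mean(counts), "median": median(counts)}
        for strand, counts in by_strand.items()
    }


# Made-up counts for one locus, purely to show the shape of the input.
records = [("+", 30), ("+", 31), ("+", 29), ("-", 30), ("-", 32), ("-", 28)]
summary = per_strand_summary(records)

# A large difference between strands would hint at a strand-specific bias.
bias = summary["+"]["median"] - summary["-"]["median"]
print(summary, bias)
```

A median difference near zero across many loci points the same way as an unbiased plot, with the same caveat that it is an indicator rather than a guarantee.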