STRique model version R10.3

Open yufernando opened this issue 3 years ago • 3 comments

I found in an earlier issue that R10.3 would not work with the current version of STRique: https://github.com/giesselmann/STRique/issues/6#issuecomment-600324010

I have a couple of questions and would appreciate your feedback:

  • Are there any updates regarding the model version?
  • We couldn't find in the STRique documentation what the model file contains, for example:
AAAAAA	87.31411831337803	0.7271229290351257	2556
AAAAAC	83.7620420260019	1.0166215079284922	3802
AAAAAG	84.87997176980885	0.6816026090898406	1660

What do the four columns refer to?

  • Is this an issue that can be solved by creating a model file for R10.3?

yufernando avatar Jun 24 '21 20:06 yufernando

Hi, there's no development on our side regarding R10 support at the moment. ONT is no longer providing kmer models (https://github.com/nanoporetech/kmer_models), so one would need to derive a model from R10 data, and I'm not sure how to do that. The columns of the model file are kmer, mean, std, and number of observations during training.
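To make the column layout concrete, here is a minimal Python sketch that reads lines in that four-column form. It is not STRique's own loader: the names `KmerStat` and `parse_model` are hypothetical, and it assumes the fields are separated by whitespace (tabs or spaces) with no header line.

```python
from typing import Dict, NamedTuple


class KmerStat(NamedTuple):
    """One model line: kmer, mean, std, number of training observations."""
    kmer: str
    mean: float
    std: float
    count: int


def parse_model(text: str) -> Dict[str, KmerStat]:
    """Parse model text into a dict keyed by kmer (hypothetical helper)."""
    model = {}
    for line in text.splitlines():
        fields = line.split()  # assumption: whitespace-separated columns
        if not fields:
            continue  # skip blank lines
        kmer, mean, std, count = fields
        model[kmer] = KmerStat(kmer, float(mean), float(std), int(count))
    return model


# The three example lines quoted in the question above.
sample = (
    "AAAAAA\t87.31411831337803\t0.7271229290351257\t2556\n"
    "AAAAAC\t83.7620420260019\t1.0166215079284922\t3802\n"
    "AAAAAG\t84.87997176980885\t0.6816026090898406\t1660\n"
)
model = parse_model(sample)
```

Keying the result by kmer makes it easy to look up the expected signal mean and spread for any 6-mer, which is presumably how a derived R10 model would be consumed as well.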

I think STRique could work with R10, but in addition to a new model file, one would need to resequence the synthetic repeat samples to prove the accuracy again and perhaps tune the HMM parameters (config/STRique.json).

Pay

giesselmann avatar Jun 25 '21 08:06 giesselmann

Hi Pay, thank you for your prompt response.

Would you know alternatives for STR identification on Nanopore data that work with R10?

yufernando avatar Jun 25 '21 13:06 yufernando

I would check how good basecalling has become and try a sequence-based classification, either RepeatHMM or a simple decoy alignment. With R9, we've seen strand-specific errors. If you plot R10 results per strand and do not see a bias, that is of course no guarantee, but it is a good indicator that the counting works in sequence space.
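As a rough numeric companion to the per-strand plot suggested above, the following Python sketch groups repeat counts by strand and compares the medians. The function name `median_count_per_strand` and the input format (pairs of strand and repeat count) are assumptions for illustration, and the values in `calls` are made-up toy numbers, not real results.

```python
from collections import defaultdict
from statistics import median


def median_count_per_strand(calls):
    """Return the median repeat count for each strand.

    `calls` is an iterable of (strand, repeat_count) pairs; this input
    format is an assumption, not the output of any particular tool.
    """
    by_strand = defaultdict(list)
    for strand, count in calls:
        by_strand[strand].append(count)
    return {strand: median(counts) for strand, counts in by_strand.items()}


# Toy values for illustration only (not real data).
calls = [("+", 30), ("+", 31), ("+", 29), ("-", 30), ("-", 32), ("-", 30)]
medians = median_count_per_strand(calls)
bias = medians["+"] - medians["-"]  # a value near zero suggests no strand bias
```

A difference near zero between the two medians is the same kind of indicator as an unbiased plot: encouraging, but no guarantee.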

giesselmann avatar Jun 25 '21 13:06 giesselmann