bonito
Repetitive reads and strandedness
Hi there,
First off, I've never posted an issue before so I apologize if this is the wrong place for this question and will gladly take feedback about where else to send this!
I am running bonito on MinION long reads, where we are trying to sequence long repetitive regions (CGG repeats).
I am using the provided pretrained model:
bonito basecaller dna_r9.4.1 /data/reads > basecalls.fasta
I am noticing that bonito reports vastly different repeat lengths depending on whether the read is forward (CGGCGG...) or reverse (CCGCCG...). When I basecall the same data with guppy, I don't see this discrepancy at all.
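For anyone who wants to check their own basecalls for the same effect, here is a minimal sketch of one way to measure it. It is not part of bonito or guppy; the function names `longest_repeat_run` and `repeat_count_by_strand` are hypothetical, and it assumes a read's strand can be guessed from whether its longest exact run is CGG or CCG. Exact matching means any basecalling error inside the repeat splits the run, so the counts are a lower bound.

```python
import re


def longest_repeat_run(seq, unit):
    """Return the number of back-to-back copies of `unit` in the longest exact run in `seq`."""
    pattern = re.compile(r"(?:%s)+" % re.escape(unit))
    runs = [len(m.group(0)) // len(unit) for m in pattern.finditer(seq.upper())]
    return max(runs, default=0)


def repeat_count_by_strand(seq):
    """Guess the strand from the dominant repeat unit (assumption: CGG = forward, CCG = reverse)."""
    fwd = longest_repeat_run(seq, "CGG")
    rev = longest_repeat_run(seq, "CCG")
    return ("forward", fwd) if fwd >= rev else ("reverse", rev)


if __name__ == "__main__":
    # Toy sequences for illustration only, not real reads.
    for read in ("ATCGGCGGCGGTA", "TTCCGCCGCCGCCGAA"):
        print(read, repeat_count_by_strand(read))
```

Running this over the reads from both basecallers and comparing the per-strand count distributions would show whether the discrepancy is specific to one strand.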
I am wondering whether this is something the authors are aware of, or whether you have a suggestion for how to move forward. Should I be training my own model on my own data first?
Thank you for your help!
Hey @linzho,
Thanks for raising this. We have seen some strandedness internally and are investigating; I will update this issue when we have more details.