
Repetitive reads and strandedness

Open: linzho opened this issue 3 years ago • 1 comment

Hi there,

First off, I've never posted an issue before so I apologize if this is the wrong place for this question and will gladly take feedback about where else to send this!

I am running bonito on MinION long reads, where we are trying to sequence long repetitive regions (CGG repeats).

I am using the provided pretrained model:

bonito basecaller dna_r9.4.1 /data/reads > basecalls.fasta

I am noticing that bonito reports vastly different repeat lengths depending on whether the read is forward (CGGCGG...) or reverse (CCGCCG...). When I basecall the same data with guppy, I don't see this discrepancy at all.
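
For anyone wanting to quantify the discrepancy, here is a minimal sketch of how the per-strand repeat lengths could be tallied from the FASTA output. It is not part of bonito or guppy; the helper names (`read_fasta`, `longest_run`, `repeat_lengths_by_strand`) are hypothetical, and it assumes a read's strand can be inferred from whether its longest uninterrupted run is CGG (forward) or CCG (reverse). Any basecalling error inside the repeat splits a run, so the counts are a lower bound on the true repeat length.

```python
import re


def read_fasta(lines):
    """Yield (name, sequence) pairs from an iterable of FASTA lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Keep only the first token of the header as the read name.
            name, chunks = (line[1:].split() or [""])[0], []
        elif line:
            chunks.append(line.upper())
    if name is not None:
        yield name, "".join(chunks)


def longest_run(seq, unit):
    """Length, in repeat units, of the longest uninterrupted run of `unit`."""
    runs = re.findall(f"(?:{unit})+", seq)
    return max((len(r) for r in runs), default=0) // len(unit)


def repeat_lengths_by_strand(records, fwd_unit="CGG", rev_unit="CCG"):
    """Group longest-run lengths by inferred strand (assumption: the strand
    is whichever motif has the longer run; reads with no run are skipped)."""
    lengths = {"forward": [], "reverse": []}
    for _name, seq in records:
        fwd = longest_run(seq, fwd_unit)
        rev = longest_run(seq, rev_unit)
        if fwd == 0 and rev == 0:
            continue
        strand = "forward" if fwd >= rev else "reverse"
        lengths[strand].append(max(fwd, rev))
    return lengths
```

To run it on the file produced by the command above, open `basecalls.fasta`, pass the file handle to `read_fasta`, and pass the result to `repeat_lengths_by_strand`. Comparing the two lists (for example their medians) for the bonito and guppy basecalls of the same reads gives a direct measure of the strand difference.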

I am wondering whether the authors are aware of this or have a suggestion for how to move forward. Is this a case where I should train my own model on my own data first?

Thank you for your help!

linzho, Dec 23 '20 17:12

Hey @linzho

Thanks for raising this. We have seen some strandedness internally and are investigating; I will update this issue when we have more details.

iiSeymour, Jan 25 '21 19:01