
Non-random distribution of errors from bonito 0.3.1 data

Open danrdanny opened this issue 3 years ago • 3 comments

I was just looking at some adaptive sampling data from a 9.4.1 pore that I basecalled with bonito 0.3.1 (aligned to hg38 with minimap2) and noticed that the errors appear to be non-randomly distributed. The error rate seems lower over the SINE/Alu elements, as can be seen in this ~3 kb view of chr6:51,993,375-51,996,386.

image

Similar result nearby (chr6:51,984,029-51,987,040):

image

I never saw this with guppy or albacore. Is this expected?
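
For anyone wanting to put a number on the visual impression rather than relying on the browser view, here is a minimal sketch. It assumes per-position depth and mismatch counts have already been extracted from the alignments by some other tool, and that the SINE/Alu coordinates are available as 0-based, half-open intervals (as in a BED file). The function names and the input layout are hypothetical and are not part of bonito or minimap2.

```python
from typing import Iterable, List, Tuple

# Hypothetical layout: 0-based, half-open [start, end) intervals, as in BED.
Interval = Tuple[int, int]
# Hypothetical layout: (reference position, read depth, mismatching bases).
PileupRow = Tuple[int, int, int]


def in_any(pos: int, intervals: List[Interval]) -> bool:
    """Return True if pos falls inside at least one interval."""
    return any(start <= pos < end for start, end in intervals)


def error_rates(pileup: Iterable[PileupRow],
                intervals: List[Interval]) -> Tuple[float, float]:
    """Return (rate inside intervals, rate outside intervals).

    Each rate is total mismatches divided by total depth over the
    positions in that group; NaN if the group has no coverage.
    """
    inside = [0, 0]   # [mismatches, depth]
    outside = [0, 0]
    for pos, depth, mismatches in pileup:
        bucket = inside if in_any(pos, intervals) else outside
        bucket[0] += mismatches
        bucket[1] += depth

    def rate(bucket: List[int]) -> float:
        return bucket[0] / bucket[1] if bucket[1] else float("nan")

    return rate(inside), rate(outside)
```

Running this once per basecaller over the same region would show whether the gap between the two rates is wider for bonito than for guppy. The linear interval scan is fine for a ~3 kb window but would need an interval index for genome-wide use.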

Thanks.

danrdanny avatar Dec 08 '20 06:12 danrdanny

Hey @danrdanny

Thanks for reporting - we will need some time to investigate this in detail.

iiSeymour avatar Dec 09 '20 18:12 iiSeymour

The effect can also be seen with guppy, to a lesser extent, so I don't think it's totally surprising given the improvement in quality in the latest bonito version.

[Screenshots: 2020-12-09 at 18:28:27, 2020-12-09 at 18:26:28, 2020-12-09 at 18:28:27]

However, this is certainly something to consider w.r.t. training set composition, which we are currently evaluating.

iiSeymour avatar Dec 10 '20 14:12 iiSeymour

Interesting, I had never noticed that before. Thanks for looking into it.

danrdanny avatar Dec 10 '20 16:12 danrdanny