Non-random distribution of errors from bonito 0.3.1 data
I was just looking at some adaptive sampling data from a 9.4.1 pore that I basecalled with bonito 0.3.1 (aligned to hg38 with minimap2) and noticed that the errors appear to be non-randomly distributed. The error rate seems lower over the SINE/Alu elements, as can be seen in this ~3 kb view of chr6:51,993,375-51,996,386.
![image](https://user-images.githubusercontent.com/3186059/101447180-8bd54280-38d9-11eb-8531-6fff4e89a906.png)
Similar result nearby (chr6:51,984,029-51,987,040):
![image](https://user-images.githubusercontent.com/3186059/101447357-ecfd1600-38d9-11eb-82c9-f42c5df3d410.png)
I never saw this with guppy or albacore. Is this expected?
Thanks.
Hey @danrdanny
Thanks for reporting - we will need some time to investigate this in detail.
The effect can also be seen with guppy, to a lesser extent, so I don't think it's totally surprising given the improvement in quality in the latest bonito version.
It is certainly something to consider, however, with respect to training set composition, which we are currently evaluating.
Interesting, I had never noticed that before. Thanks for looking into it.
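For anyone who wants to put a number on the pattern in the screenshots rather than eyeball it, here is a minimal sketch of one way to compare error rates inside and outside annotated repeats. It assumes you have already extracted per-position error counts and read depth for a region (for example from a pileup), and that you have the SINE/Alu intervals in the same coordinates. The function name `error_rate_by_annotation` and the toy numbers are hypothetical and not part of bonito, guppy, or minimap2.

```python
from typing import List, Sequence, Tuple

# Half-open, 0-based interval [start, end) relative to the start of the region.
Interval = Tuple[int, int]


def error_rate_by_annotation(
    errors: Sequence[int],
    depth: Sequence[int],
    intervals: Sequence[Interval],
) -> Tuple[float, float]:
    """Return (rate_inside, rate_outside) for the annotated intervals.

    errors[i] is the number of reads disagreeing with the reference at
    position i, and depth[i] is the number of reads covering position i.
    A rate is NaN when no reads cover that class of positions.
    """
    if len(errors) != len(depth):
        raise ValueError("errors and depth must have the same length")

    # Mark which positions fall inside any annotated interval.
    inside: List[bool] = [False] * len(errors)
    for start, end in intervals:
        for i in range(max(start, 0), min(end, len(errors))):
            inside[i] = True

    err_in = sum(e for e, m in zip(errors, inside) if m)
    dep_in = sum(d for d, m in zip(depth, inside) if m)
    err_out = sum(e for e, m in zip(errors, inside) if not m)
    dep_out = sum(d for d, m in zip(depth, inside) if not m)

    rate_in = err_in / dep_in if dep_in else float("nan")
    rate_out = err_out / dep_out if dep_out else float("nan")
    return rate_in, rate_out


if __name__ == "__main__":
    # Toy data only: six positions at depth 10, first three annotated as a repeat.
    toy_errors = [0, 1, 0, 0, 2, 2]
    toy_depth = [10, 10, 10, 10, 10, 10]
    toy_repeats = [(0, 3)]
    rate_in, rate_out = error_rate_by_annotation(toy_errors, toy_depth, toy_repeats)
    print(f"inside repeats: {rate_in:.3f}, outside repeats: {rate_out:.3f}")
```

Running the same comparison on alignments of the same reads basecalled with bonito and with guppy would show whether the gap between repeat and non-repeat positions differs between the two basecallers.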