bonito

kbeam code not available

Open JosephLalli opened this issue 3 years ago • 2 comments

I'm working on a custom way to decode sequences using the v3 model. Parts of my sequence are degenerate bases, and I'd like to use the bases on either side of each degenerate base to inform how it is basecalled (e.g., for the sequence ATTCGNATGAA, P(N | scores, ATTCG upstream, ATGAA downstream)). There are methods of doing this kind of analysis using custom beam search algorithms.
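To make the goal concrete, here is a rough sketch of the quantity I'm after, written for plain CTC output only (not the v3 CTC-CRF scores, and not kbeam). It scores each candidate base with the standard CTC forward algorithm and normalises over the candidates. The function names, the `log_probs` layout (one row of per-class log-probabilities per frame) and the blank-at-index-0 convention are my own assumptions, not anything taken from bonito.

```python
import math

NEG_INF = float("-inf")

def logsumexp(*xs):
    """Numerically stable log(sum(exp(x))) that tolerates -inf entries."""
    m = max(xs)
    if m == NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(x - m) for x in xs))

def ctc_log_likelihood(log_probs, labels, blank=0):
    """log P(labels | log_probs) under plain CTC (forward algorithm).

    log_probs: list of T frames, each a list of per-class log-probabilities.
    labels: list of class indices, without blanks.
    """
    # Interleave blanks: [blank, l1, blank, l2, ..., blank]
    ext = [blank]
    for label in labels:
        ext += [label, blank]
    S = len(ext)

    alpha = [NEG_INF] * S
    alpha[0] = log_probs[0][blank]
    if S > 1:
        alpha[1] = log_probs[0][ext[1]]

    for t in range(1, len(log_probs)):
        new = [NEG_INF] * S
        for s in range(S):
            terms = [alpha[s]]                      # stay on the same symbol
            if s >= 1:
                terms.append(alpha[s - 1])          # advance by one
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                terms.append(alpha[s - 2])          # skip the blank between distinct labels
            new[s] = logsumexp(*terms) + log_probs[t][ext[s]]
        alpha = new

    # Valid paths end on the last label or the trailing blank.
    return logsumexp(alpha[-1], alpha[-2]) if S > 1 else alpha[-1]

def degenerate_base_posterior(log_probs, upstream, downstream, alphabet="ACGT"):
    """P(N = b | scores, upstream, downstream) for each b, assuming a uniform prior over b."""
    index = {base: i + 1 for i, base in enumerate(alphabet)}  # index 0 is the blank (assumption)
    scores = {}
    for base in alphabet:
        seq = [index[c] for c in upstream + base + downstream]
        scores[base] = ctc_log_likelihood(log_probs, seq)
    total = logsumexp(*scores.values())
    if total == NEG_INF:
        raise ValueError("no candidate sequence fits in the given number of frames")
    return {base: math.exp(s - total) for base, s in scores.items()}
```

Calling `degenerate_base_posterior(log_probs, "ATTCG", "ATGAA")` on the frames covering that region would give one probability per candidate base. What I can't work out is the equivalent of this when the scores are over 5mers.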

The decoding algorithm for the v2 model (the "CTC" model) is fairly straightforward. However, from what I can tell, the v3 model (the "CTC-CRF" model) produces scores for 5mers. I've never seen a beam search across 5mer predictions, and I'm not sure how to go about it.
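By "fairly straightforward" I mean something like the best-path decode below: take the top class in each frame, collapse repeats, and drop blanks. This is only a sketch of generic CTC decoding. The `"NACGT"` alphabet with the blank at index 0 is an assumption on my part, and the function name is made up for illustration.

```python
from itertools import groupby

def ctc_greedy_decode(log_probs, alphabet="NACGT", blank=0):
    """Best-path CTC decode: argmax per frame, merge repeats, remove blanks.

    log_probs: list of frames, each a list of per-class scores.
    alphabet: one character per class; position `blank` is a placeholder.
    """
    best = [max(range(len(frame)), key=frame.__getitem__) for frame in log_probs]
    collapsed = [k for k, _ in groupby(best)]  # merge consecutive duplicates
    return "".join(alphabet[k] for k in collapsed if k != blank)
```

With per-base scores each frame votes for a single symbol, so collapsing is unambiguous. With 5mer scores, consecutive frames have to agree on overlapping context, and that is the part I don't know how to search over.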

I have specific questions I could ask, but it would be very nice if the code were available, or at least a description of the algorithm. The module you use, kbeam, is not available on GitHub (as far as I can tell). Would it be possible to send a link to the code, or a detailed description of the algorithm? Or is kbeam proprietary software?

JosephLalli, Feb 09 '21 16:02

Would it be reasonable to look to this paper for guidance?

https://arxiv.org/pdf/1910.11555.pdf

JosephLalli, Feb 09 '21 16:02

Hey @JosephLalli

The intention is to get kbeam on GitHub soon - we are just polishing it up.

iiSeymour, Feb 10 '21 11:02