bonito
Reduced barcode recognition of Bonito-basecalled data with Bonito 0.4.0
Dear @iiSeymour
I am experiencing an issue similar to the one reported earlier in issue 26. After training a new model with Bonito 0.4.0, demultiplexing with qcat assigns only 40-70% of the reads to the correct barcode. I am using a subset of my dataset that was already selected for a single barcode from the initial fast5 files, so I would expect a significantly higher share (>80%) of reads to be assigned to that barcode.
Thank you in advance. Regards, Nick
Dear @iiSeymour
Any update on this yet?
Thanks a lot!
Hey @menickname
See https://github.com/nanoporetech/bonito/issues/175 - can you try with --ctc-min-coverage 0.99? Filtering out lower-quality reads should also help.
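As one way to filter out lower-quality reads, here is a minimal sketch that drops FASTQ records whose mean quality falls below a threshold. It assumes plain 4-line FASTQ records with Phred+33 quality strings, and it averages error probabilities rather than raw Phred values (a common convention for nanopore read quality). The names `mean_qscore` and `filter_fastq` and the default cutoff of Q10 are hypothetical choices, not Bonito options.

```python
import math


def mean_qscore(quality_string, offset=33):
    """Mean Phred score computed in probability space (average error rate)."""
    if not quality_string:
        return 0.0
    errors = [10 ** (-(ord(c) - offset) / 10) for c in quality_string]
    return -10 * math.log10(sum(errors) / len(errors))


def filter_fastq(in_path, out_path, min_q=10.0):
    """Copy records with mean quality >= min_q; return how many were kept.

    Assumes every record spans exactly 4 lines (no wrapped sequences).
    """
    kept = 0
    with open(in_path) as src, open(out_path, "w") as dst:
        while True:
            record = [src.readline() for _ in range(4)]
            if not record[0]:  # end of file
                break
            if mean_qscore(record[3].rstrip("\n")) >= min_q:
                dst.writelines(record)
                kept += 1
    return kept
```

Averaging in probability space means a few very poor bases pull the score down more than a simple mean of Phred values would, which is usually what you want when selecting reads for training.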
Dear @iiSeymour
Unfortunately this does not improve the demultiplexing. Only the highest-quality and longest reads were used for model training. One of my datasets has only 33.84% of its reads classified (versus 70-80% for the original Guppy-basecalled dataset), with or without the --ctc-min-coverage 0.99 option. This does not seem to solve the issue.
I would also be surprised if Bonito basecalling reduced the number of high-quality reads. I validated my model on separate (single-isolate, non-multiplexed) files by generating genomes of my species of interest, and it gave a significant increase in accuracy, so I am rather surprised that this happens during demultiplexing.
Any other thoughts? Thank you in advance.
I have seen a similar issue when using a Bonito-trained model with Guppy. I lose a HUGE number of reads to the dreaded "unclassified" bin.
Have you managed to find any way of recovering these lost reads @menickname?
Hi @mbhall88 and @menickname, I am experiencing exactly the same problem. Do you have any update on this issue? Any progress or experience would be greatly appreciated. Thank you very much!
Hi @CWYuan08, sadly no. I tried a lot of different things, e.g., trimming the raw signal at the start and end of reads before training, but to no avail.
I basically had to abandon the project, as I couldn't justify losing so many reads to demultiplexing.
Thank you @mbhall88 for sharing your update, sorry to hear you had to stop there.
Hi @CWYuan08 and @mbhall88, I have indeed not found a solution for demultiplexing the Bonito basecalls themselves. To still make use of Bonito, I use files already demultiplexed by MinKNOW as input. Since we use a GridION sequencing device, we perform real-time super-accuracy basecalling and demultiplexing with Guppy (within MinKNOW). This generates both fastq and fast5 files per barcode. I then simply use the demultiplexed fast5 files as input for Bonito. It is not the most efficient solution, but it lets me still use Bonito for basecalling with custom models.
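The per-barcode workaround can be scripted. Below is a minimal sketch, assuming MinKNOW's usual layout of one `barcodeNN` subfolder per barcode under `fast5_pass` and a minimal `bonito basecaller <model> <reads_directory>` invocation that writes basecalls to stdout. The helper names, the output file suffix, and the `dry_run` switch are hypothetical; check `bonito basecaller --help` for the options and output format of your Bonito version.

```python
import subprocess
from pathlib import Path


def barcode_dirs(fast5_root):
    """Barcode subfolders (e.g. barcode01) under a MinKNOW fast5_pass folder.

    Folders such as 'unclassified' are skipped because they lack the prefix.
    """
    root = Path(fast5_root)
    return sorted(p for p in root.iterdir() if p.is_dir() and p.name.startswith("barcode"))


def bonito_command(model, reads_dir):
    # Assumed minimal invocation; add version-specific flags as needed.
    return ["bonito", "basecaller", str(model), str(reads_dir)]


def basecall_all(model, fast5_root, out_dir, dry_run=True):
    """Build (and optionally run) one Bonito command per barcode folder."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    commands = []
    for reads_dir in barcode_dirs(fast5_root):
        cmd = bonito_command(model, reads_dir)
        commands.append(cmd)
        if not dry_run:
            # Bonito writes basecalls to stdout; the format (FASTA/FASTQ/...)
            # depends on the version, so a neutral suffix is used here.
            with open(out / f"{reads_dir.name}.basecalls", "w") as handle:
                subprocess.run(cmd, stdout=handle, check=True)
    return commands
```

Running with `dry_run=True` first lists the commands that would be executed, which is a cheap way to confirm the barcode folders were found before starting a long basecalling job.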
Would taking an existing model and improving it (--pretrained) for our species of interest be a better strategy?