
Lowered barcode recognition of bonito basecalled data with Bonito 0.4.0

Open menickname opened this issue 2 years ago • 9 comments

Dear @iiSeymour

I am experiencing an issue similar to the earlier issue 26. After training a new model with Bonito 0.4.0, demultiplexing with qcat assigns only 40-70% of the reads to the correct barcode. The input is a subset of my dataset that was already selected for a single barcode from the initial fast5 files. Hence, I would expect a significantly higher share (>80%) of reads to be assigned to my barcode.
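For anyone wanting to reproduce the percentage quoted above, here is a minimal Python sketch of the calculation. It assumes you already have one barcode label per read; how those labels are extracted from the demultiplexer output is not shown, and the label names (`barcode03`, `none`) and counts are made up for illustration.

```python
from collections import Counter

def barcode_fractions(assignments):
    """Return the fraction of reads assigned to each barcode label."""
    counts = Counter(assignments)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: n / total for label, n in counts.items()}

# Hypothetical labels for ten reads taken from a single-barcode subset.
example = ["barcode03"] * 6 + ["none"] * 3 + ["barcode07"]
fractions = barcode_fractions(example)

# Share of reads that landed on the expected barcode.
print(f"{fractions['barcode03']:.0%}")
```

Comparing this share between the Guppy and Bonito basecalls of the same reads gives a like-for-like view of the drop.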

Thank you in advance. Regards, Nick

menickname avatar Aug 30 '21 13:08 menickname

Dear @iiSeymour

Any update on this yet?

Thanks a lot!

menickname avatar Sep 13 '21 08:09 menickname

Hey @menickname

See https://github.com/nanoporetech/bonito/issues/175 - can you try with --ctc-min-coverage 0.99? Filtering out any lower quality reads should also help.
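The quality filtering suggested here could look roughly like the Python sketch below. It is not part of Bonito or qcat; the function names, the Phred+33 encoding, and the `min_q=10.0` cutoff are all assumptions for illustration. Mean quality is computed by averaging error probabilities rather than raw Q values.

```python
import math

def mean_qscore(quality_string, offset=33):
    """Mean Phred quality, computed by averaging per-base error probabilities."""
    if not quality_string:
        return 0.0
    probs = [10 ** (-(ord(c) - offset) / 10) for c in quality_string]
    mean_p = sum(probs) / len(probs)
    return -10 * math.log10(mean_p)

def filter_fastq(records, min_q=10.0):
    """Yield (header, seq, qual) records whose mean quality is at least min_q."""
    for header, seq, qual in records:
        if mean_qscore(qual) >= min_q:
            yield header, seq, qual

# Hypothetical records: one high quality read (Q40) and one low quality read (Q2).
reads = [
    ("@read1", "ACGT", "IIII"),
    ("@read2", "ACGT", "####"),
]
kept = list(filter_fastq(reads, min_q=10.0))
print([header for header, _, _ in kept])
```

The cutoff is arbitrary here and would need tuning against the dataset.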

iiSeymour avatar Sep 14 '21 10:09 iiSeymour

Dear @iiSeymour

Unfortunately, this does not improve demultiplexing. Only the longest, high-quality reads were used for model training. For one of my datasets, only 33.84% of the reads are classified (70-80% for the original Guppy basecalled dataset), with or without the --ctc-min-coverage 0.99 option. This does not seem to solve the issue.

I would also be surprised if Bonito basecalling reduced the number of higher quality reads. I verified my model on separate (single-isolate, non-multiplexed) files by generating genomes of my species of interest, and accuracy increased significantly, so I am rather surprised that this happens during demultiplexing.

Any other thoughts? Thank you in advance.

menickname avatar Sep 16 '21 08:09 menickname

I have seen a similar issue when using a bonito-trained model with guppy. I lose a HUGE amount of reads to the dreaded "unclassified" bin.

Have you managed to find any way of recovering these lost reads @menickname?

mbhall88 avatar May 03 '22 05:05 mbhall88

Hi @mbhall88 and @menickname, I am experiencing the exact same problem. Do you have any update on this issue? Any progress or experience would be greatly appreciated. Thank you very much!

CWYuan08 avatar Dec 12 '22 11:12 CWYuan08

Hi @CWYuan08, sadly no. I tried a lot of different things, e.g., chopping raw signal off the start and end before training. But to no avail.

I basically had to abandon the project as I couldn't justify losing so many reads to demultiplexing.

mbhall88 avatar Dec 12 '22 22:12 mbhall88

Thank you @mbhall88 for sharing your update. Sorry to hear you had to stop there.

CWYuan08 avatar Dec 13 '22 09:12 CWYuan08

Hi @CWYuan08 and @mbhall88, I have indeed not found a solution for the Bonito demultiplexing itself. To still make use of Bonito, I use demultiplexed files from MinKNOW as input. Since we are using a GridION sequencing device, we perform real-time super-accurate basecalling and demultiplexing with Guppy (within MinKNOW). This generates both fastq and fast5 files per barcode. I then simply use the demultiplexed fast5 files as input for Bonito. Not the most efficient solution, but it is how I can still use Bonito for basecalling with custom models.
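The workaround above can be sketched in Python as one Bonito run per barcode directory. This only builds the command strings and does not execute anything. The directory names, the model path `my_model_dir`, and the output folder are hypothetical, and the command assumes the basic `bonito basecaller <model> <reads directory>` form; the output format and available options depend on the Bonito version.

```python
from pathlib import PurePosixPath

def bonito_commands(barcode_dirs, model, out_dir):
    """Build one `bonito basecaller` command string per demultiplexed fast5 directory."""
    commands = []
    for d in barcode_dirs:
        d = PurePosixPath(d)
        # Name each output file after its barcode directory.
        out_file = PurePosixPath(out_dir) / f"{d.name}.fastq"
        commands.append(f"bonito basecaller {model} {d} > {out_file}")
    return commands

# Hypothetical per-barcode fast5 folders produced by MinKNOW/Guppy demultiplexing.
cmds = bonito_commands(
    ["fast5_pass/barcode01", "fast5_pass/barcode02"],
    model="my_model_dir",
    out_dir="basecalls",
)
for cmd in cmds:
    print(cmd)
```

Because the reads are already separated by barcode, no demultiplexing step is needed on the Bonito output.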

menickname avatar Dec 13 '22 09:12 menickname

Would using an existing model and improving it (--pretrained) for our species of interest be a better strategy?

fergsc avatar Jun 02 '23 01:06 fergsc