
bonito mod-calling with remora model

Open hd2326 opened this issue 2 years ago • 9 comments

Greetings!

I have a trained onnx remora model, and I am wondering whether it would be possible to convert it to the tar+toml format for bonito mod-calling. Thank you very much in advance for your help!

hd2326 avatar Jul 05 '22 16:07 hd2326

Hello @hd2326 you can give bonito your custom trained remora onnx model with bonito basecaller --modified-base-model custom.onnx (it doesn't need converting).

iiSeymour avatar Jul 05 '22 16:07 iiSeymour

@iiSeymour Thank you very much for the quick reply!

So the mod-calling will be a two-step process: 1) using the bonito tar+toml model for basecalling, and then, based on that, 2) using the remora onnx model for mod-calling, right?

hd2326 avatar Jul 05 '22 17:07 hd2326

You need both models (a bonito basecalling model [tar+toml] and a remora modbase [onnx] model) but it's one command:

bonito basecaller [email protected] /data/reads --modified-base-model custom.onnx > calls.bam

iiSeymour avatar Jul 05 '22 17:07 iiSeymour

Got it! Thank you so much for the explanation!

hd2326 avatar Jul 05 '22 17:07 hd2326

Greetings!

I am running bonito as @iiSeymour suggested:

bonito basecaller $bonito_models/[email protected]/ $rawdata --modified-base-model $remora_model/model_best.onnx --modified-bases $mod --reference $genome

I got the following error:

remora.RemoraError: No trained Remora models for /bonito_models/dna_r9.4.1_e8. Options: dna_r9.4.1_e8, dna_r9.4.1_e8.1, dna_r10.4_e8.1

It seems that the remora model I provided cannot be recognized. Any insights on the issue? Thank you very much!

hd2326 avatar Jul 12 '22 16:07 hd2326

The --modified-bases argument triggers bonito to look up the corresponding pre-trained remora model. But it appears that you have also specified the path to a remora model with the --modified-base-model argument, and that model already determines which modified bases are called. Removing the --modified-bases argument from the call should work.
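
The rule described above can be sketched as a small pre-flight check. This is only an illustration: `build_bonito_command` is a hypothetical helper, not part of bonito or remora, and it simply refuses to combine the two flags named in this thread before assembling the argument list. Treating `--modified-bases` as a list of names is an assumption.

```python
from typing import List, Optional


def build_bonito_command(
    basecall_model: str,
    reads_dir: str,
    modified_base_model: Optional[str] = None,
    modified_bases: Optional[List[str]] = None,
    reference: Optional[str] = None,
) -> List[str]:
    """Assemble a `bonito basecaller` argument list (hypothetical helper)."""
    # Per the reply above: --modified-bases makes bonito look up a stock
    # remora model, so it should not be combined with a custom model path.
    if modified_base_model and modified_bases:
        raise ValueError(
            "pass either --modified-base-model (custom model) or "
            "--modified-bases (stock model lookup), not both"
        )
    cmd = ["bonito", "basecaller", basecall_model, reads_dir]
    if modified_base_model:
        cmd += ["--modified-base-model", modified_base_model]
    if modified_bases:
        # Assumption: the flag accepts one or more base names.
        cmd += ["--modified-bases", *modified_bases]
    if reference:
        cmd += ["--reference", reference]
    return cmd
```

The returned list can be passed to `subprocess.run`, with standard output redirected to a file if the calls should be saved.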

marcus1487 avatar Jul 12 '22 16:07 marcus1487

Awesome! As @marcus1487 suggested, removing --modified-bases solves the problem!

But another problem came up...

It seems that the bonito bam files are not compatible with samtools mpileup for modification analysis. I get the error "samtools mpileup: error reading from input file", which I don't get with guppy bam files.

Specifically, the modification I am trying to analyze is uracil (I named it x and 5xT) in DNA, and I trained the bonito model using the following workflow:

  1. I ran taiyaki prepare_mapped_reads.py with --mod x T 5xT to generate the hdf5 file.
  2. I ran remora dataset prepare with --motif T 0 to convert the hdf5 file to the npz file.
  3. I ran remora model train with the provided ConvLSTM_w_ref.py model template.

The MM tag I got in the bonito bam files looks like MM:Z:['T']+x,-1,...,-1;. In the guppy bam files I got something like Mm:Z:C+m,0,...,0;. I am not sure what the negative MM values mean; maybe they cause the incompatibility? Any insights on the issue? Thank you very much!
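
For reference, the SAM optional-fields specification describes each MM entry as a canonical base (one of A, C, G, T, U, N), a strand sign, a modification code, and then a comma-separated list of non-negative skip counts, terminated by a semicolon. Under that reading, both the `['T']` prefix and the `-1` values fall outside the format. The sketch below checks a tag value against that pattern; `mm_tag_problems` is a hypothetical helper, the example values are shortened illustrations rather than the actual tags from the bam files, and it is not confirmed that this is what makes samtools mpileup fail.

```python
import re
from typing import List

# One MM entry as described in the SAM optional-fields specification:
# base, strand, modification code(s) or ChEBI id, optional '.'/'?', deltas.
_HEADER = re.compile(r"[ACGTUN][-+](?:[a-z]+|[0-9]+)[.?]?")
_DELTA = re.compile(r"[0-9]+")


def mm_tag_problems(value: str) -> List[str]:
    """Return human-readable problems found in an MM tag value."""
    problems = []
    if value and not value.endswith(";"):
        problems.append("value does not end with ';'")
    for entry in filter(None, value.split(";")):
        header, sep, deltas = entry.partition(",")
        if not _HEADER.fullmatch(header):
            problems.append(f"bad base/modification header: {header!r}")
        # Deltas count skipped bases, so they must be non-negative integers.
        delta_list = deltas.split(",") if sep else []
        bad = [d for d in delta_list if not _DELTA.fullmatch(d)]
        if bad:
            problems.append(f"invalid skip counts: {bad}")
    return problems


if __name__ == "__main__":
    for tag in ("C+m,0,0,3;", "['T']+x,-1,-1;"):
        print(tag, "->", mm_tag_problems(tag) or "looks well-formed")
```

A check like this could be run over the MM values read with a bam library before handing the file to downstream tools.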

hd2326 avatar Jul 13 '22 16:07 hd2326

Hi @hd2326,

I was wondering if you ended up getting bonito to basecall Us?

best, S

najohink avatar Aug 04 '22 18:08 najohink

@najohink Actually no. Still the same error...

hd2326 avatar Aug 04 '22 22:08 hd2326