Correct model to use for 5hmC calling

Open rlowe-altoslabs opened this issue 1 year ago • 1 comments

Not sure if this should be a bonito issue or a remora issue.

I run the following:

bonito basecaller [email protected] $input_path --modified-bases 5mC 5hmC --reference $reference > basecalls_with_mods.sam

I then get the following error:

--- Logging error ---
  Traceback (most recent call last):
    File "/usr/local/lib/python3.8/dist-packages/remora/model_util.py", line 549, in load_model
      submodels = submodels[modified_bases]
  KeyError: '5hmc_5mc'
  
  During handling of the above exception, another exception occurred:
  
  Traceback (most recent call last):
    File "/usr/lib/python3.8/logging/__init__.py", line 1085, in emit
      msg = self.format(record)
    File "/usr/lib/python3.8/logging/__init__.py", line 929, in format
      return fmt.format(record)
    File "/usr/local/lib/python3.8/dist-packages/bonito/mod_util.py", line 25, in format
      self._style._fmt = self.fmt
  AttributeError: 'CustomFormatter' object has no attribute 'fmt'
  Call stack:
    File "/usr/local/bin/bonito", line 8, in <module>
      sys.exit(main())
    File "/usr/local/lib/python3.8/dist-packages/bonito/__init__.py", line 34, in main
      args.func(args)
    File "/usr/local/lib/python3.8/dist-packages/bonito/cli/basecaller.py", line 75, in main
      mods_model = load_mods_model(
    File "/usr/local/lib/python3.8/dist-packages/bonito/mod_util.py", line 47, in load_mods_model
      return load_model(
    File "/usr/local/lib/python3.8/dist-packages/remora/model_util.py", line 551, in load_model
      LOGGER.error(
  Message: 'Remora model for modified bases 5hmc_5mc not found for [email protected].'
  Arguments: ()

I checked the pre-trained models, which suggest that, at least for "v0.0.0", there should be a 5hmC model:

[15:29:15] Remora pretrained modified base models:
Pore              Basecall_Model_Type    Basecall_Model_Version    Modified_Bases    Remora_Model_Type      Remora_Model_Version
----------------  ---------------------  ------------------------  ----------------  -------------------  ----------------------
dna_r10.4_e8.1    fast                   0.0.0                     5mc               CG                                        0
dna_r10.4_e8.1    fast                   0.0.0                     5mc               CG                                        1
dna_r10.4_e8.1    fast                   0.0.0                     5hmc_5mc          CG                                        0
dna_r10.4_e8.1    fast                   v3.3                      5mc               CG                                        0
dna_r10.4_e8.1    fast                   v3.3                      5mc               CG                                        1
dna_r10.4_e8.1    hac                    0.0.0                     5mc               CG                                        0
dna_r10.4_e8.1    hac                    0.0.0                     5mc               CG                                        1
dna_r10.4_e8.1    hac                    0.0.0                     5hmc_5mc          CG                                        0
dna_r10.4_e8.1    hac                    v3.3                      5mc               CG                                        0
dna_r10.4_e8.1    hac                    v3.3                      5mc               CG                                        1
dna_r10.4_e8.1    sup                    0.0.0                     5mc               CG                                        0
dna_r10.4_e8.1    sup                    0.0.0                     5mc               CG                                        1
dna_r10.4_e8.1    sup                    0.0.0                     5hmc_5mc          CG                                        0
dna_r10.4_e8.1    sup                    v3.4                      5mc               CG                                        0
dna_r10.4_e8.1    sup                    v3.4                      5mc               CG                                        1
dna_r10.4.1_e8.2  fast                   0.0.0                     5mc               CG                                        0
dna_r10.4.1_e8.2  fast                   0.0.0                     5mc               CG                                        1
dna_r10.4.1_e8.2  fast                   v3.5.1                    5mc               CG                                        0
dna_r10.4.1_e8.2  fast                   v3.5.1                    5mc               CG                                        1
dna_r10.4.1_e8.2  hac                    0.0.0                     5mc               CG                                        0
dna_r10.4.1_e8.2  hac                    0.0.0                     5mc               CG                                        1
dna_r10.4.1_e8.2  hac                    v3.5.1                    5mc               CG                                        0
dna_r10.4.1_e8.2  hac                    v3.5.1                    5mc               CG                                        1
dna_r10.4.1_e8.2  sup                    0.0.0                     5mc               CG                                        0
dna_r10.4.1_e8.2  sup                    0.0.0                     5mc               CG                                        1
dna_r10.4.1_e8.2  sup                    v3.5.1                    5mc               CG                                        0
dna_r10.4.1_e8.2  sup                    v3.5.1                    5mc               CG                                        1

I changed my call to:

bonito basecaller [email protected] $input_path --modified-bases 5mC 5hmC --reference $reference > basecalls_with_mods.sam

But this then complains about a non-existent model. I am sure I am being stupid?

  > available models:
   - [email protected]
   - [email protected]
   - [email protected]
   - [email protected]
   - [email protected]
   - [email protected]
   - [email protected]
   - [email protected]
   - [email protected]
   - [email protected]
   - [email protected]
   - [email protected]

Aug 01 '22 16:08 rlowe-altoslabs

The 0.0.0 model is meant to be the "default model" that is selected from the Remora repo when the model version is not specified. Here, however, Bonito version 3.4 is specified explicitly, and that version does not include a trained model for the requested modified bases, so Remora fails. This behavior should be adjusted to raise a warning and load the default model instead. I'll address this issue in the next Remora release.

For now, you can download the Remora repo and point Bonito to the 0.0.0 model (in the Remora repo at models/trained_models/dna_r10.4_e8.1/sup/0.0.0/5hmc_5mc/CG/v0/modbase_model.onnx) via the --modified-base-model Bonito argument.

Aug 03 '22 11:08 marcus1487
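
The fallback proposed in the reply above (warn and load the default model when the requested version has no matching model) could look roughly like the sketch below. This is only an illustration, not Remora's actual code: the `MODELS` registry, the `select_model` function, the placeholder paths, and the rule for building the `5hmc_5mc` key are all assumptions made for the example. Only the 0.0.0 `5hmc_5mc` path is taken from the thread.

```python
import warnings

DEFAULT_VERSION = "0.0.0"  # the "default model" version named in the reply above

# Hypothetical in-memory registry, not Remora's real data structure:
# (pore, basecall model type) -> basecall model version -> modified bases -> path.
# Only the 0.0.0 5hmc_5mc path comes from the thread; the others are placeholders.
MODELS = {
    ("dna_r10.4_e8.1", "sup"): {
        "0.0.0": {
            "5mc": "placeholder/sup/0.0.0/5mc.onnx",
            "5hmc_5mc": (
                "models/trained_models/dna_r10.4_e8.1/sup/0.0.0/"
                "5hmc_5mc/CG/v0/modbase_model.onnx"
            ),
        },
        "v3.4": {"5mc": "placeholder/sup/v3.4/5mc.onnx"},
    },
}


def select_model(pore, model_type, version, mod_bases, registry=MODELS):
    """Return a model path, falling back to the default version with a warning."""
    versions = registry[(pore, model_type)]
    # Assumption: modified bases are keyed lowercased, sorted, and joined by "_",
    # which yields "5hmc_5mc" for ["5mC", "5hmC"].
    key = "_".join(sorted(base.lower() for base in mod_bases))
    if key in versions.get(version, {}):
        return versions[version][key]
    if key in versions.get(DEFAULT_VERSION, {}):
        warnings.warn(
            f"No {key} model for version {version}; "
            f"using default version {DEFAULT_VERSION}."
        )
        return versions[DEFAULT_VERSION][key]
    raise KeyError(f"No {key} model for {pore} {model_type} in any version")


# The request from the issue: version v3.4 has no 5hmc_5mc model, so this
# warns and returns the 0.0.0 path instead of raising.
print(select_model("dna_r10.4_e8.1", "sup", "v3.4", ["5mC", "5hmC"]))
```

With this lookup order, an exact version match always wins, and the default version is used only when the requested one lacks the modified-base combination. A combination that is missing everywhere still raises an error.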