
Add RNA Models to Pretrained Folder

Open VBHerrenC opened this issue 1 year ago • 4 comments

Hello,

We are trying to use Remora starting from a pretrained model, but we are working with RNA. Would it be possible to add the RNA models that are currently available in Bonito to the pretrained models folder in Remora? I'm not sure how to link the Bonito models to Remora as the documentation mentions, since each Bonito model is a folder and doesn't seem to include the PyTorch file that is present for all of the other pretrained Remora models. I'm also happy to find another workaround if it's not possible to add new pretrained models to Remora; we are just trying to avoid training a model from scratch. Thanks!

VBHerrenC avatar Jan 25 '24 18:01 VBHerrenC

I'm not sure I understand this request. Are you looking for the basecalling models to be made available through remora? Or are you looking for specific modified base models? If it is the latter, please use the remora model list_pretrained command to list the models and remora model download to download a model.
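
For anyone scripting this check, here is a minimal Python sketch of the listing step. It only wraps the `remora model list_pretrained` command named above; the optional `--pore` filter and the pore name `rna004_130bps` are taken from a later reply in this thread. The helper names `list_pretrained_cmd` and `run_if_installed` are hypothetical and not part of Remora.

```python
import shutil
import subprocess
from typing import List, Optional


def list_pretrained_cmd(pore: Optional[str] = None) -> List[str]:
    """Build the argument list for `remora model list_pretrained`.

    `pore` is passed through as `--pore <value>` when given
    (for example "rna004_130bps", as quoted later in this thread).
    """
    cmd = ["remora", "model", "list_pretrained"]
    if pore is not None:
        cmd += ["--pore", pore]
    return cmd


def run_if_installed(cmd: List[str]) -> Optional[str]:
    """Run `cmd` and return its stdout, or None if the executable is not on PATH."""
    if shutil.which(cmd[0]) is None:
        return None
    result = subprocess.run(cmd, capture_output=True, text=True, check=False)
    return result.stdout


if __name__ == "__main__":
    output = run_if_installed(list_pretrained_cmd("rna004_130bps"))
    print(output if output is not None else "remora not found on PATH")
```

The command is built separately from the step that runs it, so the arguments can be inspected or logged even on a machine where Remora is not installed.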

marcus1487 avatar Jan 26 '24 00:01 marcus1487

Hi Marcus,

Thanks for the reply! I was hoping for the basecalling models to be made available through Remora. The output of the remora model list_pretrained command doesn't include any RNA models. I tried to add the RNA004 Bonito model to the Remora paths but haven't had any luck yet. I'm happy to do it that way as well, but I'm not sure how to go about it, since the Remora pretrained models all seem to have a PyTorch file that I don't think the Bonito folder includes. If we are trying to train an RNA model for N1-methyl-pseudouridine, is our best option to train from scratch if neither of the above options works? Thanks for any advice.

VBHerrenC avatar Jan 26 '24 13:01 VBHerrenC

Remora has a compiled format (.PT) for training. Given that we are trying to train a modified basecaller for pseudouridine, we would like to use the Remora training framework, as it is well documented. However, the RNA004 models have only been uploaded in a format readable by the Bonito framework. While I understand that Bonito has a training framework, the documentation on how to use it is significantly less clear than Remora's. To that end, we were wondering if the RNA004 model could be uploaded in a format compatible with the Remora training framework.

VBHarrisN avatar Jan 26 '24 14:01 VBHarrisN

I'm a bit confused by the goal here. Remora is the framework for training modified base detection models. Bonito is the framework for training canonical basecallers. Remora does not have the capability to train a canonical basecaller. If you intend to train a canonical basecalling model that calls modified bases as the correct canonical base, then Bonito is the correct tool. If you then want to identify the positions of the modified bases within the sequence, a Remora model is required.

Note that Remora does come with a modified base model for m6A in DRACH (remora model list_pretrained --pore rna004_130bps), but this is not going to be helpful if your goal is to train a basecalling model which will work with N1-pseudo-uridine.

I hope this helps clear up the function of the training frameworks. If you have any further questions about training a modified base detection model please post them here.

marcus1487 avatar Mar 07 '24 14:03 marcus1487

Closing as this issue seems to be unrelated to modified base calling.

marcus1487 avatar Jun 03 '24 22:06 marcus1487