
Data preparation scripts for Remora models with random bases

Open AnWiercze opened this issue 1 year ago • 2 comments

Hello Remora Team,

In this year's ONT update, Clive mentioned that the newer models, which perform better than BS-seq, are trained on sequences containing a modified position flanked by ±30 random bases, if I understood correctly. Are the scripts for preparing this kind of training data publicly available? As far as I can tell, the data preparation scripts uploaded here only support fully modified and unmodified reads, correct?

Thanks for your help!

Cheers, Anna

AnWiercze, Oct 14 '22 10:10
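
To make the sequence design in the question concrete, here is a minimal Python sketch of a fixed central position flanked by 30 random bases on each side. It is only an illustration and not ONT's or Remora's actual workflow, which this thread says is not public. The function name `random_context_sequence`, the choice of "C" as the central base, and the uniform sampling of flanking bases are all assumptions made for this example.

```python
import random

BASES = "ACGT"


def random_context_sequence(flank=30, center="C", rng=None):
    """Return (sequence, center_index) for one randomized-context sequence.

    Hypothetical helper: `center` is the canonical base at the position
    assumed to carry the modification, and `flank` is the number of
    uniformly random bases placed on each side of it.
    """
    rng = rng or random.Random()
    left = "".join(rng.choice(BASES) for _ in range(flank))
    right = "".join(rng.choice(BASES) for _ in range(flank))
    # The central base sits at index `flank` in the combined string.
    return left + center + right, flank


# Seeded generator so the example is reproducible.
seq, pos = random_context_sequence(flank=30, center="C", rng=random.Random(0))
print(len(seq), pos, seq[pos])
```

Each generated sequence is 2 × 30 + 1 = 61 bases long, with the position of interest at index 30. Turning such sequences into labeled training data would still require signal and basecall input, which this sketch does not cover.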

These scripts are not currently publicly available. We are working to improve the robustness of this workflow and release this code at some point in the future.

We will be updating the data preparation scripts very soon to take POD5 and BAM input and create a Remora dataset directly. This will add a lot more flexibility to dataset generation beyond the "fully modified at a motif" type of dataset.

marcus1487, Oct 14 '22 16:10

Thanks a lot for sharing this information! I am looking forward to the next release. :)

AnWiercze, Oct 14 '22 16:10