Data preparation scripts for Remora models with random bases

AnWiercze opened this issue (status: open).

Hello Remora Team,

In this year's ONT update, Clive mentioned that the newer models, which perform better than BS-seq, are trained on sequences containing a modified position flanked by ±30 random bases, if I understand correctly. Are the scripts for preparing this kind of training data publicly available? Right now, the data preparation scripts uploaded here only support fully modified and fully unmodified reads, correct?

Thanks for your help!

Cheers, Anna

Oct 14 '22 10:10 AnWiercze

These scripts are not currently publicly available. We are working to improve the robustness of this workflow and release this code at some point in the future.

We will be updating the data preparation scripts very soon to accept pod5 and BAM input and create a Remora dataset directly. This will add a lot more flexibility to dataset generation beyond the "fully modified at a motif" type of dataset.

Oct 14 '22 16:10 marcus1487

Thanks a lot for sharing this information! I am looking forward to the next release. :)

Oct 14 '22 16:10 AnWiercze

Betta is now available via a developer release. Please see the instructions for accessing the repository in the community note here (login required): https://community.nanoporetech.com/posts/betta-tool-release

May 18 '23 23:05 marcus1487
