remora Question about data preparation for training model

Open pterzian opened this issue 3 years ago • 1 comments

Hi,

I understand that to train a model with remora, you first have to basecall fully unmethylated (PCR) or fully methylated (SssI) reads, then merge both results to build a training dataset using taiyaki/misc/merge_mappedsignalfiles.py. However, in my case I need to use only specific genomic positions that I know to be always methylated or unmethylated from a BS-seq reference. Is this something I can do with remora before merging the basecalls? Or with taiyaki?

Thanks,

Paul

Oct 03 '22 11:10 pterzian

This is functionality that is quite difficult with the current megalodon/taiyaki framework. We are working on a major re-write of the data prep and will have this out in the next release. I will post here once this update has been applied and it should make such a use case much more feasible.

Oct 06 '22 21:10 marcus1487

This feature is now implemented in remora 2.0. To prepare a dataset with chunks from defined reference positions, you can run:

    remora dataset prepare reads.pod5 mappings.bam --focus-reference-positions mod_pos.bed --mod-base m 5mC

Dec 08 '22 16:12 marcus1487
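
For readers starting from a BS-seq call set, here is a minimal sketch of how a positions file such as mod_pos.bed might be derived. It is not part of remora. The input layout (tab-separated chrom, start, end, strand, percent methylated, coverage), the function name `split_bsseq_sites`, and the thresholds are all assumptions made for illustration, and the six-column BED output is a guess at a reasonable layout rather than a confirmed requirement of `--focus-reference-positions`.

```python
from typing import Iterable, List, Tuple


def split_bsseq_sites(
    lines: Iterable[str],
    min_cov: int = 10,   # hypothetical minimum BS-seq coverage
    high: float = 95.0,  # hypothetical "always methylated" cutoff (percent)
    low: float = 5.0,    # hypothetical "always unmethylated" cutoff (percent)
) -> Tuple[List[str], List[str]]:
    """Split BS-seq sites into methylated and unmethylated BED6 lines.

    Assumed input columns (tab-separated):
    chrom, start, end, strand, percent_methylated, coverage
    """
    methylated: List[str] = []
    unmethylated: List[str] = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        chrom, start, end, strand, pct, cov = line.split("\t")[:6]
        if int(cov) < min_cov:
            continue  # too few reads to trust the methylation estimate
        # BED6 layout: chrom, start, end, name, score, strand
        bed_line = "\t".join([chrom, start, end, ".", "0", strand])
        if float(pct) >= high:
            methylated.append(bed_line)
        elif float(pct) <= low:
            unmethylated.append(bed_line)
        # intermediate sites are dropped: their state is ambiguous
    return methylated, unmethylated


# Small made-up input to show the behaviour.
example = [
    "chr1\t100\t101\t+\t98.0\t30",   # kept as methylated
    "chr1\t200\t201\t-\t2.0\t25",    # kept as unmethylated
    "chr1\t300\t301\t+\t50.0\t40",   # dropped: intermediate methylation
    "chr1\t400\t401\t+\t100.0\t3",   # dropped: coverage below min_cov
]
mod_sites, can_sites = split_bsseq_sites(example)
```

Writing `mod_sites` to a file, one line each, would give a BED file that could play the role of mod_pos.bed in the command above. The `can_sites` list would serve the same purpose for the unmethylated control reads.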
