Question about data preparation for training model

Open: pterzian opened this issue 1 year ago • 1 comment

Hi,

I understand that to train a model using remora, you first have to basecall fully unmethylated (PCR) or fully methylated (SssI) reads, then merge both results to build a training dataset using taiyaki/misc/merge_mappedsignalfiles.py. However, in my case I need to use only specific genomic positions that I know to be always methylated or always unmethylated from a BS-seq reference. Is this something I can do with remora before merging the basecalls? Or with taiyaki?

Thanks,

Paul

pterzian, Oct 03 '22 11:10

This functionality is quite difficult to achieve with the current megalodon/taiyaki framework. We are working on a major rewrite of the data prep, which will be out in the next release. I will post here once this update has been applied; it should make such a use case much more feasible.

marcus1487, Oct 06 '22 21:10
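
For illustration only, the position filtering described in the question can be sketched in plain Python, independent of remora, megalodon, or taiyaki. This is not an API of any of those tools. The function names, the `(chrom, pos, fraction)` record layout for the BS-seq data, and the `(read_id, chrom, pos)` layout for per-read sites are all assumptions made for this sketch; real data would first have to be converted into these shapes.

```python
from typing import Dict, Iterable, List, Tuple

Site = Tuple[str, int]  # (chromosome, position); hypothetical layout


def build_truth_set(
    bs_records: Iterable[Tuple[str, int, float]],
    low: float = 0.0,
    high: float = 1.0,
) -> Dict[Site, int]:
    """Map sites whose BS-seq methylated fraction is <= low to 0 and >= high to 1.

    Sites with an intermediate fraction are dropped, since they do not
    provide an unambiguous ground-truth label.
    """
    truth: Dict[Site, int] = {}
    for chrom, pos, frac in bs_records:
        if frac <= low:
            truth[(chrom, pos)] = 0
        elif frac >= high:
            truth[(chrom, pos)] = 1
    return truth


def select_training_sites(
    read_sites: Iterable[Tuple[str, str, int]],
    truth: Dict[Site, int],
) -> List[Tuple[str, str, int, int]]:
    """Keep only per-read sites at ground-truth positions and attach the label."""
    return [
        (read_id, chrom, pos, truth[(chrom, pos)])
        for read_id, chrom, pos in read_sites
        if (chrom, pos) in truth
    ]
```

With this shape, the thresholds `low` and `high` could be loosened (for example 0.05 and 0.95) if strictly 0% or 100% methylated sites are too rare in the BS-seq reference; that choice is a trade-off between label purity and training set size.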