calculate the 5mC signal for the CG sites in a specific region

Open hannan666666 opened this issue 9 months ago • 1 comments

The pileup.bed.gz (bedMethyl) file contains all types of modifications (CG, CH) and is about 39G, but I only want to calculate the 5mC signal for the CG sites in a specific region. As far as I can tell, I would have to either re-run the pileup with only the CG motif, or use grep to extract the rows containing the CG motif and create a new .pileup.bed.gz and .pileup.bed.gz.tbi, which is too slow. Is there a parameter that can achieve this directly when using the modkit stats command?

modkit stats ${modkit_d}/${pod5_f}.6mA_5mC5hmC.pileup.bed.gz \
  --regions $genome_regions --threads 20 \
  -o ${modkit_d}/${pod5_f}.6mA_5mC5hmC.genome_regions.stats.tsv > ${modkit_d}/${pod5_f}.6mA_5mC5hmC.genome_regions.stats.tsv.log

hannan666666, Mar 31 '25 09:03

Hello @hannan666666,

There isn't an option to subset a bedMethyl by motif in modkit stats, but that's a good suggestion! Right now, the best way I can recommend is to use modkit motif bed $fasta CG 0 > CG0.bed to get the locations of the CpGs, then bedtools intersect to subset the bedMethyl.

ArtRand, Mar 31 '25 20:03
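
The workaround suggested in the reply can be sketched roughly as follows. This is an illustration under stated assumptions, not something taken from the modkit documentation: the helper names filter_by_motif and subset_cpg_in_regions and all file names are hypothetical, both BED files are assumed to be tab-separated with the strand in column 6, and a plain awk filter stands in for bedtools intersect so that the matching step has no extra dependency. Extracting the regions with tabix first (using the existing .tbi index) should avoid scanning the whole 39G file.

```shell
#!/usr/bin/env bash
# Sketch only: helper names and file names are hypothetical.

# Keep bedMethyl rows (read from stdin) whose chrom, start and strand match a
# row of the motif BED. Assumes both inputs are tab-separated with the strand
# in column 6. This awk filter is a stand-in for bedtools intersect.
filter_by_motif() {
    local motif_bed=$1
    awk -F'\t' '
        NR == FNR { keep[$1 FS $2 FS $6]; next }   # first file: remember CpG positions
        ($1 FS $2 FS $6) in keep                   # stdin: print rows at those positions
    ' "$motif_bed" -
}

# Pull the regions of interest out of the indexed bedMethyl first, then filter
# that slice to CpG positions and re-index it. Assumes the regions BED is
# sorted and non-overlapping, so the output stays sorted for tabix.
subset_cpg_in_regions() {
    local pileup=$1 regions=$2 motif_bed=$3 out=$4
    tabix -R "$regions" "$pileup" | filter_by_motif "$motif_bed" | bgzip > "$out"
    tabix -p bed "$out"
}
```

A possible usage, assuming tabix and bgzip are installed: run modkit motif bed $fasta CG 0 > CG0.bed once, then call subset_cpg_in_regions with the pileup.bed.gz, $genome_regions, CG0.bed and an output name, and point modkit stats at the resulting file. Because awk holds every CpG position in memory, it may be worth restricting CG0.bed to the same regions first on a large genome; bedtools intersect, as recommended in the reply, avoids that limitation.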