modkit icon indicating copy to clipboard operation
modkit copied to clipboard

Haplotagged CRAM and bedmethyl

Open umranyaman opened this issue 11 months ago • 4 comments

Hello,

I do have a haplotagged CRAM file, and phased outputs from the epi2melabs/wf-human-variation workflow, we have used --mod with --phase parameter. I have bedmethyl files for haplotypes 1, 2 and ungrouped ones. Would it make sense to sum the bedmethyl files to generate bedmethyl outputs with bedtools unionbedg instead of running modkit pileup --combine-strands with cram file again? When I run modkit pileup with cram file it says "non-BAM.. CRAM may be unstable"

bedtools unionbedg -i hap1.bed hap2.bed ungrouped.bed > combined.bed

Thanks very much!

umranyaman avatar Jan 27 '25 23:01 umranyaman

Hello @umranyaman,

It sounds like you want a bedMethyl with all of the counts aggregated, as through you had run pileup without partitioning on the HP tag. I would use modkit bedmethyl merge to combine the bedmethyl files you have already. You could use the untagged ones or omit them. Regarding CRAM, it is supported - it's just a bit slower and not tested quite as heavily so Modkit warns you. I could probably get rid of this warning to be honest.

ArtRand avatar Jan 28 '25 02:01 ArtRand

Thank you! It indeed makes sense to aggregate the hap1 and 2. I am still not sure about discarding ungrouped calls. Is there any reason modkit pileup including ungrouped ones, e.g. why it could be useful.

Sorry for the confusion.

Thank you!

umranyaman avatar Jan 28 '25 15:01 umranyaman

Hello @umranyaman,

Regarding using the ungrouped reads, it depends on what you're trying to do. The ungrouped reads are ones that don't have the haplotag, so I imagine this means the phasing algorithm couldn't assign them confidently or they don't overlap any HET variants. If you give me a few more details on what you're going - maybe I can be of more help.

ArtRand avatar Jan 29 '25 15:01 ArtRand

Thanks @ArtRand!

I am considering downstream analyses. I will perform differential methylation analysis across multiple samples and identify genes associated with the trait using aggregated methylation. From there, I would investigate whether the gene has allele-specific methylation as well, here I am not sure how to perform this across samples yet, but the end goal is to have aggregated vs haplotype-specific methylation levels. If I compare hap1 vs hap2 across samples, it discards ungrouped already, so I am wondering whether I should discard them on the aggregated version as well.

Thanks very much

umranyaman avatar Jan 29 '25 16:01 umranyaman

Thanks @ArtRand!

I am considering downstream analyses. I will perform differential methylation analysis across multiple samples and identify genes associated with the trait using aggregated methylation. From there, I would investigate whether the gene has allele-specific methylation as well, here I am not sure how to perform this across samples yet, but the end goal is to have aggregated vs haplotype-specific methylation levels. If I compare hap1 vs hap2 across samples, it discards ungrouped already, so I am wondering whether I should discard them on the aggregated version as well.

Thanks very much

Have you figured out how to perform allele-specific methylation across samples? If yes could you share some insights in that? I'm trying something similar.

vaishnavpvarma avatar Nov 26 '25 06:11 vaishnavpvarma

Hello @umranyaman and @vaishnavpvarma

As of Modkit v0.6.0 there is much more robust CRAM support and a --phased option for exactly the use case you have here. Now instead of creating an "ungrouped" bedMethyl there is a "combined" bedmethyl which contains methylation counts for the haplotyped reads as well as the untagged ones. I think this is perfect for that you're doing here. Please see the migration guide for any more details.

For allele-specific differential methylation across samples, here's an idea:

  • Could you look for DMRs/DMCs using the "combined" file (all reads) with two samples => sample-level differences
  • Find DMRs/DMCs between haplotypes within a sample => haplotype-specific DMRs/DMCs
  • Intersect these to find haplotype-specific DMRs that are also DMRs between samples

ArtRand avatar Dec 01 '25 19:12 ArtRand