Haplotagged CRAM and bedmethyl
Hello,
I do have a haplotagged CRAM file, and phased outputs from the epi2melabs/wf-human-variation workflow, we have used --mod with --phase parameter. I have bedmethyl files for haplotypes 1, 2 and ungrouped ones. Would it make sense to sum the bedmethyl files to generate bedmethyl outputs with bedtools unionbedg instead of running modkit pileup --combine-strands with cram file again? When I run modkit pileup with cram file it says "non-BAM.. CRAM may be unstable"
bedtools unionbedg -i hap1.bed hap2.bed ungrouped.bed > combined.bed
Thanks very much!
Hello @umranyaman,
It sounds like you want a bedMethyl with all of the counts aggregated, as through you had run pileup without partitioning on the HP tag. I would use modkit bedmethyl merge to combine the bedmethyl files you have already. You could use the untagged ones or omit them. Regarding CRAM, it is supported - it's just a bit slower and not tested quite as heavily so Modkit warns you. I could probably get rid of this warning to be honest.
Thank you! It indeed makes sense to aggregate the hap1 and 2. I am still not sure about discarding ungrouped calls. Is there any reason modkit pileup including ungrouped ones, e.g. why it could be useful.
Sorry for the confusion.
Thank you!
Hello @umranyaman,
Regarding using the ungrouped reads, it depends on what you're trying to do. The ungrouped reads are ones that don't have the haplotag, so I imagine this means the phasing algorithm couldn't assign them confidently or they don't overlap any HET variants. If you give me a few more details on what you're going - maybe I can be of more help.
Thanks @ArtRand!
I am considering downstream analyses. I will perform differential methylation analysis across multiple samples and identify genes associated with the trait using aggregated methylation. From there, I would investigate whether the gene has allele-specific methylation as well, here I am not sure how to perform this across samples yet, but the end goal is to have aggregated vs haplotype-specific methylation levels. If I compare hap1 vs hap2 across samples, it discards ungrouped already, so I am wondering whether I should discard them on the aggregated version as well.
Thanks very much
Thanks @ArtRand!
I am considering downstream analyses. I will perform differential methylation analysis across multiple samples and identify genes associated with the trait using aggregated methylation. From there, I would investigate whether the gene has allele-specific methylation as well, here I am not sure how to perform this across samples yet, but the end goal is to have aggregated vs haplotype-specific methylation levels. If I compare hap1 vs hap2 across samples, it discards ungrouped already, so I am wondering whether I should discard them on the aggregated version as well.
Thanks very much
Have you figured out how to perform allele-specific methylation across samples? If yes could you share some insights in that? I'm trying something similar.
Hello @umranyaman and @vaishnavpvarma
As of Modkit v0.6.0 there is much more robust CRAM support and a --phased option for exactly the use case you have here. Now instead of creating an "ungrouped" bedMethyl there is a "combined" bedmethyl which contains methylation counts for the haplotyped reads as well as the untagged ones. I think this is perfect for that you're doing here. Please see the migration guide for any more details.
For allele-specific differential methylation across samples, here's an idea:
- Could you look for DMRs/DMCs using the "combined" file (all reads) with two samples => sample-level differences
- Find DMRs/DMCs between haplotypes within a sample => haplotype-specific DMRs/DMCs
- Intersect these to find haplotype-specific DMRs that are also DMRs between samples