Question about downstream analysis of modkit results & indexing bedMethyl files for IGV
Hello,
I have two questions.
Question 1.
I could collect methylation information using modkit pileup.
The result shows all the possible methylation sites, and most of them were not methylated as shown below.
ptg000001l 86595 86596 h 16 - 86595 86596 255,0,0 16 0.00 0 16 0 0 1 0 0
ptg000001l 86595 86596 m 16 - 86595 86596 255,0,0 16 0.00 0 16 0 0 1 0 0
ptg000001l 86596 86597 h 15 + 86596 86597 255,0,0 15 0.00 0 13 2 0 0 0 0
ptg000001l 86596 86597 m 15 + 86596 86597 255,0,0 15 13.33 2 13 0 0 0 0 0
ptg000001l 86597 86598 h 16 - 86597 86598 255,0,0 16 0.00 0 16 0 0 0 0 1
ptg000001l 86597 86598 m 16 - 86597 86598 255,0,0 16 0.00 0 16 0 0 0 0 1
Could you please recommend some tools for working with this result? For example, for collecting highly methylated sites. I think I could filter for highly methylated sites myself (e.g. 70% or more), but I assume people have already made good (more or less standard) tools for this kind of downstream analysis. It would be much appreciated if you could point me to some.
(If I had two samples, I could also use modkit dmr, but I would like to know what the downstream approach is when I have only one sample.)
Question 2.
To view methylated sites in IGV, I need an index file for the bedMethyl file.
If I compress the bedMethyl file first and then index it, at least the tabix step works. But I cannot load the compressed bedMethyl file in IGV.
bgzip modkit_pileup_result.bed
tabix modkit_pileup_result.bed.gz
However, if I try to make index file from uncompressed bedMethyl file, it does not work.
tabix -p bed modkit_pileup_result.bed
Inside IGV, igvtools successfully made an index file (the idx file) and I could check the methylated sites, which means my input bedMethyl file is OK.
I also tried gatk for indexing the bedMethyl file, but it did not work.
gatk IndexFeatureFile --input modkit_pileup_result.bed
Could you please tell me the best way to index an uncompressed bedMethyl file for IGV?
Thank you. HJK
Hello @hyunjokoo,
Sorry for the slow reply.
I'm going to answer your second question (visualization in IGV) first. I believe the latest version of the IGV desktop application will display the bedMethyl output directly, so you shouldn't need to perform any compression or indexing. The output files should be sorted. Let me know if this isn't the case. If the file is too large, you can subset it with awk or bedtools, or load it into Python or R and keep only the rows you're interested in (see next section).
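If it helps, here is a minimal sketch of subsetting a bedMethyl file to one region in Python before loading it into IGV. It streams the file line by line, so it should also cope with files that are too large to load at once. The function name `subset_bedmethyl` and the file names in the usage note are hypothetical, and the sample rows are the ones from your question.

```python
def subset_bedmethyl(lines, chrom, start, end):
    """Yield bedMethyl lines that fall entirely inside chrom:start-end.

    Only the first three BED columns (chrom, start, end) are inspected, and
    matching lines are passed through unchanged, so the input sort order is kept.
    """
    for line in lines:
        fields = line.split()
        # Skip blank lines and anything that is not a data row (e.g. a header).
        if len(fields) < 3 or not fields[1].isdigit() or not fields[2].isdigit():
            continue
        if fields[0] == chrom and int(fields[1]) >= start and int(fields[2]) <= end:
            yield line


# Sample rows copied from the question.
SAMPLE = """\
ptg000001l 86595 86596 h 16 - 86595 86596 255,0,0 16 0.00 0 16 0 0 1 0 0
ptg000001l 86595 86596 m 16 - 86595 86596 255,0,0 16 0.00 0 16 0 0 1 0 0
ptg000001l 86596 86597 h 15 + 86596 86597 255,0,0 15 0.00 0 13 2 0 0 0 0
ptg000001l 86596 86597 m 15 + 86596 86597 255,0,0 15 13.33 2 13 0 0 0 0 0
ptg000001l 86597 86598 h 16 - 86597 86598 255,0,0 16 0.00 0 16 0 0 0 0 1
ptg000001l 86597 86598 m 16 - 86597 86598 255,0,0 16 0.00 0 16 0 0 0 0 1
"""

for row in subset_bedmethyl(SAMPLE.splitlines(keepends=True), "ptg000001l", 86596, 86598):
    print(row, end="")
```

For a real file you could pass an open file handle instead of the sample lines and write the matching rows to a new file, for example reading `modkit_pileup_result.bed` and writing a hypothetical `subset.bed`.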
As for your question about how to explore your data: this is largely personal and depends on the goals of your project. I prefer to use polars/pandas to ingest the bedMethyl. There are some code snippets on epi2me for doing this. From there, I would do typical exploratory data analysis. I prefer the "explore -> IGV -> explore" loop, where I look for interesting regions in the bedMethyl, plot the regions (with the reads) in IGV, then restart the loop. Your mind is actually pretty good at finding patterns in the alignments and figuring out the next question to ask. Producing summary statistics from the bedMethyl is also often a good idea and can indicate ways that the data should be cleaned. I also use bedtools a lot when working with bedMethyls (and BED files in general) for SQL-like things such as JOINs. I've also heard good things about methylartist, although I haven't used it myself.
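As a starting point, here is a rough sketch of loading a bedMethyl into pandas, pulling out highly methylated sites, and producing summary statistics. It is not the epi2me snippet. It assumes the 18-column layout shown in your question (other modkit versions may differ), and the column labels, `read_bedmethyl`, and `highly_methylated` are names I made up for illustration. The thresholds (70% and a minimum coverage of 10) are only examples. Whitespace is used as the separator so that both tab- and space-delimited columns parse.

```python
import io

import pandas as pd

# Labels chosen for this sketch; they describe the 18 fields in the sample rows.
BEDMETHYL_COLUMNS = [
    "chrom", "start", "end", "mod_code", "score", "strand",
    "thick_start", "thick_end", "color",
    "n_valid_cov", "percent_modified", "n_mod", "n_canonical",
    "n_other_mod", "n_delete", "n_fail", "n_diff", "n_nocall",
]


def read_bedmethyl(source):
    """Read a headerless, whitespace-delimited bedMethyl file into a DataFrame."""
    return pd.read_csv(
        source,
        sep=r"\s+",
        header=None,
        names=BEDMETHYL_COLUMNS,
        dtype={"chrom": str, "mod_code": str},
    )


def highly_methylated(df, mod_code="m", min_percent=70.0, min_coverage=10):
    """Keep rows for one modification code above a percent and coverage threshold."""
    mask = (
        (df["mod_code"] == mod_code)
        & (df["percent_modified"] >= min_percent)
        & (df["n_valid_cov"] >= min_coverage)
    )
    return df.loc[mask]


# Sample rows copied from the question; pass a file path for real data.
SAMPLE = """\
ptg000001l 86595 86596 h 16 - 86595 86596 255,0,0 16 0.00 0 16 0 0 1 0 0
ptg000001l 86595 86596 m 16 - 86595 86596 255,0,0 16 0.00 0 16 0 0 1 0 0
ptg000001l 86596 86597 h 15 + 86596 86597 255,0,0 15 0.00 0 13 2 0 0 0 0
ptg000001l 86596 86597 m 15 + 86596 86597 255,0,0 15 13.33 2 13 0 0 0 0 0
ptg000001l 86597 86598 h 16 - 86597 86598 255,0,0 16 0.00 0 16 0 0 0 0 1
ptg000001l 86597 86598 m 16 - 86597 86598 255,0,0 16 0.00 0 16 0 0 0 0 1
"""

df = read_bedmethyl(io.StringIO(SAMPLE))

# Sites at or above 70% modified with at least 10 valid calls.
hits = highly_methylated(df, mod_code="m", min_percent=70.0, min_coverage=10)
print(hits[["chrom", "start", "end", "strand", "percent_modified"]])

# Per-modification summary statistics, useful for spotting cleaning issues.
print(df.groupby("mod_code")["percent_modified"].describe())
```

Filtering on coverage as well as percent is a design choice: a site with very few valid calls can show a high percentage by chance, so a minimum coverage keeps the list more trustworthy. The filtered rows can be written back out with `to_csv(sep="\t", header=False, index=False)` if you want to view them in IGV or pass them to bedtools.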
Happy to answer any additional questions, but beyond general advice it will be hard for me to help without additional details.
@hyunjokoo let me know if you have any additional questions.