How to get a gzipped tab-separated-values (tabix) file from a bedMethyl file generated by modkit, for downstream analysis with the NanoMethViz R package?

Open ralanany opened this issue 1 year ago • 8 comments

I have 10 bedMethyl files for 10 patients (one bedMethyl file per patient). I want to generate the tabix file (the input for the NanoMethViz R package) for downstream processing. I want the following format. Could you advise how to convert from the bedMethyl format to the tabix format?

  sample             chr   pos       strand statistic read_name
1 B6Cast_Prom_1_bl6  chr11 101463573 *      -0.33     6cc38b35-6570-4b44-a1e3-2605fcf2ffe8
2 B6Cast_Prom_1_bl6  chr11 101463573 *      -1.87     787f5f43-d144-4e15-ab7d-6b1474083389
3 B6Cast_Prom_1_bl6  chr11 101463573 *      -4.19     c7ee7fb4-a915-4da7-9f36-da6ed5e68af2
4 B6Cast_Prom_1_bl6  chr11 101463573 *       0.10     bff8b135-0296-4495-9354-098242ea8cc4
5 B6Cast_Prom_1_cast chr11 101463573 *      -0.38     11fe130b-8d48-4399-a9fa-2ca2860fa355
6 B6Cast_Prom_1_cast chr11 101463573 *      -0.84     502fef95-c2f2-46ad-9bc5-fb3fc80b4245

ralanany avatar Dec 10 '24 08:12 ralanany

Hello @ralanany,

I think you can probably format the bedMethyl output this way. Do you know what the statistic column is? From a quick read of the nanomethviz docs, I couldn't tell. @Shians maybe you know?

ArtRand avatar Dec 11 '24 16:12 ArtRand

Thank you for your reply. The package documentation says:

We currently support output from Nanopolish, f5c, and Megalodon.

The nanopolish output format is shown below. That output can be converted to a tabix-indexed, bgzipped file with the create_tabix_file function in the NanoMethViz package. Unfortunately, the modkit bedMethyl file is different. My question is how to convert the bedMethyl file from modkit to tabix, or to make it look like the nanopolish output.

  chromosome strand start     end       read_name
1 chr1       -      127732476 127732476 e648c4e3-ca6a-4671-af17-86dab4c819eb
2 chr11      -      115423144 115423144 726dd8b5-1531-4279-9cf0-a7e4d5ea0478
3 chr11      +      69150806  69150814  34f9ee3e-4b27-4d2d-a203-4067f0662044
4 chr1       +      170484965 170484965 d8309c06-375f-4dfe-b22e-0c47af888cd9
5 chrY       -      4082060   4082060   f68940f6-4236-4f0f-9af7-a81b5c2911b6
6 chr8       +      120733312 120733312 13ae181f-b88b-4d6c-a815-553ff2e25312

  log_lik_ratio log_lik_methylated log_lik_unmethylated num_calling_strands
1 -5.91         -100.38            -94.47               1
2 -8.07         -115.21            -107.13              1
3 -1.65         -183.12            -181.47              1
4  2.74         -112.14            -114.88              1
5 -1.78         -135.09            -133.32              1
6  5.02         -129.31            -134.33              1

  num_motifs sequence
1 1          CATTACGTTTC
2 1          AACTTCGTTGA
3 2          GGTCACGGGAATCCGGTTC
4 1          AGAAGCGCTAA
5 1          CTCACCGTATA
6 1          TCTGACGTTGA

ralanany avatar Dec 12 '24 05:12 ralanany

I recently tried to implement direct import of modkit bedmethyl. Looking at your data, I don't think it quite lines up with my expected columns. Could you let me know what command in modkit you used and what version?

https://github.com/Shians/NanoMethViz/commit/4f8181077feb2ca2b6362162643d802235fb8741

Repeated issue: https://github.com/Shians/NanoMethViz/issues/49

Shians avatar Dec 12 '24 05:12 Shians

Thanks for your reply.

Here is the format of the bedMethyl file generated by this command:

modkit pileup path/to/reads.bam output/path/pileup.bed --cpg --ref path/to/reference.fasta

chr1 10468 10469 h 7 . 10468 10469 255,0,0 7 0.00 0 7 0 0 2 0 0
chr1 10468 10469 m 7 . 10468 10469 255,0,0 7 0.00 0 7 0 0 2 0 0

The column names for this file are listed here: https://github.com/nanoporetech/modkit
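For reference, the 18 whitespace-separated fields of each bedMethyl row can be labelled roughly as in the sketch below. The column labels are paraphrased from the modkit README and are an assumption that should be checked against the modkit version in use; `parse_bedmethyl_line` is a hypothetical helper, not part of modkit.

```python
# Column labels paraphrased from the modkit README (assumption: verify for your version).
BEDMETHYL_COLUMNS = [
    "chrom", "start", "end", "mod_code", "score", "strand",
    "thick_start", "thick_end", "color",
    "n_valid_cov", "percent_modified", "n_mod", "n_canonical",
    "n_other_mod", "n_delete", "n_fail", "n_diff", "n_nocall",
]


def parse_bedmethyl_line(line):
    """Hypothetical helper: attach a label to each field of one bedMethyl row."""
    fields = line.split()
    if len(fields) != len(BEDMETHYL_COLUMNS):
        raise ValueError(
            f"expected {len(BEDMETHYL_COLUMNS)} fields, got {len(fields)}"
        )
    return dict(zip(BEDMETHYL_COLUMNS, fields))


row = parse_bedmethyl_line(
    "chr1 10468 10469 h 7 . 10468 10469 255,0,0 7 0.00 0 7 0 0 2 0 0"
)
print(row["mod_code"], row["n_valid_cov"], row["n_canonical"])
print("read_id" in row)  # there is no per-read identifier in a pileup row
```

Note that none of these fields identifies an individual read, unlike the read_name column in the nanopolish output.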

ralanany avatar Dec 12 '24 05:12 ralanany

NanoMethViz is intended to be used with read-level information. Pile-up output isn't compatible because it aggregates read-level information into site-level information. If you instead run modkit extract full, I believe you should be able to import the data directly into tabix format.
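The loss of information can be illustrated with a small sketch. The read-level records here are made up for illustration (hypothetical read names and calls), and `pile_up` is a hypothetical stand-in for site-level aggregation, not modkit's implementation.

```python
from collections import defaultdict

# Made-up read-level calls: (read_name, chrom, pos, is_modified).
read_calls = [
    ("read_a", "chr1", 10468, False),
    ("read_b", "chr1", 10468, True),
    ("read_c", "chr1", 10468, True),
    ("read_a", "chr1", 10470, False),
]


def pile_up(calls):
    """Collapse read-level calls into per-site counts (illustrative only)."""
    sites = defaultdict(lambda: {"n_valid": 0, "n_mod": 0})
    for _read_name, chrom, pos, is_modified in calls:
        site = sites[(chrom, pos)]
        site["n_valid"] += 1
        site["n_mod"] += int(is_modified)
    return dict(sites)


site_counts = pile_up(read_calls)
print(site_counts[("chr1", 10468)])
```

After aggregation only the per-site counts remain. Which read carried which call can no longer be recovered, so a read-level table cannot be rebuilt from pileup output.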

Shians avatar Dec 12 '24 06:12 Shians

Thank you again for your reply, Shians. I ran the recommended command with the BAM file as input, and the output is a TSV, but it is still different from the nanopolish output. Here is the file content:

read_id forward_read_position ref_position chrom mod_strand ref_strand ref_mod_strand fw_soft_clipped_start fw_soft_clipped_end read_length mod_qual mod_code base_qual ref_kmer query_kmer canonical_base modified_primary_base inferred flag
7ee32bc3-3bc2-4e05-8293-b478eae576c7 167 348147 chr1 + + + 38 13 912 0.15820313 h 11 . AGCGT C C false 0

ralanany avatar Dec 12 '24 08:12 ralanany

@ralanany this output should be directly usable as input to create_tabix_file(). It does not need to match nanopolish; the latest version of NanoMethViz should accept the modkit extract full format as input. Please let me know if it doesn't work in the latest GitHub or Bioc Devel version of NanoMethViz.
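Before passing a file to create_tabix_file(), a quick check that it really is read-level output can help. The sketch below parses the header and row posted above and looks for a few read-level columns. The `REQUIRED` set is an assumption about which columns matter, not a list taken from NanoMethViz.

```python
import csv
import io

HEADER = (
    "read_id forward_read_position ref_position chrom mod_strand ref_strand "
    "ref_mod_strand fw_soft_clipped_start fw_soft_clipped_end read_length "
    "mod_qual mod_code base_qual ref_kmer query_kmer canonical_base "
    "modified_primary_base inferred flag"
)
ROW = (
    "7ee32bc3-3bc2-4e05-8293-b478eae576c7 167 348147 chr1 + + + 38 13 912 "
    "0.15820313 h 11 . AGCGT C C false 0"
)

# Rebuild tab-separated text; the spaces above come from how the post was pasted.
tsv_text = "\t".join(HEADER.split()) + "\n" + "\t".join(ROW.split()) + "\n"

# Assumed minimal set of read-level columns; not taken from NanoMethViz itself.
REQUIRED = {"read_id", "chrom", "ref_position", "ref_strand", "mod_qual", "mod_code"}

reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
fieldnames = reader.fieldnames
missing = REQUIRED - set(fieldnames)
records = list(reader)

print("missing columns:", sorted(missing))
first = records[0]
print(first["read_id"], first["chrom"], first["ref_position"], first["mod_qual"])
```

A pileup bedMethyl file has no header line and no read_id column, so it would not pass a check of this kind.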

Shians avatar Jan 06 '25 01:01 Shians

@Shians Hello, I'm new to data analysis. We've done some nanopore sequencing and I'd like to know if it's possible to use NanoMethViz with modkit.bed files and if so, how? I've tried but haven't succeeded...

Thanks for your answer

AdeMar-16 avatar Apr 14 '25 09:04 AdeMar-16