How do I get a gzipped tab-separated-values (tabix) file from a bedMethyl file generated by modkit, for downstream analysis with the NanoMethViz R package?
I have 10 bedMethyl files from 10 patients (one bedMethyl file per patient). I want to generate the tabix file (the input for the NanoMethViz R package) for downstream processing. This is the format I am after; could you advise how to get from the bedMethyl format to the tabix format?
  sample             chr   pos       strand statistic read_name
1 B6Cast_Prom_1_bl6  chr11 101463573 *      -0.33     6cc38b35-6570-4b44-a1e3-2605fcf2ffe8
2 B6Cast_Prom_1_bl6  chr11 101463573 *      -1.87     787f5f43-d144-4e15-ab7d-6b1474083389
3 B6Cast_Prom_1_bl6  chr11 101463573 *      -4.19     c7ee7fb4-a915-4da7-9f36-da6ed5e68af2
4 B6Cast_Prom_1_bl6  chr11 101463573 *      0.10      bff8b135-0296-4495-9354-098242ea8cc4
5 B6Cast_Prom_1_cast chr11 101463573 *      -0.38     11fe130b-8d48-4399-a9fa-2ca2860fa355
6 B6Cast_Prom_1_cast chr11 101463573 *      -0.84     502fef95-c2f2-46ad-9bc5-fb3fc80b4245
Hello @ralanany,
I think you can probably format the bedMethyl output this way. Do you know what the statistic column is? From a quick read of the nanomethviz docs, I couldn't tell. @Shians maybe you know?
Thank you for your reply. The package documentation says:
We currently support output from Nanopolish, f5c, and Megalodon.
The Nanopolish output format is shown below. That output can be converted to a tabix-indexed, bgzipped file with the create_tabix_file function in the NanoMethViz package, but the modkit bedMethyl file is different. My question is how to convert the modkit bedMethyl file to tabix, or how to make it look like the Nanopolish output.
  chromosome strand start     end       read_name
1 chr1       -      127732476 127732476 e648c4e3-ca6a-4671-af17-86dab4c819eb
2 chr11      -      115423144 115423144 726dd8b5-1531-4279-9cf0-a7e4d5ea0478
3 chr11      +      69150806  69150814  34f9ee3e-4b27-4d2d-a203-4067f0662044
4 chr1       +      170484965 170484965 d8309c06-375f-4dfe-b22e-0c47af888cd9
5 chrY       -      4082060   4082060   f68940f6-4236-4f0f-9af7-a81b5c2911b6
6 chr8       +      120733312 120733312 13ae181f-b88b-4d6c-a815-553ff2e25312
  log_lik_ratio log_lik_methylated log_lik_unmethylated num_calling_strands
1 -5.91         -100.38            -94.47               1
2 -8.07         -115.21            -107.13              1
3 -1.65         -183.12            -181.47              1
4 2.74          -112.14            -114.88              1
5 -1.78         -135.09            -133.32              1
6 5.02          -129.31            -134.33              1
  num_motifs sequence
1 1          CATTACGTTTC
2 1          AACTTCGTTGA
3 2          GGTCACGGGAATCCGGTTC
4 1          AGAAGCGCTAA
5 1          CTCACCGTATA
6 1          TCTGACGTTGA
I recently tried to implement direct import of modkit bedmethyl. Looking at your data, I don't think it quite lines up with my expected columns. Could you let me know what command in modkit you used and what version?
https://github.com/Shians/NanoMethViz/commit/4f8181077feb2ca2b6362162643d802235fb8741
Repeated issue: https://github.com/Shians/NanoMethViz/issues/49
Thanks for your reply.
Here is the format of the bedMethyl file generated by this command:
modkit pileup path/to/reads.bam output/path/pileup.bed --cpg --ref path/to/reference.fasta
chr1 10468 10469 h 7 . 10468 10469 255,0,0 7 0.00 0 7 0 0 2 0 0
chr1 10468 10469 m 7 . 10468 10469 255,0,0 7 0.00 0 7 0 0 2 0 0
The column names for this file are listed here: https://github.com/nanoporetech/modkit
NanoMethViz is intended to be used with read-level information. Pileup output isn't compatible because it aggregates read-level information into site-level information. If you instead run modkit extract full, then I believe you should be able to import the data directly into tabix format.
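To illustrate why pileup output cannot be turned back into read-level input, here is a toy sketch in Python. It is not modkit's implementation; the read names, positions, and calls are made up, and `pileup` is a hypothetical helper that only mimics the idea of counting calls per site.

```python
from collections import Counter

# Hypothetical read-level calls: (read_id, chrom, ref_position, is_modified).
read_level_calls = [
    ("read-a", "chr1", 10468, False),
    ("read-b", "chr1", 10468, True),
    ("read-c", "chr1", 10468, False),
    ("read-a", "chr1", 10470, True),
]


def pileup(calls):
    """Collapse read-level calls into per-site (n_modified, n_total) counts."""
    modified = Counter()
    total = Counter()
    for _read_id, chrom, pos, is_modified in calls:
        total[(chrom, pos)] += 1
        modified[(chrom, pos)] += int(is_modified)
    # The read IDs are discarded here, so the per-read view cannot be recovered.
    return {site: (modified[site], total[site]) for site in total}


site_level = pileup(read_level_calls)
for (chrom, pos), (n_mod, n_total) in sorted(site_level.items()):
    print(chrom, pos, n_mod, n_total)
```

After aggregation each site keeps only counts, which is why a read-level table (one row per read per site) has to come from a read-level export rather than from the pileup.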
Thank you again for your reply, Shians. I ran the recommended command on the input BAM file and the output is a TSV, but it is still different from the Nanopolish output. Here is the file content:
read_id forward_read_position ref_position chrom mod_strand ref_strand ref_mod_strand fw_soft_clipped_start fw_soft_clipped_end read_length mod_qual mod_code base_qual ref_kmer query_kmer canonical_base modified_primary_base inferred flag
7ee32bc3-3bc2-4e05-8293-b478eae576c7 167 348147 chr1 + + + 38 13 912 0.15820313 h 11 . AGCGT C C false 0
@ralanany this output should be directly usable as input to create_tabix_file(); it does not need to match the Nanopolish format. The latest version of NanoMethViz should accept the modkit extract full format as input. Please let me know if it doesn't work in the latest GitHub or Bioc Devel version of NanoMethViz.
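For readers wondering how the modkit extract columns relate to the sample/chr/pos/strand/statistic/read_name layout from the original question, here is a rough Python sketch using the row posted above. It only illustrates the mapping and is not what create_tabix_file() does internally. The assumptions are mine: the statistic is taken to be the log-odds of mod_qual, strand is taken from ref_strand, the sample name `patient_01` is made up, and `to_read_level_record` is a hypothetical helper. Coordinate conventions (0-based vs 1-based) and multiple mod codes are not handled.

```python
import math

# Header and one data row copied from the modkit extract output posted above.
HEADER = (
    "read_id forward_read_position ref_position chrom mod_strand ref_strand "
    "ref_mod_strand fw_soft_clipped_start fw_soft_clipped_end read_length "
    "mod_qual mod_code base_qual ref_kmer query_kmer canonical_base "
    "modified_primary_base inferred flag"
).split()
ROW = (
    "7ee32bc3-3bc2-4e05-8293-b478eae576c7 167 348147 chr1 + + + 38 13 912 "
    "0.15820313 h 11 . AGCGT C C false 0"
).split()


def to_read_level_record(row, sample, eps=1e-6):
    """Map one modkit extract row onto the six columns from the question.

    ASSUMPTION: statistic = log(p / (1 - p)) with p = mod_qual, clamped to
    [eps, 1 - eps] so that p = 0 or p = 1 does not raise a math error.
    """
    p = min(max(float(row["mod_qual"]), eps), 1 - eps)
    return {
        "sample": sample,
        "chr": row["chrom"],
        "pos": int(row["ref_position"]),
        "strand": row["ref_strand"],  # ASSUMPTION: reference strand
        "statistic": math.log(p / (1 - p)),
        "read_name": row["read_id"],
    }


record = to_read_level_record(dict(zip(HEADER, ROW)), sample="patient_01")
print("\t".join(str(record[key]) for key in record))
```

A negative statistic here means the call leans towards unmodified, which matches the sign convention of the log-likelihood ratios in the Nanopolish example earlier in the thread.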
@Shians Hello, I'm new to data analysis. We've done some nanopore sequencing and I'd like to know if it's possible to use NanoMethViz with modkit.bed files and if so, how? I've tried but haven't succeeded...
Thanks for your answer