modkit icon indicating copy to clipboard operation
modkit copied to clipboard

how to use bedmethyl tobigwig..?

Open Seongmin-Jang-1165 opened this issue 9 months ago • 8 comments

Hello developer!!

I tried to use bedmethyl tobigwig function for m6A site visualization in IGV, but I don't know the proper input file and information.

the input bedmethyl file is generated by modkit pileup without any option(the end of filename is .bed), and use gencode.v43.transcripts.fa.fai for --sizes option.

but It didn't work. what file should I use for bedmethyl tobigwig??

the one thing is that I generated the bedmethyl file with older version, so the transcript/gene decription(gene ID information) is tab-separted.

Code modkit bedmethyl tobigwig --sizes gencode.v43.transcripts.fa.fai --mod-codes a --negative-strand-values --nthreads 10 --log-filepath modkit/output/modkit_bigwig/B2_log modkit/output/m6A_barcode2.bed modkit/output/modkit_bigwig/B2_bigwig

Seongmin-Jang-1165 avatar Mar 15 '25 00:03 Seongmin-Jang-1165

Not developer but these code works for me

modkit bedmethyl tobigwig foo.bedmethyl --mod-codes h -t 40 -g sizes.genome foo.bigWig

where sizes.genome something like

chr1    248956422
chr2    242193529
chr3    198295559
chr4    190214555
chr5    181538259
chr6    170805979

KunFang93 avatar Mar 15 '25 21:03 KunFang93

Hello @Seongmin-Jang-1165,

I generally recommend making the bigWig file at the same time as performing pileup, there is an example in the documentation.

# you can run pileup and stream the output through the new command '-' means output to stdout
modkit pileup ${modbam} - \
  # --mod-code a will make a track of 6mA
  | tee >(modkit bedmethyl tobigwig - ${outdir}/subsample_pileup_a.bw --mod-code a -g ${sizes} --log ${outdir}/bm_a.log --negative-strand-values --suppress-progress) \
  # you can use "bm" or "bedmethyl"
  # --mod-code (or --mod-codes h,m) will combine the 5hmC and 5mC counts into a track
  | tee >(modkit bm tobigwig - ${outdir}/subsample_pileup_hm.bw --mod-codes h,m -g ${sizes} --log ${outdir}/bm_hm.log --negative-strand-values --suppress-progress) \
  # finally pipe out to the pileup
  > ${pileup}

# you can use a file also, if you have conflicting bases (e.g. 6mA and 5mC) - you'll get an error.
modkit bm tobigwig ${pileup} ${outbw} --mod-code m,a -g ${sizes} --negative-strand-values

ArtRand avatar Mar 17 '25 15:03 ArtRand

@ArtRand @KunFang93 Thanks for advices, both of you!! I'll try it. have a good day

Seongmin-Jang-1165 avatar Mar 17 '25 18:03 Seongmin-Jang-1165

@ArtRand Hello I tried wiht your advice but the error occurs... and the output data doesn't have any information

I use the code below

modkit pileup m6A_barcode1_dorado_aligned_sorted.bam -
| tee >(modkit bedmethyl tobigwig - /pileup_bw/B1_pileup_a_2.bw --mod-code a -g homo_sapiens_chrom_sizes.txt --log /pileup_bw/B1_bm_a_2.log --negative-strand-values --suppress-progress)

this is the genomesize data homo_sapiens_chrom_sizes.txt

this is the log B1_bm_a_2.log

this is the system message slurm-21756.txt

I also tried with KunFang93's code, but it still not working

slurm-21755.txt

I think it's because of the | seperated chromosome information.. then shoild I regenerate input bed file( m6A calling data)??

Seongmin-Jang-1165 avatar Mar 17 '25 23:03 Seongmin-Jang-1165

Hello @Seongmin-Jang-1165,

I think the problem is that you're switching references. If you aligned to a transcriptome (looks like you did?), you need to use that reference throughout the analysis. Can you use a chromosome sizes file (the .fai from the reference will work) that you aligned the reads to for th-g argument?

ArtRand avatar Mar 18 '25 14:03 ArtRand

@ArtRand

I did it with the genome size data and transcriptome size data, both of them have same error...

the genome file is .txt, and trascriptome file is gencode.v43.transcripts.fa.fai

code for bigwig

/modkit pileup m6A_barcode1_dorado_aligned_sorted.bam -
| tee >(modkit bedmethyl tobigwig - pileup_bw/B1_pileup_a_2.bw --mod-code a -g gencode.v43.transcripts.fa.fai --log B1_bm_a_2.log --negative-strand-values --suppress-progress)

I made an input BAM like below,

dorado basecaller sup,m6A_DRACH barcode1.pod5 > DORADO_m6A_barcode1.bam

dorado aligner GRCh38.primary_assembly.transcriptome.mmi DORADO_m6A_barcode1.bam > m6A_barcode1_dorado_aligned.bam

samtools sort m6A_barcode1_dorado_aligned.bam -o m6A_barcode1_dorado_aligned_sorted.bam

samtools index -b m6A_barcode1_dorado_aligned_sorted.bam

(generating transcriptome.mmi)

minimap2 -d GRCh38.primary_assembly.transcriptome.mmi /Reference/gencode.v43.transcripts.fa

is there some mistakes for generation input file..?

Seongmin-Jang-1165 avatar Mar 19 '25 22:03 Seongmin-Jang-1165

@Seongmin-Jang-1165 Sorry I think I missed this detail in your first comment:

the one thing is that I generated the bedmethyl file with older version, so the transcript/gene decription(gene ID information) is tab-separted.

Could you paste a few lines of the bedMethyl file you get out?

ArtRand avatar Mar 19 '25 23:03 ArtRand

@ArtRand

sure! like below

ENST00000378322.7|ENSG00000116213.16|OTTHUMG00000000612.6|OTTHUMT00000001471.1|WRAP73-202|WRAP73|1998|protein_coding| 913 914 a 1 + 913 914 255,0,0 1 100.00 1 0 0 0 0 0 0 ENST00000378322.7|ENSG00000116213.16|OTTHUMG00000000612.6|OTTHUMT00000001471.1|WRAP73-202|WRAP73|1998|protein_coding| 1013 1014 a 1 + 1013 1014 255,0,0 1 0.00 0 1 0 0 0 0 0 ENST00000378322.7|ENSG00000116213.16|OTTHUMG00000000612.6|OTTHUMT00000001471.1|WRAP73-202|WRAP73|1998|protein_coding| 1021 1022 a 1 + 1021 1022 255,0,0 1 0.00 0 1 0 0 0 0 0 ENST00000378322.7|ENSG00000116213.16|OTTHUMG00000000612.6|OTTHUMT00000001471.1|WRAP73-202|WRAP73|1998|protein_coding| 1041 1042 a 1 + 1041 1042 255,0,0 1 0.00 0 1 0 0 0 0 0

Seongmin-Jang-1165 avatar Mar 20 '25 05:03 Seongmin-Jang-1165