how to use bedmethyl tobigwig..?
Hello developer!!
I tried to use bedmethyl tobigwig function for m6A site visualization in IGV, but I don't know the proper input file and information.
the input bedmethyl file is generated by modkit pileup without any option(the end of filename is .bed), and use gencode.v43.transcripts.fa.fai for --sizes option.
but It didn't work. what file should I use for bedmethyl tobigwig??
the one thing is that I generated the bedmethyl file with older version, so the transcript/gene decription(gene ID information) is tab-separted.
Code modkit bedmethyl tobigwig --sizes gencode.v43.transcripts.fa.fai --mod-codes a --negative-strand-values --nthreads 10 --log-filepath modkit/output/modkit_bigwig/B2_log modkit/output/m6A_barcode2.bed modkit/output/modkit_bigwig/B2_bigwig
Not developer but these code works for me
modkit bedmethyl tobigwig foo.bedmethyl --mod-codes h -t 40 -g sizes.genome foo.bigWig
where sizes.genome something like
chr1 248956422
chr2 242193529
chr3 198295559
chr4 190214555
chr5 181538259
chr6 170805979
Hello @Seongmin-Jang-1165,
I generally recommend making the bigWig file at the same time as performing pileup, there is an example in the documentation.
# you can run pileup and stream the output through the new command '-' means output to stdout
modkit pileup ${modbam} - \
# --mod-code a will make a track of 6mA
| tee >(modkit bedmethyl tobigwig - ${outdir}/subsample_pileup_a.bw --mod-code a -g ${sizes} --log ${outdir}/bm_a.log --negative-strand-values --suppress-progress) \
# you can use "bm" or "bedmethyl"
# --mod-code (or --mod-codes h,m) will combine the 5hmC and 5mC counts into a track
| tee >(modkit bm tobigwig - ${outdir}/subsample_pileup_hm.bw --mod-codes h,m -g ${sizes} --log ${outdir}/bm_hm.log --negative-strand-values --suppress-progress) \
# finally pipe out to the pileup
> ${pileup}
# you can use a file also, if you have conflicting bases (e.g. 6mA and 5mC) - you'll get an error.
modkit bm tobigwig ${pileup} ${outbw} --mod-code m,a -g ${sizes} --negative-strand-values
@ArtRand @KunFang93 Thanks for advices, both of you!! I'll try it. have a good day
@ArtRand Hello I tried wiht your advice but the error occurs... and the output data doesn't have any information
I use the code below
modkit pileup m6A_barcode1_dorado_aligned_sorted.bam -
| tee >(modkit bedmethyl tobigwig - /pileup_bw/B1_pileup_a_2.bw --mod-code a -g homo_sapiens_chrom_sizes.txt --log /pileup_bw/B1_bm_a_2.log --negative-strand-values --suppress-progress)
this is the genomesize data homo_sapiens_chrom_sizes.txt
this is the log B1_bm_a_2.log
this is the system message slurm-21756.txt
I also tried with KunFang93's code, but it still not working
I think it's because of the | seperated chromosome information.. then shoild I regenerate input bed file( m6A calling data)??
Hello @Seongmin-Jang-1165,
I think the problem is that you're switching references. If you aligned to a transcriptome (looks like you did?), you need to use that reference throughout the analysis. Can you use a chromosome sizes file (the .fai from the reference will work) that you aligned the reads to for th-g argument?
@ArtRand
I did it with the genome size data and transcriptome size data, both of them have same error...
the genome file is .txt, and trascriptome file is gencode.v43.transcripts.fa.fai
code for bigwig
/modkit pileup m6A_barcode1_dorado_aligned_sorted.bam -
| tee >(modkit bedmethyl tobigwig - pileup_bw/B1_pileup_a_2.bw --mod-code a -g gencode.v43.transcripts.fa.fai --log B1_bm_a_2.log --negative-strand-values --suppress-progress)
I made an input BAM like below,
dorado basecaller sup,m6A_DRACH barcode1.pod5 > DORADO_m6A_barcode1.bam
dorado aligner GRCh38.primary_assembly.transcriptome.mmi DORADO_m6A_barcode1.bam > m6A_barcode1_dorado_aligned.bam
samtools sort m6A_barcode1_dorado_aligned.bam -o m6A_barcode1_dorado_aligned_sorted.bam
samtools index -b m6A_barcode1_dorado_aligned_sorted.bam
(generating transcriptome.mmi)
minimap2 -d GRCh38.primary_assembly.transcriptome.mmi /Reference/gencode.v43.transcripts.fa
is there some mistakes for generation input file..?
@Seongmin-Jang-1165 Sorry I think I missed this detail in your first comment:
the one thing is that I generated the bedmethyl file with older version, so the transcript/gene decription(gene ID information) is tab-separted.
Could you paste a few lines of the bedMethyl file you get out?
@ArtRand
sure! like below
ENST00000378322.7|ENSG00000116213.16|OTTHUMG00000000612.6|OTTHUMT00000001471.1|WRAP73-202|WRAP73|1998|protein_coding| 913 914 a 1 + 913 914 255,0,0 1 100.00 1 0 0 0 0 0 0 ENST00000378322.7|ENSG00000116213.16|OTTHUMG00000000612.6|OTTHUMT00000001471.1|WRAP73-202|WRAP73|1998|protein_coding| 1013 1014 a 1 + 1013 1014 255,0,0 1 0.00 0 1 0 0 0 0 0 ENST00000378322.7|ENSG00000116213.16|OTTHUMG00000000612.6|OTTHUMT00000001471.1|WRAP73-202|WRAP73|1998|protein_coding| 1021 1022 a 1 + 1021 1022 255,0,0 1 0.00 0 1 0 0 0 0 0 ENST00000378322.7|ENSG00000116213.16|OTTHUMG00000000612.6|OTTHUMT00000001471.1|WRAP73-202|WRAP73|1998|protein_coding| 1041 1042 a 1 + 1041 1042 255,0,0 1 0.00 0 1 0 0 0 0 0