Using Modkit with Transcriptome-Aligned BAM Files

Open lbwfff opened this issue 3 months ago • 1 comment

Hi,

I am working with modkit to analyze ONT direct RNA-seq data. I generated BAM files with modification tags aligned both to the genome and to the transcriptome. The reason for aligning to the transcriptome is that we would like to investigate RNA modifications at the transcript isoform level.

However, I noticed that the modkit pileup command runs extremely slowly on transcriptome-aligned BAM files. For example, when I tested the first sample, the analysis was still not finished even after more than 10 hours of running.

Here is the command I used:

/scratch/lb4489/project/dRNA/modkit/modkit pileup ./"$i"_GS2T_merged.bam ./"$i"_modkit.bed \
    --log-filepath "$i".log \
    --header --ref /scratch/lb4489/bioindex/gencode.v49.transcripts.fa

Could you please let me know if there is anything wrong with the way I am running the command, or if there are adjustments I could make to improve the performance?

lbwfff avatar Sep 24 '25 05:09 lbwfff

Hello @lbwfff, the command you have here is fine. I'm currently working on a much faster version of the pileup algorithm that should help a lot. One thing you may want to try is increasing the number of threads with -t; you can oversubscribe your machine a little and be OK, since the transcriptome has a lot of short sequences. Hold tight, a faster version is coming.

ArtRand avatar Sep 24 '25 23:09 ArtRand
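
For anyone trying the thread suggestion above, here is a minimal sketch of what the adjusted command might look like. It assumes modkit is on the PATH. The sample name `sample1` is a hypothetical stand-in for the loop variable `$i`, and the 1.5x oversubscription factor is purely illustrative, not a value recommended in this thread. The script only prints the command so it can be checked first.

```shell
#!/bin/sh
# Hypothetical sample name; the original command takes "$i" from a loop.
i="${i:-sample1}"

# Logical core count: nproc on Linux, getconf as a fallback elsewhere.
cores=$(nproc 2>/dev/null || getconf _NPROCESSORS_ONLN)

# Oversubscribe a little. The 1.5x factor is an assumption for illustration;
# tune it to your machine and watch memory use.
threads=$(( cores * 3 / 2 ))
if [ "$threads" -lt 1 ]; then
    threads=1
fi

# Print the command for inspection; remove the leading "echo" to run it.
echo modkit pileup "./${i}_GS2T_merged.bam" "./${i}_modkit.bed" \
    --log-filepath "${i}.log" \
    --header --ref /scratch/lb4489/bioindex/gencode.v49.transcripts.fa \
    -t "$threads"
```

If the run is still slow with more threads, the reply above suggests the remaining cost is in the pileup algorithm itself, so waiting for the faster release may be the practical option.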