How to use bambu in parallel on HPC clusters?
Hi, I ran bambu on one sample. The alignment result (.bam) for this sample is about 34 GB. I ran bambu on an HPC cluster with the CPU number set to 20, but the job took 6 hours to finish and the CPU efficiency was only 5% — that is, only 1 CPU was actually used by bambu. I am confused. Can you help me? Thanks! Below is my R code:
#!/usr/bin/env Rscript
library(bambu)
args <- commandArgs(trailingOnly = TRUE)
rawreads <- args[1]
ref_anno <- args[2]
ref      <- args[3]
core     <- args[4]
output   <- args[5]
bambuAnnotations <- prepareAnnotations(ref_anno)
se <- bambu(reads = rawreads, annotations = bambuAnnotations,
            genome = ref, ncore = core, trackReads = TRUE)
writeBambuOutput(se, path = output)
Below is my Slurm script (which submits my R script to the HPC cluster):
#!/bin/bash
#SBATCH -J bambu
#SBATCH --partition=cpu
#SBATCH -n 20
#SBATCH --output=%j.out
#SBATCH --error=%j.err
module load miniconda3
source activate R_422
ref=~/reference/GRCh38.p13.genome.fa
anno=~/reference/gencode.v44.chr_patch_hapl_scaff.annotation.gtf
reads=my.bam
output=../bambu/
core=20
./bambu.R $reads $anno $ref $core $output
Hi, sorry I didn't see this issue earlier. It shouldn't take 6 hours to run a 34 GB bam file. Could you check the class() of core? I think it might be read in as a character rather than an integer, which would be interpreted as 1 core instead of 20. Adding as.integer(core) should help, I hope. Kind Regards, Andre Sim
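For anyone hitting the same issue: commandArgs() always returns a character vector, so any numeric command-line argument must be coerced explicitly before being passed to ncore. A minimal sketch of the check and fix (the literal "20" stands in for the value Slurm passes on the command line):

```r
# commandArgs() delivers every argument as a string, even numbers;
# simulate what args[4] contains after the shell passes "20"
core <- "20"
class(core)              # "character" -- a character ncore can fall back to 1 core

# explicit coercion fixes it
core <- as.integer(core)
class(core)              # "integer"
core                     # 20
```

After the coercion, pass the integer value as ncore = core in the bambu() call.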