BwaSpark parameter optimization

Open joshua-biocoder opened this issue 1 year ago • 0 comments

Hi,

I'm trying to validate the performance of BwaSpark, running it locally. The input uBAM file is 5.1 GB. GATK's BwaSpark takes 65 minutes to complete, which is exactly the same as bwa-mem. Below is the command I used to run BwaSpark. Is there any way to make BwaSpark run faster locally, or will performance improve only when running on a Spark cluster? Please let me know whether I need to modify or add any parameters.

Also, where can I find the complete list of --conf parameters for BwaSpark? I couldn't find these options in gatk BwaSpark --help.

time gatk BwaSpark --bwa-mem-index-image GRCh37.fasta.img --spark-master local[*] --bam-partition-size 4000000 --conf 'spark.executor.num=5' --conf 'spark.executor.cores=16' --conf 'spark.executor.memory=15G' --conf 'spark.driver.memory=30G' --conf 'spark.dynamicAllocation.enabled=true' -I unmapped_input.bam -O output.bam -R GRCh37.fasta 2> Log_file.log
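
For readability, here is the same command laid out as a small bash script. It is only a sketch: every flag and value is copied from the command above and nothing has been tuned. The array names `spark_conf` and `cmd` and the `RUN` switch are my own, added for illustration.

```shell
#!/usr/bin/env bash
# Sketch: rebuild the BwaSpark command from this issue, one setting per line.
# Flags and values are copied verbatim from the original command.
set -euo pipefail

# Spark properties passed through --conf (values as used in the issue).
spark_conf=(
  'spark.executor.num=5'
  'spark.executor.cores=16'
  'spark.executor.memory=15G'
  'spark.driver.memory=30G'
  'spark.dynamicAllocation.enabled=true'
)

cmd=(gatk BwaSpark
  --bwa-mem-index-image GRCh37.fasta.img
  --spark-master 'local[*]'
  --bam-partition-size 4000000
)

# Append one --conf flag per property.
for kv in "${spark_conf[@]}"; do
  cmd+=(--conf "$kv")
done

cmd+=(-I unmapped_input.bam -O output.bam -R GRCh37.fasta)

# Print the assembled command, shell-quoted, for inspection.
printf '%q ' "${cmd[@]}"
echo

# RUN is a hypothetical switch: set RUN=1 to execute and time the command,
# sending stderr to Log_file.log as in the original invocation.
if [[ "${RUN:-0}" == 1 ]]; then
  time "${cmd[@]}" 2> Log_file.log
fi
```

Without `RUN=1` the script only prints the command, so the settings can be reviewed or edited one at a time before a 65-minute run is started.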

joshua-biocoder, Jun 26 '24 20:06