gatk
BwaSpark parameter optimization
Hi,
I'm trying to validate the performance of BwaSpark, running it locally. The input uBAM file is 5.1 GB. GATK's BwaSpark takes 65 minutes to complete, which is exactly the same as bwa-mem. Below is the command I used to run BwaSpark. Is there any way to make BwaSpark run faster locally, or will performance improve only when running on a Spark cluster? Please let me know if I need to modify or add any parameters.
Also, where can I find the complete list of --conf parameters for BwaSpark? I couldn't find these options in gatk BwaSpark --help.
time gatk BwaSpark --bwa-mem-index-image GRCh37.fasta.img --spark-master local[*] --bam-partition-size 4000000 --conf 'spark.executor.num=5' --conf 'spark.executor.cores=16' --conf 'spark.executor.memory=15G' --conf 'spark.driver.memory=30G' --conf 'spark.dynamicAllocation.enabled=true' -I unmapped_input.bam -O output.bam -R GRCh37.fasta 2> Log_file.log