gatk-sv

Long-Running Job: Module GatherBatchEvidence, Job BAFFromGVCFs_ImportGVCFs

spatel-gfb opened this issue 2 years ago • 4 comments

Module: GatherBatchEvidence, Job: BAFFromGVCFs_ImportGVCFs

Issue: The job takes almost 40 hours to complete for 156 samples from the 1000 Genomes Project.

The script and logs for the process are attached: Job_Logs.csv, import_vcf_script.txt.

We are running this pipeline on AWS. It worked fine earlier, and the earlier-run durations below are also from AWS. The process imports the files into a local GenomicsDB workspace in batches of 50 samples. Each batch takes approximately 8 hours to complete, which is a long time given that all the required files are available locally and are not being fetched from S3.
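As a rough sanity check on these figures, the sketch below splits the 156 samples into batches of 50 and estimates total import time under two simple, hypothetical cost models: a fixed cost per batch, or a cost proportional to the number of samples in each batch. The helper names `batch_plan` and `estimate_hours` are illustrative only and are not part of gatk-sv or GATK; the inputs are just the approximate numbers reported in this issue.

```python
def batch_plan(n_samples: int, batch_size: int) -> list[int]:
    """Return the size of each consecutive import batch."""
    full, remainder = divmod(n_samples, batch_size)
    return [batch_size] * full + ([remainder] if remainder else [])


def estimate_hours(batches: list[int], hours_per_full_batch: float,
                   batch_size: int) -> tuple[float, float]:
    """Estimate total hours under two hypothetical cost models.

    per_batch:  every batch costs the same, regardless of its size.
    per_sample: cost scales linearly with the samples in each batch.
    """
    per_batch = len(batches) * hours_per_full_batch
    per_sample = sum(batches) * hours_per_full_batch / batch_size
    return per_batch, per_sample


batches = batch_plan(156, 50)
print(batches)  # [50, 50, 50, 6]
per_batch, per_sample = estimate_hours(batches, 8.0, 50)
print(f"fixed cost per batch: {per_batch:.1f} h")   # 32.0 h
print(f"linear in samples:    {per_sample:.2f} h")  # 24.96 h
```

Both estimates come in below the ~40 hours observed, so additional per-batch or per-job overhead may also be involved; since the inputs are approximate, treat this only as a rough check.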

Initially, the job ran for approximately 19 hours: localizing the S3 files took almost 16 hours, and the actual execution took about 3 hours. Now that the localization step is no longer required, the job should ideally complete within 4 hours, but it is running much longer.
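The timing comparison above can be made explicit with the reported numbers. This minimal sketch (the `slowdown` helper is hypothetical) subtracts the former localization time from the earlier total to get an import-only baseline, then compares the current run against it, assuming the current run takes the ~40 hours reported for this instance.

```python
def slowdown(observed_hours: float, baseline_hours: float) -> float:
    """Ratio of the observed runtime to the expected baseline runtime."""
    return observed_hours / baseline_hours


earlier_total = 19.0  # hours: earlier run, including S3 localization
localization = 16.0   # hours: time spent localizing S3 files in that run
baseline = earlier_total - localization  # import-only time, about 3 h
current = 40.0        # hours: current run, inputs already local (assumed)

print(f"baseline import time: {baseline:.1f} h")                    # 3.0 h
print(f"current vs baseline:  {slowdown(current, baseline):.1f}x")  # 13.3x
```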

The vCPU and memory requirements of this job have not changed; it still runs in a container with 2 CPUs and 8 GB of memory.

This is just a single instance. We are running almost 406 instances of this job on different chromosome intervals, and all of them are running longer.

We would appreciate your suggestions on this issue: why would the job take this long, and were any changes made that might cause this?

Let us know if you need any further information.

spatel-gfb, Jan 17 '22 18:01