Running Manta with WGS data takes a lot of time

Open zhaoxia-genedock opened this issue 4 years ago • 4 comments

Hi,

Manta_stdout.log

The task took three days to run on WGS data (30X, Illumina); please check the log for details.

Is there any way to improve it?

zhaoxia-genedock avatar Feb 21 '20 08:02 zhaoxia-genedock

Hi,

I am having the exact same problem with a 30x WGS Illumina sample. I tried both v1.6 and v1.5, but the run is equally slow on an Ubuntu machine with 8 cores and 48GB RAM.

If you have a solution for this problem (some key libraries or a specific version), it would be appreciated.

mbosio85 avatar Mar 06 '20 08:03 mbosio85

Setting the option enableRemoteReadRetrievalForInsertionsInGermlineCallingModes to 0 makes Manta run faster, but the recall rate for INS goes down. You can refer to this issue: https://github.com/Illumina/manta/issues/130

zhaoxia-genedock avatar Mar 06 '20 09:03 zhaoxia-genedock

Hi, thanks a lot for the info. I edited configManta.py.ini, tried it, and the run completed in a much shorter time. Do you have an estimate of the loss of INS recall by any chance, or a hint as to why this happens in some samples? Finally, do you know if I can change this parameter from the command line rather than editing the default .ini file?

Thanks again!

mbosio85 avatar Mar 06 '20 15:03 mbosio85
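One way to avoid editing the installed default file is to write a modified copy and point Manta at it. The sketch below is only an illustration: it assumes the option lives in a `[manta]` section of configManta.py.ini and that your Manta version's configManta.py accepts a `--config FILE` argument to override the defaults (check `configManta.py --help` to confirm). The function name is hypothetical.

```python
import configparser
import io

# Option name taken from the discussion above.
OPTION = "enableRemoteReadRetrievalForInsertionsInGermlineCallingModes"


def disable_remote_read_retrieval(ini_text: str, section: str = "manta") -> str:
    """Return a copy of the ini text with OPTION set to 0.

    Assumption: the option belongs to a section named `section`
    ("manta" by default); adjust if your ini file differs.
    """
    # interpolation=None so that any '%' in values is left untouched.
    parser = configparser.ConfigParser(interpolation=None)
    # Preserve camelCase keys; configparser lowercases them by default.
    parser.optionxform = str
    parser.read_string(ini_text)
    if not parser.has_section(section):
        raise KeyError(f"section [{section}] not found in ini text")
    parser.set(section, OPTION, "0")
    out = io.StringIO()
    parser.write(out)  # note: comments from the original file are not kept
    return out.getvalue()


if __name__ == "__main__":
    import sys

    # Usage: python make_fast_ini.py configManta.py.ini fast.ini
    src, dst = sys.argv[1], sys.argv[2]
    with open(src) as fh:
        new_text = disable_remote_read_retrieval(fh.read())
    with open(dst, "w") as fh:
        fh.write(new_text)
```

If the `--config` assumption holds for your version, the resulting file could then be passed as `configManta.py --config fast.ini ...` alongside your usual arguments, leaving the default .ini untouched.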

I can also confirm that setting the option enableRemoteReadRetrievalForInsertionsInGermlineCallingModes to 0 drastically improved the runtime of Manta for some germline WGS ~40x samples using GRCh38 as a reference.

In a study with ~50 blood-derived WGS ~40x samples, about 45 samples completed in Manta in about 30 minutes on average using CPU cores and 80GB RAM (I don't think that much RAM was ever used, but that is what was available). However, for about 5 samples the runtime exceeded 10 hours, with some taking more than a day to complete and some that I terminated after more than a day. I can't see any reason for this in the quality of the samples.

However, on those 5 samples that took a very long time, changing the setting as above brought the runtime down to under 30 minutes each. This is a drastic performance improvement! As @mbosio85 said above, it would be useful to quantify the likely effect of this setting on the sensitivity of the calls. Looking at the raw number of SV calls in the output, it does not seem to have made a significant difference compared to my other samples that ran in reasonable time.

bjpop avatar May 19 '21 05:05 bjpop
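For anyone wanting to repeat the rough comparison of raw SV call counts described above, here is a minimal sketch that tallies records per SVTYPE in two VCFs (for example, the same sample run with the option set to 1 and to 0). It assumes each record carries an `SVTYPE=` key in the INFO column, as in standard VCF structural variant output; the function names are hypothetical. Raw counts are only a crude proxy: a real recall estimate would need a truth set.

```python
import gzip
import re
from collections import Counter

# Matches SVTYPE=<value> at the start of INFO or after a ';'.
SVTYPE_RE = re.compile(r"(?:^|;)SVTYPE=([^;]+)")


def count_svtypes(vcf_lines):
    """Count VCF records per SVTYPE from an iterable of text lines."""
    counts = Counter()
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 8:
            continue  # not a complete VCF record
        match = SVTYPE_RE.search(fields[7])  # column 8 is INFO
        if match:
            counts[match.group(1)] += 1
    return counts


def diff_counts(before, after):
    """Per-type change in record count (after minus before)."""
    return {k: after.get(k, 0) - before.get(k, 0)
            for k in sorted(set(before) | set(after))}


def open_vcf(path):
    """Open a plain or gzip-compressed VCF as text."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)


if __name__ == "__main__":
    import sys

    # Usage: python compare_sv.py default_run.vcf.gz fast_run.vcf.gz
    with open_vcf(sys.argv[1]) as a, open_vcf(sys.argv[2]) as b:
        print(diff_counts(count_svtypes(a), count_svtypes(b)))
```

A drop concentrated in the INS count would be consistent with the reduced insertion recall mentioned earlier in the thread, while the other types should stay roughly stable.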