manta
Running Manta with WGS data takes a lot of time
Hi,
The task took three days to run on WGS data (30X, Illumina); please check the log for details.
Is there any way to improve it?
Hi,
I am having the exact same problem with a 30x WGS Illumina sample. I tried both v1.6 and v1.5, but progress is equally slow on an Ubuntu machine with 8 cores and 48 GB RAM.
If you have a solution for this problem (some key libraries or a specific version), it would be appreciated.
You can refer to this issue: https://github.com/Illumina/manta/issues/130. Setting the option enableRemoteReadRetrievalForInsertionsInGermlineCallingModes to 0 will run faster, but the recall rate for INS calls will go down.
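For anyone finding this later, the relevant entry in configManta.py.ini would look something like the following (the `[manta]` section name matches the file shipped with Manta; the value shown is the non-default speed-over-recall setting discussed here):

```ini
[manta]
enableRemoteReadRetrievalForInsertionsInGermlineCallingModes = 0
```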
Hi, thanks a lot for the info! I edited configManta.py.ini and tried it, and it actually completed in a much shorter time. Do you have an estimate of the loss in INS recall by any chance, or a hint as to why this may happen in some samples? Finally, do you know if I can change this parameter from the command line rather than editing the default .ini file?
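On the command-line question: a hedged sketch of one way to override the setting without touching the packaged defaults, assuming your Manta version's configManta.py supports the `--config` flag described in its user guide. The ini contents and all file paths below are illustrative, not from a real install:

```shell
# Stand-in for a local copy of the packaged defaults (in practice you would
# copy configManta.py.ini out of your Manta install's bin/ directory).
cat > myManta.ini <<'EOF'
[manta]
enableRemoteReadRetrievalForInsertionsInGermlineCallingModes = 1
EOF

# Flip the flag to 0 in the local copy only.
sed -i 's/^enableRemoteReadRetrievalForInsertionsInGermlineCallingModes.*/enableRemoteReadRetrievalForInsertionsInGermlineCallingModes = 0/' myManta.ini

# Then point the configuration script at the custom file, e.g. (hypothetical paths):
# configManta.py --config myManta.ini --bam sample.bam --referenceFasta ref.fa --runDir run
grep enableRemote myManta.ini
```

This keeps the installed defaults intact, so other pipelines on the same machine are unaffected.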
Thanks again!
I can also confirm that setting the option enableRemoteReadRetrievalForInsertionsInGermlineCallingModes to 0 made a drastic improvement to the runtime of Manta on some germline WGS ~40x samples using GRCh38 as a reference.
In a study with ~50 blood-derived WGS ~40x samples, about 45 completed in Manta in about 30 minutes on average using CPU cores and 80 GB RAM (I don't think that much RAM was ever used, but that is what was available). However, for about 5 samples the runtime exceeded 10 hours; some took more than a day to complete, and some I terminated after more than a day. I can't see any reason for this from the quality of the samples.
However, on those 5 samples that took a very long time, changing the setting as above brought the runtime down to under 30 minutes each. This is a drastic performance improvement! As @mbosio85 said above, it would be useful to quantify the likely effect this setting has on the sensitivity of the calls. Looking at the raw number of SV calls in the output, it does not seem to have made a significant difference compared to my other samples that ran in a reasonable time.