fetchngs

Choice between fastq-dump and fasterq-dump when using --force_sratools_download

Open dmalzl opened this issue 3 years ago • 2 comments

Description of feature

Hi it's me again,

After solving all the issues related to my NCBI config file, I stumbled across yet another pitfall, this time connected to parallel file system architectures when using fasterq-dump, which is described in a bit more detail in this issue here. The core of it is that running fasterq-dump with multiple threads sends the affected compute nodes into a sort of hibernation from which they cannot be woken without a reboot. This has already caused quite a portion of our cluster to go to sleep. The solution proposed by our IT was simply not to use it, as there is no clear threshold for the number of threads at which this occurs. I updated my local copy of the pipeline accordingly and it seems to work fine. Since I doubt I will be the only one deploying fetchngs on a compute cluster, I think it would be a valuable addition to the pipeline to let the user choose between fasterq-dump and fastq-dump in order to avoid breaking a cluster with a parallel file system architecture.

dmalzl avatar May 09 '22 11:05 dmalzl

What if you set fasterq-dump to run single-threaded? Would that solve the issue? fastq-dump is basically deprecated so I wouldn't want to introduce it if it can be avoided.
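For illustration, the two options discussed here (single-threaded fasterq-dump versus falling back to fastq-dump) could be sketched as a small command builder. This is only a sketch, not the pipeline's actual module: `build_dump_command` and the accession `SRR000001` are hypothetical names, and whether a single thread actually avoids the node hang was not tested in this thread. The flags used (`--threads`, `--split-3`, `--outdir`) are sra-tools options.

```python
from typing import List


def build_dump_command(
    accession: str,
    tool: str = "fasterq-dump",
    threads: int = 1,
    outdir: str = ".",
) -> List[str]:
    """Build the argument list for dumping FASTQ files from an SRA accession.

    Hypothetical helper for illustration only. `threads` is passed to
    fasterq-dump; fastq-dump has no thread option and runs single-threaded.
    """
    if tool == "fasterq-dump":
        if threads < 1:
            raise ValueError("threads must be >= 1")
        return [
            "fasterq-dump",
            "--threads", str(threads),  # 1 = the single-threaded variant
            "--split-3",
            "--outdir", outdir,
            accession,
        ]
    if tool == "fastq-dump":
        return ["fastq-dump", "--split-3", "--outdir", outdir, accession]
    raise ValueError(f"unknown tool: {tool}")


if __name__ == "__main__":
    # SRR000001 is a placeholder accession used only for this example.
    print(" ".join(build_dump_command("SRR000001")))
    print(" ".join(build_dump_command("SRR000001", tool="fastq-dump")))
```

The returned list can be handed to `subprocess.run` on a node where sra-tools is installed; keeping the tool choice in one argument mirrors the user-facing switch requested in this issue.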

Midnighter avatar May 09 '22 11:05 Midnighter

I haven't dared to try it yet, as I did not want to break the cluster again and IT very strongly suggested not using it anymore. In any case, they encouraged us to test it at some point on a free node. However, isn't running it single-threaded equivalent to just using fastq-dump? But I see that simply running it on a single thread is much less of a hassle than implementing a new process. I will try to get around to testing it to see if it breaks the cluster again.

dmalzl avatar May 09 '22 11:05 dmalzl

Hi @dmalzl ! Any updates on this issue?

drpatelh avatar Nov 04 '22 10:11 drpatelh

Hi @drpatelh. After consulting with our IT and a bit of digging, it seems this is a problem with multithreading on our filesystem: concurrent IO operations apparently do not mix well with a parallel filesystem (I do not fully understand why this is). Anyway, I resolved it by simply rewriting the module to use fastq-dump (as IT suggested), and this worked fine. I have not gotten around to testing single-threaded fasterq-dump yet, but I guess it will be fine too.

dmalzl avatar Nov 04 '22 10:11 dmalzl

Thanks for the update! Ok, will close this for now but feel free to re-open if the issue persists and if you have any suggestions for how this can be fixed in the pipeline.

drpatelh avatar Nov 04 '22 11:11 drpatelh