fetchngs
Choice between fastq-dump and fasterq-dump when using --force_sratools_download
Description of feature
Hi it's me again,
After solving all the issues related to my NCBI config file, I stumbled across yet another pitfall, this one connected to parallel file system architectures when using fasterq-dump, which is described in a bit more detail in this issue here. The core of it is that fasterq-dump with multiple threads sends the affected compute nodes into a sort of hibernation from which they can't be woken without a reboot. This has already caused quite a portion of our cluster to go to sleep. The solution proposed by our IT was simply not to use it, as there is no clear threshold for the number of threads at which this occurs. I updated my local copy of the pipeline accordingly and it seems to work fine. Since I won't be the only one deploying fetchngs on a compute cluster, I think it would be a valuable addition to the pipeline to let the user choose between fasterq-dump and fastq-dump, in order to avoid breaking a cluster with a parallel file system architecture.
What if you set fasterq-dump to run single-threaded? Would that solve the issue? fastq-dump is basically deprecated, so I wouldn't want to introduce it if it can be avoided.
I didn't dare to try it yet, as I did not want to break the cluster again and IT very strongly suggested not using it anymore. In any case, they encouraged us to test it at some point on a free node. However, isn't running it single-threaded equivalent to just using fastq-dump? I do see that it is much less of a hassle to simply run it on a single thread than to implement a new process. I will try to get around to testing it to see if it breaks the cluster again.
Hi @dmalzl ! Any updates on this issue?
Hi @drpatelh. After consulting with our IT and a bit of digging, it seems this is a problem related to multithreading on our filesystem: concurrent IO operations apparently do not mix well with a concurrent filesystem (I do not fully understand why this is). Anyway, I resolved it by simply rewriting the module to use fastq-dump (as IT suggested), and this worked quite fine. However, I have not gotten around to testing single-threaded fasterq-dump yet, but I guess it will be fine too.
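For anyone landing here with the same problem, the two workarounds discussed in this thread (single-threaded fasterq-dump, or falling back to fastq-dump) can be sketched roughly as below. This is only an illustration under stated assumptions, not the pipeline's actual module: `build_dump_command` is a hypothetical helper, `SRR000001` is a placeholder accession, and the flag set (`--split-files`, `--outdir`) is an assumed minimal one. The single-threaded fasterq-dump variant has not been tested on the affected cluster.

```python
import shlex


def build_dump_command(accession, tool="fasterq-dump", threads=1, outdir="."):
    """Return the argv list for dumping one SRA accession to FASTQ.

    Hypothetical helper for illustration; it only builds the command
    and does not run it.
    """
    if threads < 1:
        raise ValueError("threads must be at least 1")
    if tool == "fasterq-dump":
        # --threads caps the number of worker threads; 1 is the
        # single-threaded setup suggested (but not yet tested) above.
        return [
            "fasterq-dump",
            "--threads", str(threads),
            "--split-files",
            "--outdir", outdir,
            accession,
        ]
    if tool == "fastq-dump":
        # The fallback suggested by IT; no thread flag is passed here.
        return ["fastq-dump", "--split-files", "--outdir", outdir, accession]
    raise ValueError(f"unknown tool: {tool!r}")


if __name__ == "__main__":
    # Print both variants for a placeholder accession.
    for tool in ("fasterq-dump", "fastq-dump"):
        print(shlex.join(build_dump_command("SRR000001", tool=tool)))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` on a node where sra-tools is installed. Keeping the tool name as a parameter mirrors the feature requested in this issue, while defaulting to one thread covers the less invasive option.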
Thanks for the update! Ok, will close this for now but feel free to re-open if the issue persists and if you have any suggestions for how this can be fixed in the pipeline.