velvet
velveth results depending on `OMP_NUM_THREADS` and stalling if `OMP_THREAD_LIMIT=1`
I'm currently working on the velvet Galaxy wrappers: https://github.com/galaxyproject/tools-iuc/pull/4641
We recently changed our CI to use 2 cores instead of 1. Until then we had only set `OMP_NUM_THREADS`, and the results changed after the switch to 2 cores. More precisely, the Roadmaps file changed.
When `OMP_THREAD_LIMIT` is set to the same value as `OMP_NUM_THREADS` (as suggested here: https://www.biostars.org/p/86907/), the results are the same again, but velveth stalls if the value is set to 1.
Could you tell us if/how we can resolve this issue?
We need to set a hard limit on the number of threads used, since many HPC systems do not allow overutilization of CPUs.
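For reference, the setup described above can be sketched as follows. This is only a minimal illustration, not the actual wrapper code: it assumes the core count is taken from Galaxy's `GALAXY_SLOTS` variable, and the fallback of 2 is arbitrary.

```shell
#!/bin/sh
# Assumption: GALAXY_SLOTS holds the number of cores granted to the job.
# The fallback value of 2 is for illustration only.
threads="${GALAXY_SLOTS:-2}"

# Use one value for both variables so that the default team size
# (OMP_NUM_THREADS) and the hard cap on OpenMP threads (OMP_THREAD_LIMIT) agree.
export OMP_NUM_THREADS="$threads"
export OMP_THREAD_LIMIT="$threads"

echo "OMP_NUM_THREADS=$OMP_NUM_THREADS OMP_THREAD_LIMIT=$OMP_THREAD_LIMIT"
```

With both variables exported this way, any velveth call later in the same shell inherits the same limit.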
Hello @bernt-matthias ,
Thanks for raising this issue. If I recall correctly (it's been 10 years after all!) the multithreaded version is not 100% deterministic, in that the threads each go at their own speed, and mix up the ordering of the reads.
Do you know at all when velveth stalls when the value is set to 1?
Cheers,
Daniel
> If I recall correctly (it's been 10 years after all!) the multithreaded version is not 100% deterministic, in that the threads each go at their own speed, and mix up the ordering of the reads.
This would also have been my guess. Is it also expected that the number of lines in the Roadmaps file may change?
> Do you know at all when velveth stalls when the value is set to 1?
What I see in the terminal output is:
[0.000000] Reading file '/tmp/tmpo8msn9gr/files/5/d/b/dataset_5db8d418-3d73-430a-8cb0-163736bbbdb9.dat' using 'Raw read' as FastQ
[0.000025] Reading file '/tmp/tmpo8msn9gr/files/5/d/b/dataset_5db8d418-3d73-430a-8cb0-163736bbbdb9.dat' using 'Raw read' as FastQ
[0.017763] 2496 sequences found in total in the paired sequence files
[0.017773] Done
[0.227818] Reading read set file /tmp/tmpo8msn9gr/job_working_directory/000/2/working/dataset_de783642-f122-4106-b123-ccc6ad6a8299_files/Sequences;
[0.228668] 2496 sequences found
[0.232689] Done
[0.232709] 2496 sequences in total.
[0.232757] Writing into roadmap file /tmp/tmpo8msn9gr/job_working_directory/000/2/working/dataset_de783642-f122-4106-b123-ccc6ad6a8299_files/Roadmaps...
[0.238615] Inputting sequences...
After that there is 100% CPU load but no progress (with 2 or more threads, velveth finishes in seconds).
Hello @bernt-matthias
Apologies for the slow response (summer leave).
In short, the parallelisation where your run stalled is implemented here: https://github.com/dzerbino/velvet/blob/master/src/splayTable.c#L1237
In it, you can see two OMP parallel sections:
- One with one thread for writing into the outfile
- Another with an arbitrary number of threads reading the inputs.
In effect, it explicitly requires two threads.
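On the wrapper side, one way to avoid the stall is to check the thread count before calling an OpenMP-enabled velveth. The sketch below is only an illustration: `omp_threads_ok` is a hypothetical helper, not part of Velvet or Galaxy, and the threshold of 2 is taken from the explanation above.

```shell
#!/bin/sh
# Hypothetical helper (not part of Velvet): succeeds only if the given
# thread count is at least 2, the minimum needed by the OpenMP code path
# described above (one writer thread plus at least one reader thread).
omp_threads_ok() {
    [ "$1" -ge 2 ]
}

# Assumption: an unset OMP_THREAD_LIMIT is treated as 1 for this check.
requested="${OMP_THREAD_LIMIT:-1}"
if omp_threads_ok "$requested"; then
    echo "ok: $requested threads"
else
    echo "refusing: OpenMP-enabled velveth needs at least 2 threads, got $requested" >&2
fi
```

A wrapper could use the failing branch to abort with a clear message instead of letting velveth spin at 100% CPU.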
If you absolutely need Velvet to run on 1 thread, then you can simply turn off multithreading entirely, by using
make 'OPENMP=0'
Hope this helps,
Daniel