Everything looks OK, but SPAdes is stuck
Description of bug
Everything appears to be running normally—logs show no errors or warnings, and system outputs look as expected. However, the program has been stuck for over 30 hours without making any progress.
I allocated 16 threads, but at the point where the process stalled, the CPU usage was only about 30% on a single thread, and it remained stuck indefinitely. Additionally, when I allocate too many threads, the program terminates due to insufficient memory allocation (OS return-value: 12). I have previously run tests on SPAdes, and it executed normally.
I would appreciate any insights or suggestions on what might be causing the stall. Thank you!
spades.log
params.txt
SPAdes version
SPAdes version: 4.0.0
Operating System
OS: Linux-3.10.0-1127.el7.x86_64-x86_64-with-glibc2.17
Python Version
Python version: 3.13.1
Method of SPAdes installation
conda
No errors reported in spades.log
- [x] Yes
I have more or less the same problem, but it gets stuck in read error correction for up to one week, although the log only shows about 1 hour of run time.
The problem is likely I/O on your server if it is stuck at this point. Try moving the temporary directory from network shared storage to local / scratch storage.
The problem is likely I/O on your server if it is stuck at this point. Try moving the temporary directory from network shared storage to local / scratch storage.
Thanks for the suggestion. My temporary directory is already located on local storage. Additionally, I noticed that when I downsample the data to 1GB (from the original 40GB), the assembly completes without any issues. However, when I downsample to 4GB, the k-mer counting step gets stuck and the process halts.
I have more or less the same problem, but it gets stuck in read error correction for up to one week, although the log only shows about 1 hour of run time.
Hi, I noticed that even though the process has been stuck for a week, you haven't terminated it. Does this indicate that it might still be running, albeit very slowly? I'm curious if you've managed to resolve this issue or if you have any further insights to share.
Thanks for the suggestion. My temporary directory is already located on local storage.
It doesn't seem so. Your spades.log reads:
Other parameters:
Dir for temp files: /public/home/xzh/south/11-JRT-2/assembly_0.1/tmp
And indeed, there is no --tmp-dir option used.
The default tmp-dir is already set to the output directory. I have attempted to use the --tmp-dir option to specify a different temporary directory, but it still doesn't work as expected.
Additionally, the process stopped with the message "finished abnormally, OS return value: 12," despite having at least 1600GB of free memory available.
spades.log
params.txt
The default tmp-dir is already set to the output directory.
Right. And if it is on some kind of NFS shared storage, it could easily cause problems, as these systems were not designed to handle heavy I/O.
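For anyone who wants to check what kind of storage a directory actually lives on, here is a minimal Python sketch for Linux. It is only an illustration, not part of SPAdes: the helper names `find_mount` and `describe_dir` are hypothetical, it assumes `/proc/mounts` is readable, and the list of network file system types is a small, non-exhaustive sample (the type name reported for ParaStor is not known here).

```python
import os

# Illustrative, non-exhaustive set of network file system type names (assumption).
NETWORK_FS_TYPES = {"nfs", "nfs4", "cifs", "lustre"}


def find_mount(path, mounts_text):
    """Return (mount_point, fs_type) for the deepest mount containing path."""
    path = os.path.abspath(path)
    best = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip malformed lines
        mount_point, fs_type = fields[1], fields[2]
        prefix = mount_point.rstrip("/") + "/"
        if path == mount_point or path.startswith(prefix):
            # ">=" lets a later mount on the same point override an earlier one.
            if best is None or len(mount_point) >= len(best[0]):
                best = (mount_point, fs_type)
    return best


def describe_dir(path, mounts_file="/proc/mounts"):
    """Describe the mount behind path using the system mount table."""
    with open(mounts_file) as handle:
        mount = find_mount(os.path.realpath(path), handle.read())
    if mount is None:
        return f"{path}: no matching mount found"
    mount_point, fs_type = mount
    kind = "network" if fs_type in NETWORK_FS_TYPES else "not in the known-network list"
    return f"{path}: mounted at {mount_point}, type {fs_type} ({kind})"
```

Calling `print(describe_dir("/path/to/output/tmp"))` with your own temporary directory shows the mount point and file system type. A type that is not in the sample list does not prove the storage is local, so the result is a starting point for a conversation with the system administrator rather than a verdict.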
Additionally, the process stopped with the message "finished abnormally, OS return value: 12," despite having at least 1600GB of free memory available.
It doesn't seem so:
Memory limit (in Gb): 250
So, the hard memory limit was set to 250 Gb (the default) and you have not overridden it. As a result, when more RAM was required, you received an out-of-memory error, per the log:
3:20:16.175 82G / 96G ERROR General (mmapped_reader.hpp : 52) mmap(2) failed. Reason: Cannot allocate memory. Error code: 12
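To make the distinction between the options concrete, here is a small Python sketch that assembles a SPAdes command line with the thread count (`-t`), memory limit in Gb (`-m`), and temporary directory (`--tmp-dir`) each set explicitly. The function name `build_spades_command`, the file names, the paths, and the numeric values are all placeholders chosen for illustration, not recommendations, and the sketch only prints the command instead of running it.

```python
import shlex


def build_spades_command(input_args, output_dir, tmp_dir, threads, memory_gb):
    """Build a spades.py argument list with explicit resource options."""
    return [
        "spades.py",
        *input_args,
        "-t", str(threads),      # number of threads
        "-m", str(memory_gb),    # memory limit in Gb (SPAdes default is 250)
        "--tmp-dir", tmp_dir,    # directory for temporary files
        "-o", output_dir,
    ]


# Hypothetical inputs and values; replace them with your own.
cmd = build_spades_command(
    input_args=["-1", "reads_1.fastq.gz", "-2", "reads_2.fastq.gz"],
    output_dir="assembly_out",
    tmp_dir="/scratch/spades_tmp",
    threads=16,
    memory_gb=500,
)
print(shlex.join(cmd))
```

Keeping the thread count and the memory limit as separately named arguments makes it harder to pass a memory value to the threads option by mistake.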
By the way, I was wondering if downsampling the data would improve the assembly results or make them worse.
Right. And if it is on some kind of NFS shared storage, it could easily cause problems, as these systems were not designed to handle heavy I/O.
The file system in use is ParaStor, a distributed file system. Is that OK?
So, the hard memory limit was set to 250 Gb (the default) and you have not overridden it. As a result, when more RAM was required, you received an out-of-memory error, per the log:
I reset the memory limit but still got the same issue.
spades.log
params.txt
The file system in use is ParaStor, a distributed file system. Is that OK?
You'd better ask your system administrator. We cannot know the specifics of every NAS solution and its issues.
I reset the memory limit but still got the same issue.
You didn't:
Threads: 1600
Memory limit (in Gb): 250
Looks like you spawned 1600 threads instead. Next time, please double-check the options and the log before submitting an issue. Please refer to the SPAdes manual for information about command line options: https://ablab.github.io/spades/running.html#advanced-options
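One way to do that double check is to read the effective settings back out of the log text. The following Python sketch is an illustration under assumptions: the helper names are hypothetical, and the patterns assume the header lines look like the excerpts quoted in this thread (`Threads:`, `Memory limit (in Gb):`, `Dir for temp files:`).

```python
import re

# Patterns modeled on the log excerpts quoted in this thread (assumed format).
PATTERNS = {
    "threads": re.compile(r"^\s*Threads:\s*(\d+)\s*$"),
    "memory_gb": re.compile(r"^\s*Memory limit \(in Gb\):\s*(\d+)\s*$"),
    "tmp_dir": re.compile(r"^\s*Dir for temp files:\s*(\S.*?)\s*$"),
}


def read_effective_settings(log_text):
    """Extract the first occurrence of each setting from the log text."""
    settings = {}
    for line in log_text.splitlines():
        for key, pattern in PATTERNS.items():
            match = pattern.match(line)
            if match and key not in settings:
                value = match.group(1)
                settings[key] = value if key == "tmp_dir" else int(value)
    return settings


def check_settings(settings, expected_threads, expected_memory_gb):
    """Return a list of mismatches between intended and logged values."""
    problems = []
    if settings.get("threads") != expected_threads:
        problems.append(
            f"threads: expected {expected_threads}, log shows {settings.get('threads')}"
        )
    if settings.get("memory_gb") != expected_memory_gb:
        problems.append(
            f"memory: expected {expected_memory_gb} Gb, log shows {settings.get('memory_gb')}"
        )
    return problems


# Header lines as quoted in this thread; the intended values are an assumption.
sample = (
    "Other parameters:\n"
    "  Dir for temp files: /public/home/xzh/south/11-JRT-2/assembly_0.1/tmp\n"
    "  Threads: 1600\n"
    "  Memory limit (in Gb): 250\n"
)
settings = read_effective_settings(sample)
for problem in check_settings(settings, expected_threads=16, expected_memory_gb=1600):
    print(problem)
```

With a real run, the text would come from reading `spades.log` rather than from the embedded sample, and the expected values would be whatever you meant to pass on the command line.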
I just want to say that @xzhbio did an excellent job in setting up the description of the problem and providing information. We just used this as an example of a great way to submit a question to GitHub in a class. :)
@mrmckain Just curious: have you also taught in class how to read manuals, logs and error messages? :)
That's what today was! How to troubleshoot errors. Getting the information out of logs, using manuals, and how to read an error message. This is not always intuitive, especially for students/early career researchers. We explored searching for answers online, reading existing forums/Google Groups/GitHub issues for the same problems to see how they were addressed, and finally talking with developers. For this example, I pointed out that it is helpful to provide the logs/parameters and other specifications when asking a question.