Everything looks ok, but spades stuck

Open xzhbio opened this issue 10 months ago • 13 comments

Description of bug

Everything appears to be running normally—logs show no errors or warnings, and system outputs look as expected. However, the program has been stuck for over 30 hours without making any progress.

I allocated 16 threads, but at the point where the process stalled, the CPU usage was only about 30% on a single thread, and it remained stuck indefinitely. Additionally, when I allocate too many threads, the program terminates due to insufficient memory allocation (OS return-value: 12). I have previously run tests on SPAdes, and it executed normally.

I would appreciate any insights or suggestions on what might be causing the stall. Thank you!

spades.log

params.txt

SPAdes version

SPAdes version: 4.0.0

Operating System

OS: Linux-3.10.0-1127.el7.x86_64-x86_64-with-glibc2.17

Python Version

Python version: 3.13.1

Method of SPAdes installation

conda

No errors reported in spades.log

  • [x] Yes

xzhbio avatar Feb 07 '25 08:02 xzhbio

I have more or less the same problem, but it gets stuck in read error correction for up to one week, although the log only shows about a 1-hour run.

Jose-LSP avatar Feb 07 '25 12:02 Jose-LSP

The problem is likely I/O on your server if it got stuck at this point. Try moving the temporary directory from network shared storage to local / scratch storage.

asl avatar Feb 07 '25 16:02 asl
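One rough way to check whether a directory sits on network storage is to look up its mount in `/proc/mounts` (Linux only). The sketch below is illustrative and not part of SPAdes: the function names and the set of network file system types are assumptions, and the type string reported by a given distributed file system is site-specific, so an unknown type is best raised with the system administrator.

```python
import os

# File system types that are usually network-backed. This set is an
# illustrative assumption, not an exhaustive or authoritative list.
NETWORK_FS_TYPES = {"nfs", "nfs4", "cifs", "lustre", "glusterfs", "ceph", "beegfs"}


def filesystem_type(path, mounts_text):
    """Return (mount_point, fs_type) for the mount that contains `path`.

    `mounts_text` is the text of /proc/mounts. Symlinks in `path` are not
    resolved, and mount points containing spaces (escaped as \\040 in
    /proc/mounts) are not handled in this sketch.
    """
    path = os.path.abspath(path) + "/"
    best = ("", "unknown")
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        mount_point, fs_type = fields[1], fields[2]
        prefix = mount_point.rstrip("/") + "/"
        # Keep the longest matching mount point; on ties, the later entry wins.
        if path.startswith(prefix) and len(mount_point) >= len(best[0]):
            best = (mount_point, fs_type)
    return best


def looks_like_network_fs(fs_type):
    """True if `fs_type` is in the (assumed) set of network file systems."""
    return fs_type in NETWORK_FS_TYPES
```

To use it, read `/proc/mounts` into a string and pass it in together with the directory that the "Dir for temp files" line of spades.log reports.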

> The problem is likely I/O on your server if it got stuck at this point. Try moving the temporary directory from network shared storage to local / scratch storage.

Thanks for the suggestion. My temporary directory is already located on local storage. Additionally, I noticed that when I downsample the data to 1GB (from the original 40GB), the assembly completes without any issues. However, when I downsample to 4GB, the k-mer counting step gets stuck and the process halts.

xzhbio avatar Feb 08 '25 09:02 xzhbio

> I have more or less the same problem, but it gets stuck in read error correction for up to one week, although the log only shows about a 1-hour run.

Hi, I noticed that even though the process has been stuck for a week, you haven't terminated it. Does this indicate that it might still be running, albeit very slowly? I'm curious if you've managed to resolve this issue or if you have any further insights to share.

xzhbio avatar Feb 08 '25 10:02 xzhbio

> Thanks for the suggestion. My temporary directory is already located on local storage.

It doesn't seem so. Your spades.log reads:

Other parameters:
  Dir for temp files: /public/home/xzh/south/11-JRT-2/assembly_0.1/tmp

And indeed, there is no --tmp-dir option used.

asl avatar Feb 08 '25 18:02 asl

> And indeed, there is no --tmp-dir option used.

The default tmp-dir is already set to the output directory. I have attempted to use the --tmp-dir option to specify a different temporary directory, but it still doesn't work as expected.

Additionally, the process stopped with the message "finished abnormally, OS return value: 12," despite having at least 1600GB of free memory available.

spades.log

params.txt

xzhbio avatar Feb 14 '25 02:02 xzhbio

> The default tmp-dir is already set to the output directory.

Right. And if it is on some kind of NFS shared storage, it could easily cause problems, as these systems were not designed to handle heavy I/O.

> Additionally, the process stopped with the message "finished abnormally, OS return value: 12," despite having at least 1600GB of free memory available.

It doesn't seem so:

  Memory limit (in Gb): 250

So, the hard memory limit was set to 250 Gb (the default) and you have not overridden it. As a result, when more RAM was required, you received an out-of-memory error, per the log:

  3:20:16.175    82G / 96G   ERROR   General                 (mmapped_reader.hpp        :  52)   mmap(2) failed. Reason: Cannot allocate memory. Error code: 12

asl avatar Feb 14 '25 02:02 asl
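The "Error code: 12" in the log line above is an OS error number, and it can be decoded with Python's standard library. This is a minimal sketch; `describe_errno` is a hypothetical helper name, and the message text returned by `os.strerror` depends on the platform's C library.

```python
import errno
import os


def describe_errno(code):
    """Return (symbolic_name, message) for an OS error number."""
    name = errno.errorcode.get(code, "UNKNOWN")
    return name, os.strerror(code)


# 12 is the value shown in the mmap(2) failure and in "OS return value: 12".
name, message = describe_errno(12)
print(f"errno 12 = {name}: {message}")
```

On Linux, error number 12 is `ENOMEM`, which matches the "Cannot allocate memory" reason in the log.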

By the way, I was wondering if downsampling the data would improve the assembly results or make them worse.

xzhbio avatar Feb 14 '25 02:02 xzhbio

> Right. And if it is on some kind of NFS shared storage, it could easily cause problems, as these systems were not designed to handle heavy I/O.

The file system in use is ParaStor, a distributed file system. Is that OK?

> So, the hard memory limit was set to 250 Gb (the default) and you have not overridden it. As a result, when more RAM was required, you received an out-of-memory error, per the log:

I reset the memory limit but still got the same issue.

spades.log

params.txt

xzhbio avatar Feb 14 '25 07:02 xzhbio

> The file system in use is ParaStor, a distributed file system. Is that OK?

You'd better ask your system administrator. We cannot know the specifics of every NAS solution and its issues.

> I reset the memory limit but still got the same issue.

You didn't:

  Threads: 1600
  Memory limit (in Gb): 250

Looks like you spawned 1600 threads instead. Next time, please double-check the options and the log before submitting an issue. Please refer to the SPAdes manual for information about command-line options: https://ablab.github.io/spades/running.html#advanced-options

asl avatar Feb 14 '25 08:02 asl
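The check described above (reading the effective settings back from the log before filing an issue) can be sketched in a few lines. The patterns below assume the header lines look exactly like the spades.log excerpts quoted in this thread; `parse_log_header` and `sanity_warnings` are hypothetical helper names, not part of SPAdes.

```python
import os
import re

# Header lines as they appear in the spades.log excerpts quoted in this thread.
PATTERNS = {
    "tmp_dir": re.compile(r"^\s*Dir for temp files:\s*(\S+)"),
    "threads": re.compile(r"^\s*Threads:\s*(\d+)"),
    "memory_gb": re.compile(r"^\s*Memory limit \(in Gb\):\s*(\d+)"),
}


def parse_log_header(log_text):
    """Extract the temp dir, thread count and memory limit from log text."""
    found = {}
    for line in log_text.splitlines():
        for key, pattern in PATTERNS.items():
            match = pattern.match(line)
            if match and key not in found:
                value = match.group(1)
                found[key] = value if key == "tmp_dir" else int(value)
    return found


def sanity_warnings(settings, cpu_count=None):
    """Flag a thread count larger than the number of available CPUs."""
    cpu_count = cpu_count or os.cpu_count() or 1
    warnings = []
    threads = settings.get("threads")
    if threads is not None and threads > cpu_count:
        warnings.append(f"threads={threads} exceeds {cpu_count} available CPUs")
    return warnings
```

Run against the last log in this thread, such a check would report 1600 threads and an unchanged 250 Gb memory limit, which is what happens when a value meant for the memory option (`-m`) is passed to the thread option (`-t`).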

I just want to say that @xzhbio did an excellent job in setting up the description of the problem and providing information. We just used this in a class as an example of a great way to submit a question on GitHub. :)

mrmckain avatar Mar 20 '25 20:03 mrmckain

@mrmckain Just curious: have you also taught in class how to read manuals, logs and error messages? :)

asl avatar Mar 20 '25 20:03 asl

That's what today was! How to troubleshoot errors: getting the information out of logs, using manuals, and how to read an error message. This is not always intuitive, especially for students and early-career researchers. We explored searching for answers online, reading existing forums, Google Groups, and GitHub issues about the same problems to see how they were addressed, and finally talking with developers. For this example, I pointed out that it is helpful to provide the logs, parameters, and other specifications when asking a question.

mrmckain avatar Mar 20 '25 21:03 mrmckain