Performance degradation after running snakemake for some time
Snakemake version
Snakemake 8.16.0
Describe the bug
When Snakemake has been running for a while, the scheduler gradually slows down: job finalization, job selection, and job startup all become progressively slower.
Correspondingly, the NLWP of the Snakemake main process in htop (i.e. its thread count) gradually increases from ~100 to ~1000, and CPU usage increases as well. The number of open /proc/****/stat entries in the lsof output also grows.
This issue occurs in both Snakemake 7 and the newest Snakemake 8, and with both the local and the slurm executors. Performance seems to degrade faster when the number of files or jobs is large.
As for the possible cause, I haven't had time to dig further yet, but my current guess is some kind of performance degradation in the executor's concurrent.futures.ThreadPoolExecutor, or in the asyncio event loop.
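One way to check the thread-accumulation guess from inside Python is to log the live thread count across scheduling rounds. This is a minimal sketch, not Snakemake code; `thread_report` is a name invented here:

```python
import threading

def thread_report(baseline: int) -> str:
    """Return a one-line summary of live threads relative to a baseline.

    Calling this periodically from the main process (or a sidecar thread)
    makes the NLWP growth seen in htop visible from inside Python: if the
    delta keeps climbing across scheduling rounds, threads are leaking.
    """
    now = threading.active_count()
    return f"live threads: {now} (delta {now - baseline:+d})"
```

If the delta grows without bound as jobs complete, that would point at executor worker threads (or other threads) not being reclaimed.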
Logs
Minimal example
Additional context
I am observing the same symptoms: after about 20k jobs, Snakemake takes a minute to submit the next (small) set of jobs.
I am not seeing an increase in NLWP, however. What I am seeing is lots of `stat()` and `openat()` calls on `.snakemake/metadata`. Always the same pattern: stat, openat, fstat, read, close, repeated five times for the same metadata file, then moving on.
I'm on 7.32.2.
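For anyone wanting to reproduce this observation, a rough way to capture the syscall pattern (assuming a Linux box with strace available; `<pid>` is a placeholder for the Snakemake main process id, so this is a sketch rather than a copy-paste command):

```shell
# Attach to the running Snakemake process and trace file-related syscalls,
# filtering for accesses to the metadata directory.
strace -f -e trace=stat,openat,fstat,read,close -p <pid> 2>&1 \
  | grep '.snakemake/metadata'
```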
Maybe you can try removing `.snakemake/metadata` and using `--drop-metadata` when running Snakemake. This improves performance a lot in my cases, though it still degrades, just more slowly.
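The workaround above amounts to something like the following (a sketch; `--drop-metadata` is a real Snakemake flag, but adjust the invocation, cores, and targets to your own workflow):

```shell
# Clear the accumulated per-file metadata, then run without writing new metadata.
# Note: this disables provenance tracking (code/params change detection).
rm -rf .snakemake/metadata
snakemake --drop-metadata --cores 8
```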
To clarify: this is not about general performance issues with metadata or the DAG build (tag = enhancement). It is about the severe degradation at run time, which can be fixed by restarting Snakemake (tag = bug).
Something accumulates while Snakemake operates and eventually leads to fewer than one job per minute being scheduled. There is a bug somewhere.
It appears that my use of `--notemp` for development purposes also had a large negative impact, possibly even a causative one. Without it, things seem to work acceptably again.
@laf070810 Are you using checkpoints in your pipeline? I've done some debugging with pdb. My issue is somewhere inside `update_checkpoint_dependencies()`. If you aren't, I should probably open a separate ticket for my issue.
I have use cases both with and without checkpoints, and the performance degradation looks basically the same in both.
@johanneskoester I've been looking over the code in that area and used signal-triggered yappi to see where it's burning through so many cycles, at least on my end. A few questions:
- Any reason why the `dag.finish()` process is run for each job individually? It can handle group jobs, so it should be able to bulk-finish everything that completed in one round. That might save a little bit.
- The cache is turned off after the launch phase, and there is a note that it mustn't be turned on again. When the rerun triggers are on, this means the metadata file is read several times for each file (that's why I was seeing 5 stats per file, one for each trigger type). Is there any reason not to, say, put a `with` block around the postprocess step that enables a temporary cache, to avoid redundant reads and stats on the same files over and over?
- There may be some deeper issue here since it is indeed degrading. It's just doing too much. It might be related to "collect" type jobs that pick up lots of results and get updated with every completed job.
CC @tamasgal - this may be a similar concern as your ticket #2969
I've managed to make some progress on hacking my way around this by ignoring some warnings inside of Snakemake:
a) I've added a try/finally around the `dag.process()` call that enables/disables the persistence and IO caches. That way, the metadata is read only once per pass, i.e. twice per scheduling round. The rationale: the state of mtimes, job metadata, etc. changing during a DAG update run very likely wouldn't be good anyway.
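The try/finally idea can be sketched as a context manager (purely illustrative: `activate`/`deactivate` are invented names, and the real Snakemake persistence/iocache objects have different interfaces):

```python
from contextlib import contextmanager

@contextmanager
def temporary_cache(cache):
    """Enable a cache for the duration of a block, disabling it even on error.

    `cache` is any object with activate()/deactivate() methods. This only
    illustrates the try/finally pattern discussed above, not Snakemake's
    actual API: the cache is guaranteed to be off again after the block,
    so stale cached state cannot leak into later scheduling rounds.
    """
    cache.activate()
    try:
        yield cache
    finally:
        cache.deactivate()
```

The DAG update would then run as `with temporary_cache(iocache): dag.process()` (hypothetical names), keeping redundant metadata reads within one pass while preserving the invariant that the cache stays off between passes.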
This issue was marked as stale because it has been open for 6 months with no activity.