Albert Zeyer

Results 938 comments of Albert Zeyer

> However, the speedup is much less now. Need to do some benchmarks. I think it should still be faster than before though. Actually, no? It seems slightly slower. Maybe...

> > However, the speedup is much less now. Need to do some benchmarks. I think it should still be faster than before though.
> >
> > Actually, no? It seems...

Also, I think the current change in this PR does not fully capture all the recursive calls of `sis_hash_helper`. The recursive calls often go through `obj._sis_hash` calls.

> I wonder how in your case Sisyphus imports end up in your RETURNN config. While there can be model code imported in the manager, I have never seen pipeline...

Off-topic mumbling: I consider RETURNN, i6_core, RASR, i6_experiments (at least the parts I developed and/or use in my setups), and Sisyphus stable, so it should never be a problem...

What about my initial suggestion?

> However, a simple thing we can do is maybe checking `sys.modules['__main__'].__file__` if that is actually some Sisyphus executable or not (like RETURNN) and in...
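A minimal sketch of such a check, assuming the Sisyphus entry point is a script named `sis` or is run as `python -m sisyphus`; those names, and the helpers `is_sisyphus_entry` and `running_under_sisyphus`, are hypothetical. The path logic is kept in a pure function so it can be tested without depending on how the current process was started.

```python
import os
import sys
from typing import Optional, Tuple


def is_sisyphus_entry(
    main_file: Optional[str],
    names: Tuple[str, ...] = ("sis", "sisyphus"),  # assumed entry-point names
) -> bool:
    """Heuristic: does this __main__ file look like a Sisyphus entry point?"""
    if not main_file:
        return False  # e.g. interactive session: __main__ has no __file__
    stem = os.path.splitext(os.path.basename(main_file))[0]
    if stem == "__main__":
        # `python -m <pkg>` runs <pkg>/__main__.py, so use the package dir name.
        stem = os.path.basename(os.path.dirname(os.path.abspath(main_file)))
    return stem in names


def running_under_sisyphus() -> bool:
    """Apply the heuristic to the actual __main__ module of this process."""
    main_mod = sys.modules.get("__main__")
    return is_sisyphus_entry(getattr(main_mod, "__file__", None))
```

When some other program (e.g. RETURNN) is the main script, `__main__.__file__` points at that program's script and the check returns `False`.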

[sis-hang.txt](https://github.com/rwth-i6/sisyphus/files/13829192/sis-hang.txt)

Ah, I just saw that all calls to `for_all_nodes` share a common thread pool. So if the `JobCleaner` fills up this thread pool's queue, that would explain it, right?
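A toy illustration of the suspected effect, using `concurrent.futures.ThreadPoolExecutor` as a stand-in for the shared pool behind `for_all_nodes` (the actual Sisyphus pool may be implemented differently). With one worker, tasks run in FIFO order, so work the cleaner queued first delays work the manager queues afterwards. `demo_shared_pool_order` is a hypothetical name.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import List


def demo_shared_pool_order(num_cleaner_tasks: int = 5) -> List[str]:
    """Record the order in which tasks from two callers run on one shared pool."""
    order: List[str] = []
    lock = threading.Lock()

    def record(name: str) -> None:
        with lock:
            order.append(name)

    # A single worker makes the FIFO ordering deterministic.
    with ThreadPoolExecutor(max_workers=1) as pool:
        for i in range(num_cleaner_tasks):
            pool.submit(record, f"cleaner-{i}")  # queued first by the "cleaner"
        pool.submit(record, "manager")  # queued later by the "manager"
    # Leaving the with-block waits for all submitted tasks to finish.
    return order
```

With more workers the ordering is less strict, but a long enough backlog from one caller still delays everyone else sharing the queue.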

Some possible suggestions:

* The `JobCleaner` has this code:
  ```python
  while not self.stopped:
      self.sis_graph.for_all_nodes(f)
      time.sleep(gs.JOB_CLEANER_INTERVAL)
  ```
  I think we could maybe swap the `sleep` with the `for_all_nodes`?
* Can we...
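A minimal sketch of the swapped order, assuming the graph traversal can be passed in as a plain callable. `job_cleaner_loop` is a hypothetical name, and it uses `threading.Event.wait` in place of `time.sleep` (a swapped-in technique, not what the `JobCleaner` does today) so that a stop request does not have to wait out the full interval. Waiting first means the cleaner does not compete for the shared pool right at manager startup.

```python
import threading
from typing import Callable


def job_cleaner_loop(
    stop_event: threading.Event,
    for_all_nodes: Callable[[], None],
    interval: float,
) -> None:
    """Hypothetical JobCleaner loop: wait first, then traverse the graph."""
    # Event.wait returns True as soon as the event is set (stop requested)
    # and False after the timeout, so each iteration sleeps before working.
    while not stop_event.wait(interval):
        for_all_nodes()
```

If the stop event is already set before the loop starts, the traversal never runs at all.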

The hang can sometimes be even longer, here 8 minutes:

```
[2024-01-04 11:31:49,989] INFO: error(8) queue(4) runnable(1) running(12) waiting(1484)
Clear jobs in error state? [y/N] Print verbose overview (v), update...
```