Fabio M. Graetz, Ph.D.
First of all, thanks a lot for the quick reply @Huang-Wei, much appreciated 🙏

> May I know which coscheduler version you're running?

We currently use 0.26.7 for both the...
We already added `podGroupBackoffSeconds` to the config a few weeks ago and *believe* we have observed that it improved the situation in the sense that in most situations, the...
Thanks a lot again @Huang-Wei for looking into this issue so quickly 🙏 I deployed an image with your fix in our prod cluster and will observe over the next...
Reporting first observations: The log line

> "To-activate pod does not exist in unschedulablePods or backoffQ"

was logged only 8 times in the last 12 hours so your PR seems...
Hey @Huang-Wei, we haven't observed any further scheduler hangs in our training cluster, even though the two weeks leading up to the winter break were very busy. I'm optimistic...
> @fg91 good to hear that!
>
> I will create an equivalent fix of #684 to `master` soon. BTW: are you going to stay in v0.26 for a while,...
> It's on my radar. Postponed a bit due to my personal bandwidth. I will get new release cut by end of this week.

Happy to report that we didn't...
@eapolinario from my side this issue could be closed, but I saw that you reopened it on Nov 2, 2023. OK to close?
> @fg91 is there an Error message available in terminal where `aim up` is running? Sharing the error and stack trace would help a lot.

Yes, sure, after clicking the...
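Unfortunately this issue isn't closed yet as the `task_config` override is still broken:

```py
from flytekit import dynamic, task, workflow
from flytekitplugins.kfpytorch import Elastic


# Task defaults to a single node.
@task(
    task_config=Elastic(nnodes=1, nproc_per_node=2)
)
def foo():
    ...


@dynamic
def subwf():
    # Override the task config at call time to request two nodes.
    foo().with_overrides(task_config=Elastic(nnodes=2, nproc_per_node=2))


@workflow
def wf():
    subwf()
```

...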