Jonas Dedden
/remove-lifecycle stale
Was sure you'd ask that :smile: I'm a bit unsure about that, though. Changing the Scheduler address from the default will likely lead to that Cluster being broken in...
@jacobtomlinson sorry for the delay, was busy with other things. The PR should now be ready; I included some tests that use the apparently previously unused `simpleworkergroup.yaml` for deploying an...
Ah, sure, sorry, will do that too tomorrow. This one doesn't need a test, though (overriding the scheduler address is a bit weird to test anyway).
Ah, so what takes time here is asking the scheduler what the current memory situation across all workers is. This also happens on the next biggest call, taking around 7.0%...
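For context, a minimal sketch of the kind of per-worker memory query being discussed, assuming a recent `distributed` where the worker heartbeat `metrics` dict carries a `memory` key; the scheduler address is a placeholder:

```python
from dask.distributed import Client

# Placeholder address; point this at a live scheduler.
client = Client("tcp://scheduler-address:8786")

# scheduler_info() returns a snapshot of scheduler state, including
# one entry per worker with its last heartbeat metrics.
info = client.scheduler_info()
for addr, worker in info["workers"].items():
    # "memory" is the worker's current memory usage in bytes
    # (assuming recent distributed versions).
    print(addr, worker["metrics"]["memory"])
```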
Another tangential question: can the tick rate of the dashboard be adjusted somehow? I would be totally fine with an update every 10s or so; I'm mostly using internal Grafana...
There is `--no-show` for entirely disabling the dashboard, but this parameter is apparently unused :( https://github.com/dask/distributed/blob/782050a3a4cf2abd450caa8adfaa912c22829e78/distributed/cli/dask_scheduler.py#L127 Using `--no-dashboard` will disable metrics too, which is undesirable :/ We'd want Prometheus metrics...
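To illustrate the coupling: the Prometheus metrics are served as a `/metrics` route on the same HTTP server as the dashboard (port 8787 by default), which is why taking the dashboard down takes the metrics with it. A hedged sketch of scraping that route, with a placeholder host:

```python
import urllib.request

# Placeholder host; the /metrics route lives on the dashboard's
# HTTP server (default port 8787), alongside the Bokeh pages.
url = "http://scheduler-address:8787/metrics"

with urllib.request.urlopen(url) as resp:
    # Plain-text Prometheus exposition format; print the first samples.
    print(resp.read().decode()[:500])
```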
> This is not currently exposed. It's also a bit messy since the update interval for every widget is controlled individually

Hm, would be nice if that could be controlled somehow,...
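To make the "controlled individually" point concrete: the dashboard widgets are Bokeh components, and in Bokeh each component registers its own periodic callback with its own period, so there is no single global knob. A rough illustration, not the actual dashboard code:

```python
from bokeh.io import curdoc

def update():
    # Refresh one widget's data source here.
    pass

# Each widget registers something like this with its own period (ms),
# e.g. 10_000 for an update every 10 s.
curdoc().add_periodic_callback(update, 10_000)
```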
Hey @fjetter! We tried a bunch of even larger-scale workloads (>3k workers), tried to optimize our stack a bit by not submitting too many tasks at once, disabled...
This is an alternative approach to PR https://github.com/kubernetes-sigs/aws-fsx-csi-driver/pull/383. I personally don't have a strong preference for either approach; it's just that currently this driver unfortunately blocks us from creating large (>~60-70TiB)...