distributed
distributed copied to clipboard
Add scheduler-sni option for dask workers
This adds support for dask workers to specify the SNI host separately from the TLS endpoint. In some environments (such as a Kubernetes cluster running dask-gateway), the schedulers may be behind a TLS proxy, while the workers are external to the cluster. This allows workers to connect to the schedulers without the need for wildcard DNS.
Signed-off-by: Burt Holzman [email protected]
Closes #6289
- [x] Tests added / passed
- [x] Passes
pre-commit run --all-files
Can one of the admins verify this patch?
Unit Test Results
16 files + 4 16 suites +4 7h 16m 54s :stopwatch: + 2h 5m 5s 2 768 tests + 6 2 690 :heavy_check_mark: + 20 78 :zzz: - 13 0 :x: - 1 22 106 runs +5 560 21 087 :heavy_check_mark: +5 341 1 019 :zzz: +225 0 :x: - 6
Results for commit 9da242fc. ± Comparison against base commit 70e5c908.
:recycle: This comment has been updated with latest results.
I realized I left off the websockets implementation; I'll squash my commit on that and force-push momentarily.
Thank you for the PR @holzman . I apologize for the delay in a response. @jacobtomlinson is this something that you or someone around you can review?
It's been a few weeks - before I rebase against HEAD, any issues with merging this?
I think we were waiting on @jcrist for input but we are probably fine to move ahead. This seems fine and if it is inspired by dask-gateway
then it's a reasonable change.
Please feel free to merge in from main
and resolve conflicts, we squash on merge anyway so don't stress about rebasing, squashing, force pushing, etc. It actually makes review harder when folks do that as we can't see the history of the PR.
Unit Test Results
See test report for an extended history of previous test failures. This is useful for diagnosing flaky tests.
15 files ± 0 15 suites ±0 6h 51m 44s :stopwatch: + 21m 4s 3 006 tests + 6 2 915 :heavy_check_mark: + 4 89 :zzz: ±0 2 :x: +2 22 288 runs +42 21 234 :heavy_check_mark: +39 1 052 :zzz: +1 2 :x: +2
For more details on these failures, see this check.
Results for commit 47d6dd62. ± Comparison against base commit 969aa463.
:recycle: This comment has been updated with latest results.
Hi @jacobtomlinson: it's been a few more weeks. Please let me know if there's anything else I need to do to help keep this moving.
@holzman apologies on the delay here. Maintainers here have been trying to dig out from some unpleasant deadlocking with the scheduler recently. I'm apprehensive to merge this change without input from the folks with deeper experience looking into that. @crusaderky and @florian-jetter-by are the best folks to merge this but are both out currently.
Perhaps @graingert or @gjoseph92 could also take a look and give a review here?
It's been another month since any activity on this. As always let me know if there's anything I can do to help move this along.
It's been another month since any activity on this. As always let me know if there's anything I can do to help move this along.
Ah sorry for dropping the ball on this
@graingert - did my comment https://github.com/dask/distributed/pull/6290#discussion_r946004112 make sense? Depending on the configuration, the communication may not be using TLS, so we can't call start_tls for every invocation, so we need two code paths here.
It's been another three weeks - checking in on this PR yet again.
Bump - it's been radio silence since mid-August. Please let me know what I can do to help move this forward.
@gjoseph92 or @crusaderky -- Can one of you pick this up?