[WIP] Use of new dask CLI: job submission
Implementation of `dask job submit`. This is intended to consume a Python script as:

```
dask job submit <cluster-name> <script-path>
```
There are potentially two different usage patterns we are anticipating:

- the script features use of a dask collection or builds `delayed` objects and calls `<object>.compute()`; this should use the dask cluster to execute all `.compute()` calls.
- the script features no dask-isms at all, but is intended to be a single-worker job, as if the script were just a function submitted via `client.submit`.
It may make sense for these two patterns to be served by two different subcommands. Exploring that here to settle on an approach.
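To make the first pattern concrete, here is a hypothetical script of the kind this subcommand might consume: it builds `delayed` objects and calls `.compute()`, which under this proposal would execute on the target cluster rather than locally. This is purely an illustrative sketch, not code from the PR.

```python
# Hypothetical script for the first usage pattern: dask-aware code whose
# .compute() calls should be routed to the named cluster by `dask job submit`.
import dask


@dask.delayed
def inc(x):
    return x + 1


# Build a small task graph of delayed objects, then reduce them.
total = dask.delayed(sum)([inc(i) for i in range(10)])

# Under the proposal, this .compute() would run on the cluster.
print(total.compute())  # -> 55
```

The second pattern would be the same script minus any dask constructs, treated as one opaque function run on a single worker.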
- [ ] Tests added / passed
- [ ] Passes `pre-commit run --all-files`
Can one of the admins verify this patch?
Unit Test Results
See test report for an extended history of previous test failures. This is useful for diagnosing flaky tests.
15 files ±0&nbsp;&nbsp;15 suites ±0&nbsp;&nbsp;6h 7m 17s :stopwatch: -18m 37s
2 977 tests ±0&nbsp;&nbsp;2 888 :heavy_check_mark: +1&nbsp;&nbsp;87 :zzz: -1&nbsp;&nbsp;2 :x: ±0
22 072 runs ±0&nbsp;&nbsp;21 035 :heavy_check_mark: +1&nbsp;&nbsp;1 035 :zzz: -1&nbsp;&nbsp;2 :x: ±0
For more details on these failures, see this check.
Results for commit b854dc98. ± Comparison against base commit 930d3dc0.
I think that we need to think a little bit about how this would be used. When submitting a script how does that script get access to the scheduler? Naively I might think that the following should work:
```python
import dask.dataframe  # needed for dask.dataframe.read_parquet below
from dask.distributed import Client

client = Client()  # connects to the current Dask scheduler

df = dask.dataframe.read_parquet(...)
df....
```
Great. If this is the kind of workflow that we're thinking of then we'll need to address a few issues:
- Make sure that we have the right address of the scheduler, and that we can connect to it reliably (for example, what happens if the scheduler is under TLS security?)
- Make sure that our script can be a normal blocking script and not block the event loop (`run_on_scheduler` might not work?)
If this isn't the kind of workflow that we're thinking of then we should figure out what that workflow is and make sure that it works well.
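One naive way to address the discovery question above is for the submit machinery to hand the script the scheduler address through the environment. The sketch below assumes a hypothetical `DASK_SCHEDULER_ADDRESS` environment variable (and, for a TLS-secured scheduler, hypothetical variables pointing at client credentials); none of these names are an existing convention in this PR.

```python
# Sketch: how a submitted script might discover its scheduler, assuming the
# submit machinery exports DASK_SCHEDULER_ADDRESS (a hypothetical variable).
import os


def scheduler_connect_kwargs(env=os.environ):
    """Build kwargs a submitted script could pass to distributed.Client."""
    address = env.get("DASK_SCHEDULER_ADDRESS", "tcp://127.0.0.1:8786")
    kwargs = {"address": address}
    if address.startswith("tls://"):
        # A TLS-secured scheduler also needs client credentials shipped to
        # the script; these variable names are likewise assumptions.
        from distributed.security import Security

        kwargs["security"] = Security(
            tls_ca_file=env["DASK_TLS_CA_FILE"],
            tls_client_cert=env["DASK_TLS_CERT"],
            tls_client_key=env["DASK_TLS_KEY"],
        )
    return kwargs


# In the submitted script:
#   from dask.distributed import Client
#   client = Client(**scheduler_connect_kwargs())
```

Keeping `Client(...)` as a plain blocking call in the script sidesteps the event-loop concern, since the script runs as an ordinary process rather than inside the scheduler's loop.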
Like @mrocklin said, the cluster discovery seems like a major question here (and something that I believe has been out of scope for the core dask project up until now). This seems closely related to dask-ctl https://github.com/dask-contrib/dask-ctl, cc @jacobtomlinson. Maybe a job subcommand would make more sense as a part of dask-ctl than core dask?
@gjoseph92 this work was done at the SciPy sprints, where @mrocklin, @jsignell, @jcrist, @charlesbluca and I made some longer-term plans about the CLI in Dask generally. I think the plan is to migrate some of the core functionality from dask-ctl up to distributed at some point and merge it into the existing CLI tooling in a more consistent way.
@douglasdavis started some of this work to move us towards an extensible dask CLI command. @dotsdl picked up this idea around submitting scripts and ran with it.
I have an action from the sprints to write this topic up in design-docs.
Haven't had time to circle back to this, folks. Apologies for the delay, and thank you for the feedback!