dask icon indicating copy to clipboard operation
dask copied to clipboard

Add new CLI that is extensible

Open douglasdavis opened this issue 3 years ago • 17 comments

New CLI discussed at SciPy sprint

This creates a new dask command line tool that can be augmented by external packages (e.g. distributed) with a [dask_cli] entry point. The new tool can be used with either

$ dask

or

$ python -m dask

Adding an entry point is pretty straightforward; we'll use distributed as an example:

In an external-to-dask module at the location: distributed.cli.dask_scheduler with a function called main we can add to pyproject.toml (PEP-621):

[project.entry-points."dask_cli"]
scheduler = "distributed.cli.dask_scheduler.main"

or an "old school" setup.py or setup.cfg

[options.entry_points]
dask_cli =
    scheduler = distributed.cli.dask_scheduler:main

The function needs to be a click command:

# in a distributed/distributed/cli/dask_scheduler.py file

@click.command(name="scheduler")
@click.option(...)
@click.option(...)
def main():
    ...

Since the scheduler = ... entry point under the dask_cli section is defined in the distributed package; upon import of dask.cli the run_cli function will discover the external scheduler command. This allows use of

$ dask scheduler

or

$ python -m dask scheduler

Related PRs in distributed:

  • https://github.com/dask/distributed/pull/6735
  • https://github.com/dask/distributed/pull/6738

douglasdavis avatar Jul 16 '22 16:07 douglasdavis

pkg_resources is part of setuptools (news to me!) and its use is "discouraged" in favor of importlib from the stdlib. (so developers can use whatever packaging library they prefer, not locked into setuptools).

douglasdavis avatar Jul 18 '22 15:07 douglasdavis

Hey @jrbourbeau! The motivation is a bit multidimensional. In general a group of Dask contributors that were at the SciPy sprint last week discussed improving CLI based on some feedback and a desire to bring CLI tooling under one extensible umbrella. @jacobtomlinson adds some detail on a PR in distributed here: https://github.com/dask/distributed/pull/6738#issuecomment-1188767573

douglasdavis avatar Jul 19 '22 14:07 douglasdavis

From my perspective this needs docs, and then we should drop anything that isn't implemented (info). I'm good to merge after that.

mrocklin avatar Jul 20 '22 15:07 mrocklin

Matthew Rocklin @.***> writes:

From my perspective this needs docs, and then we should drop anything that isn't implemented (info). I'm good to merge after that.

OK that also sounds good to me

douglasdavis avatar Jul 20 '22 15:07 douglasdavis

we should drop anything that isn't implemented (info).

Just for reference, info is actually being used as a group that can have command children. So far we are using it for

$ dask info versions

(which is calling dask.utils.show_versions). We can add more commands under the info umbrella, and so can other libraries/apps with

from dask.cli import info
import click

@click.command(name="fun")
def fun():
    print(42)
    
info.add_command(fun)

to get

$ dask info fun
42

douglasdavis avatar Jul 20 '22 16:07 douglasdavis

Do we want to make click a hard dependency?

I don't think we should. I'd be interested to explore how much work it would be to also support existing tools that just use argparse, is there some kind of type check we can do and somehow wrap things if they aren't click? It'd be nice for subprojects to choose whatever CLI tool they like, they might want to try typer for example.

pkg_resources is part of setuptools (news to me!) and its use is "discouraged" in favor of importlib from the stdlib. (so developers can use whatever packaging library they prefer, not locked into setuptools).

Ooh this is news to me too! We should definitely switch out pkg_resources in dask-ctl too.

jacobtomlinson avatar Jul 25 '22 09:07 jacobtomlinson

I'd be interested to explore how much work it would be to also support existing tools that just use argparse, is there some kind of type check we can do and somehow wrap things if they aren't click? It'd be nice for subprojects to choose whatever CLI tool they like, they might want to try typer for example.

Typer apps can be used in Click apps, but I haven't been able to find anything that connects argparse and Click. Since the existing tools already use Click, I'm inclined to keep that as the foundation. If we don't make Click a hard dependency, perhaps we can make $ dask usage without click just print a you need to pip or conda install click message.

douglasdavis avatar Jul 25 '22 20:07 douglasdavis

Typer apps can be used in Click apps

Awesome!

Since the existing tools already use Click

Not all of them do. Dask Yarn and Dask Gateway both have argparse CLI tools.

jacobtomlinson avatar Jul 26 '22 09:07 jacobtomlinson

We could also use Click and not make it a required dependency. If it wasn't installed we would raise an informative error message.

On Tue, Jul 26, 2022 at 4:25 AM Jacob Tomlinson @.***> wrote:

Typer apps can be used in Click apps https://typer.tiangolo.com/tutorial/using-click/#including-a-typer-app-in-a-click-app

Awesome!

Since the existing tools already use Click

Not all of them do. Dask Yarn and Dask Gateway both have argparse CLI tools.

— Reply to this email directly, view it on GitHub https://github.com/dask/dask/pull/9283#issuecomment-1195238733, or unsubscribe https://github.com/notifications/unsubscribe-auth/AACKZTACNIZTS5V6PXHQIZTVV6VHVANCNFSM53YJXTGA . You are receiving this because you commented.Message ID: @.***>

mrocklin avatar Jul 26 '22 13:07 mrocklin

I think the challenge here is dispatching to non-Click modules.

jacobtomlinson avatar Jul 26 '22 13:07 jacobtomlinson

OK. I'll pull back then. Mostly what I wanted to communicate is "I think that we somewhat comfortable with common dependencies". If that's not the issue then please ignore.

On Tue, Jul 26, 2022 at 8:11 AM Jacob Tomlinson @.***> wrote:

I think the challenge here is dispatching to non-Click modules.

— Reply to this email directly, view it on GitHub https://github.com/dask/dask/pull/9283#issuecomment-1195465604, or unsubscribe https://github.com/notifications/unsubscribe-auth/AACKZTE4CE5P6G4KD3K2UVLVV7PZJANCNFSM53YJXTGA . You are receiving this because you commented.Message ID: @.***>

mrocklin avatar Jul 26 '22 13:07 mrocklin

Yeah totally, thanks for the clarification. I think @jcrist was concerned that we would need to update existing CLI tools to work with this. I agree that however this is implemented should integrate with existing tools just by adding the entrypoint and not require converting everything to Click or something.

jacobtomlinson avatar Jul 26 '22 13:07 jacobtomlinson

Not all of them do. Dask Yarn and Dask Gateway both have argparse CLI tools.

Ah I did not realize more than what's in distributed exists, thanks for pointing that out! It looks like it would not be too difficult to convert the CLI in dask-yarn to Click. I wasn't able to find argparse use in dask-gateway. After more digging I'm still striking out at finding anything on click consuming existing argparse objects. I haven't tested this, but perhaps dask-yarn can create a dask yarn click command that just calls dask-yarn.

At this point the PR has the following behavior for two scenarios where click isn't installed. I'm also happy to remove these in favor of just making Click a dependency.

  1. Try to run dask:
(scratch) ddavis@charm ~/software/repos/dask (new-cli) $ dask
The dask command requires click to be installed.

Install with conda or pip:

 conda install click
 pip install click

  1. Try to import the dask.cli module (a situation where someone wants to expand the CLI):
>>> import dask.cli
Traceback (most recent call last):
  File "/Users/ddavis/software/repos/dask/dask/cli.py", line 7, in <module>
    import click
ModuleNotFoundError: No module named 'click'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/ddavis/software/repos/dask/dask/cli.py", line 15, in <module>
    raise ImportError(msg) from e
ImportError: The dask.cli module requires click to be installed.

Install with conda or pip:

 conda install click
 pip install click

Of course if someone has used pip install dask[complete] or pip install dask[distributed] they won't see these errors.

douglasdavis avatar Jul 26 '22 14:07 douglasdavis

Yeah @jcrist voiced concerns about having to do that, I'm just echoing them here as he might not have time to engage here.

Gateway starts a Traitlets application from its CLI command, which uses argparse.

It would be really nice if we could support this without enforcing Click on those projects.

jacobtomlinson avatar Jul 26 '22 15:07 jacobtomlinson

I'm happy with the dask command requiring Click and the import errors seem totally fine, but maybe we can find a way to do the wrapping here instead of asking other projects to do it.

jacobtomlinson avatar Jul 26 '22 15:07 jacobtomlinson

Alright I think this is ready for review. The partner PR (dask/distributed#6735) should go in with this one, but it relies on installing my dask branch associated with this PR (I've changed all of the CI to use dask at version douglasdavis/dask@new-cli). I'm guessing this situation requires:

  1. This PR is approved and ready to go (don't merge but also don't make any more changes)
  2. Then review dask/distributed#6735
  3. Remove dask/distributed#6735 dependence on this PR's branch
  4. Merge both PRs together

douglasdavis avatar Aug 23 '22 15:08 douglasdavis

thanks @jacobtomlinson! Added a few commits to address those comments

douglasdavis avatar Aug 24 '22 13:08 douglasdavis