
0.3 CLI design

Status: Open · shykes opened this issue 3 years ago · 33 comments

Overview

As part of the 0.3 engine release (codename cloak), the dagger CLI needs a redesign to replace the cloak binary used during development.

Task list

This issue is considered complete when consensus is reached on the 0.3 CLI design and the design is documented in a way that allows unambiguous implementation.

  • [x] Kickstart proposal
  • [ ] Bikeshedding in comments
  • [x] Rough consensus
  • [ ] Polish
  • [x] Start implementation

Function

The Dagger CLI is a client tool for interacting with the Dagger engine. It can be called from the terminal, a shell script, or a CI runner.

It has (or will have) the following features:

  1. User-friendly GraphQL client, comparable in features to gqurl
  2. GraphQL proxy for compatibility with native GraphQL tools (GraphiQL)
  3. Auto-install the engine for zero-conf onboarding
  4. Interactive tty attach (for live pipeline debugging)
  5. Login to Dagger Cloud, and other interactions with Dagger Cloud
  6. TBD: management of a dagger project, its pipelines and extensions (possibly by editing one or more config files)
  7. TBD: support for multiple namespaces and engines (possibly by editing one or more config files)
  8. TBD: other development tasks common to all Dagger SDKs

Architecture

graph LR

term["Terminal"]
script["Shell script"]
ci["CI runner"]
cli(("Dagger CLI"))

go["Go SDK"]
python["Python SDK"]
nodejs["NodeJS SDK"]

term & script & ci --> cli

engine["Dagger Engine"]
runtime["OCI container runtime"]
cli & go & python & nodejs -..-> engine --> runtime

Sub-commands

GraphQL client

Subcommand name: query. See options below

Usage:

$ dagger query --help
Send API queries to a dagger engine

When no document file is given, the query is read from standard input.

Usage:
  dagger query [flags] [operation]

Flags:
      --debug         enable buildkit logs
      --doc string    document query file
  -h, --help          help for query
      --var strings   query variable

Examples:

dagger query <<EOF
{
  container {
    from(address:"hello-world") {
      exec(args:["/hello"]) {
        stdout {
          contents
        }
      }
    }
  }
}
EOF

Global Flags:
      --workdir string   The host workdir loaded into dagger (default ".")
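As a sketch of what a gqurl-style client does under the hood (the CLI's exact wire format is an assumption here, not something confirmed in this thread), a GraphQL request is a JSON body carrying the query, optional variables (the repeated --var flags), and an optional operation name (the [operation] argument):

```python
import json

def query_payload(query, variables=None, operation=None):
    """Build the conventional GraphQL-over-HTTP JSON body.

    `variables` corresponds to repeated --var flags; `operation` to the
    optional [operation] argument of `dagger query [flags] [operation]`.
    """
    payload = {"query": query}
    if variables:
        payload["variables"] = variables
    if operation:
        payload["operationName"] = operation
    return json.dumps(payload)
```

For example, `query_payload("query Pull($ref: String!) { container { from(address: $ref) { id } } }", {"ref": "hello-world"}, "Pull")` mirrors a `--var ref=hello-world` invocation selecting the `Pull` operation.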

API router

Subcommand name: listen. See options below

Starts the engine server

Usage:
  dagger listen [flags]

Flags:
      --disable-host-read-write   disable host read/write access
  -h, --help                      help for listen
      --listen string             Listen on network address ADDR (default ":8080")

Global Flags:
      --workdir string   The host workdir loaded into dagger (default ".")
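A client would then talk to that address over plain HTTP. A minimal sketch, assuming the server accepts GraphQL-over-HTTP POSTs at /query (the path is an assumption; only the ":8080" default address is documented above):

```python
import json
import urllib.request

ENGINE_ADDR = "http://localhost:8080/query"  # endpoint path is assumed

def build_request(query, variables=None, addr=ENGINE_ADDR):
    """Build (but do not send) the POST request a client would issue."""
    body = json.dumps({"query": query, "variables": variables or {}}).encode()
    return urllib.request.Request(
        addr, data=body, headers={"Content-Type": "application/json"}
    )

def run(query, variables=None):
    # Requires a running `dagger listen` on the default address.
    with urllib.request.urlopen(build_request(query, variables)) as resp:
        return json.load(resp)
```

This is also roughly what the "GraphQL proxy" feature implies: anything that can speak GraphQL over HTTP (GraphiQL included) can point at the listen address.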

Exec wrapper command

Subcommand name: exec. See options below

// In progress

Interactive attach

TBD

Login to Dagger Cloud

TBD

$ dagger login
$ dagger logout

Open questions

  • Should the dagger cmd live in the dagger/dagger repository? @shykes had some observations that it feels strange that the dagger CLI depends on the go SDK from the engine repository. In his head, it makes more sense if the CLI lives in the go-sdk repo instead of the engine one. @aluzzardi, he told me he'll bring this to you for discussion.

shykes avatar Oct 12 '22 23:10 shykes

@shykes who owns this issue to ensure we get an agreement on the proposal or will that be you? Also, should I add this to the #3283 release checklist?

mircubed avatar Oct 13 '22 03:10 mircubed

@shykes who owns this issue to ensure we get an agreement on the proposal or will that be you?

I can drive it, assigned it to myself to clarify.

Also, should I add this to the #3283 release checklist?

Yes I believe so 👍 cc @gerhard

shykes avatar Oct 13 '22 03:10 shykes

dagger gateway as described there shouldn't be a public function right? I don't see that users would ever need to call it.

sipsma avatar Oct 13 '22 04:10 sipsma

dagger gateway as described there shouldn't be a public function right? I don't see that users would ever need to call it.

Yes I guess it could be hidden from the help message (though still documented somewhere for SDK contributors?) Is that what you mean?

Even if you’re never supposed to call it, I think it’s good that you can easily find information about what the hell that process is when you see it in ps. I also assumed it would show up in architecture diagrams, but actually it would only ever appear in very, very detailed diagrams for extremely advanced contributors that want to, e.g., start a new SDK. So less important than initially thought.

shykes avatar Oct 13 '22 05:10 shykes

Updated design proposal, more details on visibility and function.

This is in “good enough” territory, but we can do better. An idea is germinating, will update proposal tomorrow.

shykes avatar Oct 13 '22 07:10 shykes

To calibrate expectations, my understanding is that this will happen in parallel to the binary (a.k.a. engine) release work. While we would like to have this issue closed before 0.3.0-alpha.1 is produced, we should not approach it in sequence, i.e. close this issue first, then move onto the release process. As I mentioned elsewhere, I don't expect us to get the engine release process right with the first alpha release, and I think that we are likely to have a few before we are confident that everything works as expected.

While in the alpha stage, anything goes from an artefact perspective. Even if the binaries are different and incompatible, that is OK because the focus is on how all the various systems integrate in the release process:

  • GitHub tags & pre-releases
  • Homebrew tap
  • install.sh (this implies AWS S3 & CloudFront)
  • install.ps (we are deferring this towards the end)

Let me know if you think differently.

gerhard avatar Oct 13 '22 07:10 gerhard

dagger engine & dagger gateway make sense as proposed.

I am wondering if dagger exec & dagger client could be a single command. I am leaning towards dagger exec since it's familiar & comfortable - the shell built-in exec command.

What do you think @shykes?

gerhard avatar Oct 13 '22 07:10 gerhard

dagger engine & dagger gateway make sense as proposed.

I am wondering if dagger exec & dagger client could be a single command. I am leaning towards dagger exec since it's familiar & comfortable - the shell built-in exec command.

What do you think @shykes?

exec is very different, it’s the shim that will be run by buildkit at the very bottom of the stack. It is functionally a completely different command, just bundled in the same CLI purely for reasons of convenience of packaging.

shykes avatar Oct 13 '22 07:10 shykes

To calibrate expectations, my understanding is that this will happen in parallel to the binary (a.k.a. engine) release work. While we would like to have this issue closed before 0.3.0-alpha.1 is produced, we should not approach it in sequence, i.e. close this issue first, then move onto the release process. As I mentioned elsewhere, I don't expect us to get the engine release process right with the first alpha release, and I think that we are likely to have a few before we are confident that everything works as expected.

While in the alpha stage, anything goes from an artefact perspective. Even if the binaries are different and incompatible, that is OK because the focus is on how all the various systems integrate in the release process:

  • GitHub tags & pre-releases
  • Homebrew tap
  • install.sh (this implies AWS S3 & CloudFront)
  • install.ps (we are deferring this towards the end)

Let me know if you think differently.

Yes I agree. It can and should happen in parallel.

shykes avatar Oct 13 '22 07:10 shykes

exec is very different, it’s the shim that will be run by buildkit at the very bottom of the stack. It is functionally a completely different command, just bundled in the same CLI purely for reasons of convenience of packaging.

OK, I understand the purpose of exec better now - the updated table in the description helps!

It may not be a big deal, but I liked dagger engine from the story perspective and dagger do from the v0.2 familiarity perspective. Having said that, we should either pick noun verb (e.g. dagger do) or noun scope (e.g. dagger engine) and be consistent throughout. FWIW, the existing dagger version & dagger help makes me lean towards noun scope.

I would really like us to standardise on the dagger <SUBCOMMAND> pattern and always require a subcommand, regardless of whether this is a noun or scope. It will make shell completion easier and it will help everyone navigate the domain better. FTR https://clig.dev/#subcommands

gerhard avatar Oct 13 '22 07:10 gerhard

Reminder to self: when dagger dev gets replaced with dagger gateway (or any other subcommand for that matter), remember to follow-up on:

  • https://github.com/dagger/dagger/pull/3310
  • https://github.com/dagger/dagger/pull/3285

gerhard avatar Oct 13 '22 09:10 gerhard

It may not be a big deal, but I liked dagger engine from the story perspective and dagger do from the v0.2 familiarity perspective. Having said that, we should either pick noun verb (e.g. dagger do) or noun scope (e.g. dagger engine) and be consistent throughout. FWIW, the existing dagger version & dagger help makes me lean towards noun scope.

I would really like us to standardise on the dagger <SUBCOMMAND> pattern and always require a subcommand, regardless of whether this is a noun or scope. It will make shell completion easier and it will help everyone navigate the domain better. FTR https://clig.dev/#subcommands

Those are all valid points. I will post my updated proposal before addressing them, because it changes a lot of the things you mention.

shykes avatar Oct 13 '22 19:10 shykes

Update to present two very different design options:

  • Option 1: all-in-one binary (same as before)
  • Option 2: client only (feedback welcome)

FYI @gerhard @sipsma @kpenfound

shykes avatar Oct 13 '22 22:10 shykes

Given bikeshedding is welcome here, I am bringing some paint...

Option 1 makes more sense to me (with some changes): it's confusing to talk about a gateway or a worker. IMO they are part of infrastructure and implementation of a dagger cluster. It's simpler to think about the CLI as a client + engine (the shim is an implementation detail of the engine). It's similar to how users perceive the Docker engine.

samalba avatar Oct 13 '22 22:10 samalba

Given bikeshedding is welcome here, I am bringing some paint...

Option 1 makes more sense to me (with some changes): it's confusing to talk about a gateway or a worker.

That only leaves 50% of option 1 ;) What changes would you propose?

shykes avatar Oct 13 '22 23:10 shykes

IMO [gateway and worker] are part of infrastructure and implementation of a dagger cluster

Note that gateway is on the client side (client helper for SDKs)

shykes avatar Oct 13 '22 23:10 shykes

What are the advantages of Option 2? I guess it seems mostly arbitrary to me in terms of UX. If we go with Option 1, we still retain the option of also having an OCI image in the future (in which we'd just put the same binary that has everything) if we want to for whatever reason.

In terms of the concepts we explain, I guess in either option 1 or 2 we have to explain that there's a client and a worker, does it really make a difference whether they are bundled in the same binary or not? I'm asking genuinely, maybe that's something that creates confusion; my immediate biased impression is that it doesn't really make much of a difference.

The other factors are:

  1. Engineering effort - I only see more work for Option 2 (could be missing something)
  2. Packaging convenience - you mentioned something about Option 2 being preferable in terms of repelling repackaging/forking. I'm interested in more details there, I don't see at the moment how option 2 helps relative to option 1, but I truly have zero experience with any considerations around that sort of thing so I am ignorant.

sipsma avatar Oct 13 '22 23:10 sipsma

What are the advantages of Option 2? I guess it seems mostly arbitrary to me in terms of UX.

First attempt at listing advantages (may not be exhaustive):

  • Fewer commands and flags in the CLI syntax: only what end users need
  • Easier to explain the architecture: client runs on the end user's machine; the engine runs on worker machines.
  • Smaller binary size. Always good, but made more valuable by also reducing SDK size
  • Less bundling gymnastics needed to package the engine and all its subcomponents. We are free to split it into individual binaries, or not, whatever makes our job easier. We can also change this over time, since individual binaries are considered private and not to be packaged separately.
  • Not distributing the engine as a single Linux binary makes it harder for Linux distros and their commercial backers to redistribute modified versions. This was a major source of fragmentation for Docker, and made the overall user experience worse (not to mention unnecessary drama)
  • It only makes sense to run the engine on OCI-capable systems. Standardizing on running the engine on an OCI runtime makes that requirement more explicit, and frees us of the burden of shipping an additional binary target that is redundant
  • Allows a gqurl-style client UI, which doesn't require subcommands. If we bundle several functions in the same binary, this creates pressure to follow CLI best practices and park the client in a dagger client sub-command (as proposed by @gerhard earlier in this thread) which would be cumbersome IMO

shykes avatar Oct 13 '22 23:10 shykes

cc @vito

shykes avatar Oct 13 '22 23:10 shykes

Fewer commands and flags in the CLI syntax: only what end users need

Allows a gqurl-style client UI, which doesn't require subcommands.

Smaller binary size. Always good, but made more valuable by also reducing SDK size

I see the points here. All else being equal, starting with as minimalistic cmds/flags as possible does give us the most long term flexibility.

Another thing I'm realizing is that the engineering effort difference between option 1 and 2 isn't necessarily very large in the immediate term. I was previously imagining that option 2 would mean we immediately need to start publishing and maintaining images in a registry, but I don't think it necessarily does (right?).

Even the first step @grouville took with embedding buildkit is actually only a tiny step away from option 2. The only difference is that he made the equivalent of dagger worker a hidden subcommand, whereas in option 2 it would be a separate main func. If we made that slight adjustment it actually fits what we're describing in option 2 pretty much exactly (it's just that the image is not pulled from a registry).

  • We of course can pull the image from a registry when we want to, it's just not an absolute hard requirement

So yeah, I'm actually okay with Option 2. I see the point about keeping the CLI interface minimal, and the costs of it relative to Option 1 are not really very large.

sipsma avatar Oct 14 '22 01:10 sipsma

Some points:

  • When would I use the client? If it's just to send arbitrary GraphQL queries that seems like kind of an edge case in the grand scheme of things, and promoting that to something as important-sounding as the "dagger client" might mislead users into thinking it's the primary interface. I guess the Bash SDK would use this, but for everything else I'd expect to just use an SDK anyway. If it's really only the Bash SDK I wonder if it should literally just be the "Bash SDK"? :thinking: (Ok I guess it's also fish/zsh SDK. Shell SDK?)
  • dagger exec needs to correlate 1:1 with the worker, so I'd expect it to not be present on OS X, and for the client's local dagger exec to always be ignored. To be honest I wouldn't expect this command to be exposed to the user at all. Right now we build the shim from source anyway, but if we wanted to bundle the binary we could just go:embed it and stream it over. And if we do that, we'll need to make sure it's really small so it can fit in a single gRPC message. I ran into this with Bass and had to use upx to shrink the shim at some point. I guess alternatively we could unpack it to a LocalDir.
  • I think if we call it exec people will probably think it's used like docker exec and kubectl exec etc. I think shim would be clearer.
  • Distributing the worker as an OCI image allows us to have a 1:1 pairing of the worker + its shim for each architecture in a multi-architecture image, which feels really clean to me. But I think users will still want to be able to run it outside of an OCI runtime[^1], so I'd lean towards having an all-in-one binary for the worker even if we do this.
  • I know we talked about Concourse being a single binary sometime in the past (I forget which discussion), but I'll clarify that even Concourse had a separate client binary (fly) - it only bundled the different types of servers together (web and worker).

tl;dr I prefer option 2, but I question whether we even need a client, and think it'd still be nice to have an all-in-one worker command. :)

[^1]: e.g. to avoid nesting containers, which, as far as I can tell, is totally fine but people seem to think there are ghosts there all the time

vito avatar Oct 14 '22 01:10 vito

I just realized that we may want somewhere to put all the service tty debugger attachment stuff too (and other service socket types in the future). It may make sense to integrate it with the client command in that it works by first sending a graphql query for a websocket endpoint. Could be that if you send a query that selects a socket endpoint and provide an extra flag like --attach, it will attach you to it.

  • Many details to figure out there (how do you know what type of data you're attaching to), corner cases where you select more than one scalar, etc. etc. Just a rough idea

sipsma avatar Oct 14 '22 01:10 sipsma

I have considered both options carefully and I think that Option 1 (all in one binary) would be the simplest one and best if paired with the following elements from Option 2:

  • make the worker command invisible 👻 - that would leave dagger as the only entrypoint
  • distribute the binary as an OCI too

The above approach doesn't close the door to Option 2 as a future refinement. As a follow-up, we could make the CLI smaller and the OCI more single-purpose. I imagine that having the engine locally as a hidden feature will be incredibly useful (especially for those that run on Linux). I also think that having the CLI baked into the OCI will be useful for debugging purposes.

Ephemeral containers are a great idea, but in practice they are not as convenient as having the tooling already in the running containers. A single binary might also give us the option of launching an "enterprise" version at some point in the future which is stripped down - effectively Option 2, but with more real-world feedback. The emphasis is on some point in the future.

For the next few weeks, I propose that we focus on the simplest thing. For me, that is the single binary distributed as an OCI too. ⛴


My favourite Concourse CI feature was the single binary approach which could be run in web or worker modes. The binary also included the fly client, which older fly clients could use to update themselves to the version that matched the web API 🤯

OK, I know that technically most users ended up downloading and installing fly locally, but the concourse binary included a fly binary too. The right curl request to the web API is the installation method which I always wanted, but missed. Maybe dagger worker will have that 😉

gerhard avatar Oct 14 '22 15:10 gerhard

I have considered both options carefully and I think that Option 1 (all in one binary) would be the simplest one and best if paired with the following elements from Option 2:

Care to elaborate in what ways it would be simpler? It would be useful to have more detail to inform the final decision.

shykes avatar Oct 14 '22 16:10 shykes

The above approach doesn't close the door to Option 2 as a future refinement. As a follow-up, we could make the CLI smaller and the OCI more single-purpose.

I don't see how that would happen. The inertia and bikeshedding is already enormous before launching. Imagine after we have actual users and a mature packaging process in place, then we propose changing the definition of the CLI, its role in the architecture, break some commands. It would not be impossible to switch to option 2 later in theory, but it would be so painful that we should assume it will be impossible in practice.

I imagine that having the engine locally as a hidden feature will be incredibly useful (especially for those that run on Linux).

What would be incredibly useful about it?

I also think that having the CLI baked into the OCI will be useful for debugging purposes.

That I agree with, 100%.

shykes avatar Oct 14 '22 17:10 shykes

Some points:

  • When would I use the client? If it's just to send arbitrary GraphQL queries that seems like kind of an edge case in the grand scheme of things, and promoting that to something as important-sounding as the "dagger client" might mislead users into thinking it's the primary interface. I guess the Bash SDK would use this, but for everything else I'd expect to just use an SDK anyway. If it's really only the Bash SDK I wonder if it should literally just be the "Bash SDK"? 🤔 (Ok I guess it's also fish/zsh SDK. Shell SDK?)

I think instead of calling it "the client" we would just call it "the Dagger CLI". It would be the preferred way to interact with Dagger from a command-line environment (interactive or scripted). I think the Bash SDK, if we ever ship it, would be built on top of the CLI with additional bash-specific sugar.

In addition to sending arbitrary GraphQL queries, the CLI would also:

  • Auto-install engines using pluggable provisioners. Probably not something you'd use in production, but awesome for "zero-to-one" experience locally.
  • Other engine management commands. Rough equivalent of docker machine or buildx node management.
  • Namespacing / project management. Whatever solution we find to that problem, will probably hook into the CLI
  • Interactions with Dagger Cloud. Starting with login, but possibly others in the future
  • Dev tooling. We removed codegen, but it will come back... The CLI is the natural place to add it. There may be others in the future (hooks for testing maybe?)

An interesting comparison: Stripe presents "Stripe CLI" as an option in the API docs drop-down, alongside the language selection:

[screenshot: Stripe API docs drop-down showing "Stripe CLI" alongside the languages]
  • dagger exec needs to correlate 1:1 with the worker, so I'd expect it to not be present on OS X, and for the client's local dagger exec to always be ignored. To be honest I wouldn't expect this command to be exposed to the user at all. Right now we build from source anyway, but if we wanted to bundle the binary we could just go:embed it and stream it over. And if we do that, we'll need to make sure it's really small so it can fit in a single gRPC message. I ran into this with Bass and had to use upx to shrink the shim at some point. I guess alternatively we could unpack it to a LocalDir.
  • I think if we call it exec people will probably think it's used like docker exec and kubectl exec etc. I think shim would be clearer.

Yes I agree. dagger exec is confusing. That subcommand should either be renamed + hidden, or not present in the CLI at all.

  • Distributing the worker as an OCI image allows us to have a 1:1 pairing of the worker + its shim for each architecture in a multi-architecture image, which feels really clean to me. But I think users will still want to be able to run it outside of an OCI runtime1, so I'd lean towards having an all-in-one binary for the worker even if we do this.

Could we not make that an add-on option for later? Clearly it's not the primary use case, since today the huge majority of users run the "engine" (buildkit) in an OCI image, the alternative is barely documented and nobody seems to complain.

If we're going to eventually support both, it makes more sense to me to start with OCI as the mainstream, and later maybe add standalone binary as the niche option.

shykes avatar Oct 14 '22 17:10 shykes

OK, let me try an option 3 which would be a hybrid of 1 and 2 with some of the feedback here incorporated.

I think we all agree on choosing the simplest option (for us and for users) but disagreement on what is and isn't simple :)

shykes avatar Oct 14 '22 17:10 shykes

@gerhard @mircubed I'm afraid this is a blocker for the 0.3.0-alpha.1 release after all, since it affects what it is exactly that we will be releasing. We don't need to resolve 100% of this issue to unblock the release - but we need to resolve parts of it. Can you hold the release until we explicitly resolve that issue here?

Thanks

shykes avatar Oct 14 '22 21:10 shykes

In the interest of getting this moving in a direction, I don't have any other objection. I don't feel the need to expand on any of the points that I have made above.

I am perceiving a consensus forming around an OCI. Regardless what we put in it, our release process should be able to produce one. I will be adding this capability shortly.

As a starting point, the OCI will include the existing binary - go build ./dagger/cmd. I expect us to refine as necessary. The outcome of this issue will be important. I don't mind which way we decide is forward as long as we are moving again.

gerhard avatar Oct 17 '22 15:10 gerhard

Additional items to be addressed:

  • How do we plan to handle CLI releases? cc @gerhard

My initial thought is that the CLI's release lifecycle should be able to evolve independently of the engine version while keeping API compatibility with the engine's major version. Following the semver convention, I'm suggesting that the MAJOR version of the engine and the CLI should be kept in sync, while the MINOR and PATCH versions could evolve independently.
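The rule described above can be sketched as a simple check (illustrative only; a real implementation would use a proper semver parser and handle pre-release identifiers like 0.3.0-alpha.1 more carefully):

```python
def api_compatible(cli_version: str, engine_version: str) -> bool:
    """CLI and engine are compatible iff their MAJOR versions match;
    MINOR and PATCH are free to drift (per the proposal above)."""
    def major(v: str) -> str:
        # Crude: strip any pre-release suffix, then take the MAJOR component.
        return v.split("-")[0].split(".")[0]
    return major(cli_version) == major(engine_version)
```

Under this rule a 1.2.0 CLI could talk to a 1.5.3 engine, but not to a 2.0.0 one.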

Following the Go SDK's approach, should we use a separate repo and mirror the CLI code to handle releases? Particularly since using GitHub releases in a single repo for both the engine and the CLI might become problematic.


@shykes now addressing your thoughts:

It has (or will have) the following features:

  • Ability to bootstrap projects in supported languages,
    e.g. dagger project init --language go. Inspired by the CDK: https://docs.aws.amazon.com/cdk/v2/guide/hello_world.html#hello_world_tutorial_create_app

  • Re subcommand name: how about dagger gq? Name inspired by Hasura's GraphQL client (https://github.com/hasura/graphqurl). MongoDB uses this reference as well: https://www.mongodb.com/docs/atlas/app-services/graphql/cli/

  • Should the primary graphql client feature be rooted at the top-level command? Or should it be moved into a sub-command for easier coexistence with other features? And if the latter: what should the sub-command be called? What would the other sub-commands be?

The only thing that comes to my mind right now is the ability to generate shell auto-completion based on the GraphQL schema, similar to what gql (https://github.com/graphql-editor/gql) does. Apart from that, I don't see why it couldn't be a top-level command.

marcosnils avatar Nov 08 '22 16:11 marcosnils