dagger 💥 API inconsistency with `AsService` using entrypoint

Summary

I propose that asService always executes the entrypoint and default arguments, instead of only as a fallback.

Background

When we were close to releasing v0.12.0, we rushed to revert the "Default exec" because it became very easy to trigger the execution of the entrypoint undesirably, just by ending a dagger call chain in a Container object:

https://github.com/dagger/dagger/issues/7821

I’s ok if the entrypoint is a CLI like git, but not if it’s a long running command. So now you have to be explicit, in case you want the stdout of a container and there’s no withExec defined, but expect to run the image’s CMD:

dag.Container().
    Build(src).
    WithExec([]string{}).
    Stdout(ctx)  // executes CMD

And if ENTRYPOINT is also required:

dag.Container().
    Build(src).
    WithExec([]string{}, dagger.ContainerWithExecOpts{UseEntrypoint: true}).
    Stdout(ctx)  // executes ENTRYPOINT + CMD

That’s necessary because Stdout applies to the last WithExec, and a withExec with empty arguments uses the default arguments (aka, CMD).

However, if CMD is a long running process, would you want to use Stdout for it? No, that’s what services as for.

Problem

It made sense, while removing the fallback to the “default command” from stdout, to leave asService alone, because services are long running commands, and that’s more likely to be in the CMD of the container image. That includes using the entrypoint in that case too, because a lot of docker images have entrypoint scripts that do a lot of setup before a long running command starts, like databases.

However, this creates a weird inconsistency. It’s not even that asService uses the entrypoint while stdout and stderr no longer do, per se. We can reason with that (long running vs not). It’s that it only happens if there’s no withExec:

dag.Container().
    From("registry:2").
    AsService()  // executes ENTRYPOINT + CMD

But if you add a withExec, then that’s what AsService() will execute, and without the entrypoint:

dag.Container().
    From("registry:2").
    WithFile("/setup", setupFile).
    WithExec([]string{"/setup"}).
    AsService()  // executes /setup, without the entrypoint, and exits early

So if you actually need to execute the entrypoint and default args, you need to do this:

dag.Container().
    From("registry:2").
    WithFile("/setup", setupFile).
    WithExec([]string{"/setup"}).
    WithExec([]string{}, dagger.ContainerWithExecOpts{UseEntrypoint: true}).
    AsService()  // executes ENTRYPOINT + CMD

Solution

I propose that services always execute the entrypoint and default arguments, instead of the last withExec. So the empty exec would no longer be necessary:

dag.Container().
    From("registry:2").
    WithFile("/setup", setupFile)
    WithExec([]string{"/setup"}).
    AsService()  // executes ENTRYPOINT + CMD

But if your intent is for change the command, then just change the entrypoint and/or default arguments:

dag.Container().
    From("registry:2").
    WithFile("/setup", setupFile)
    WithExec([]string{"/setup"}).
    WithEntrypoint([]string{"/my-entrypoint.sh"}).
    WithDefaultArgs([]string{"/init", "stage"}).
    AsService()  // executes new ENTRYPOINT + CMD

Rationale

My first impulse was to follow this idea in https://github.com/dagger/dagger/pull/7136#issuecomment-2135708579:

Dagger considers entrypoint to be "dumb" metadata on a container image. You can read it, and change it. But Dagger does not use it to configure its own execution of containers, because it doesn't need it: the same thing can be achieved with code.

And remove the fallback from asService as well, in order to resolve the inconsistency. However, it does seem to me that for a service, you’ll likely want to run the ENTRYPOINT + CMD in an image. Otherwise, it’s going to be more verbose in the common case.

I actually felt that when trying to remove the use of entrypoints in our codebase, and there were a few valid uses, like the registry in our CI (long running). But not, when the same module was setting the codegen CLI as an entrypoint to be used in withExecs in a far away place (in SDKs), and having a hard time pin pointing what said withExecs were doing.

It’s simple to explain because these are different types:

Type	Long running	Use last `withExec`	Run default command
`Container`	No	Yes	No, not even as a fallback
`Service`	Yes	No	Yes, always

Aug 10 '24 02:08 helderco

I skimmed this and the proposal makes sense to me. I am in favour of introducing this change before we release v0.13.0.

Unless anyone comments & disagrees, I think that we should go ahead with this. I would give it until Wednesday this week, and if there is no push-back, I'm for making the behaviour consistent.

WDYT @shykes @samalba @aluzzardi @jedevc @vito ?

Sep 02 '24 12:09 gerhard

Having inconsistent behavior of entrypoints in AsService vs WithExec feels a little sad, but I think it's definitely better than the current behavior, and it's definitely easier to explain over "you need a magic empty withExec".

Not as sure about having a Run method (at least in the form described). What if you want to get the Stderr of the entrypoint? Or the return code? That said, that's not the main thing proposed here, could that change be discussed/made separately?

In terms of scheduling, this feels feasible to do before next week if someone has time to pick it up - the tricky bit ofc will likely be the version compat again.

Sep 02 '24 13:09 jedevc

Yeah, I want to split the run idea into a separate issue but haven't been able to yet. It remains here just to tie into a future possibility (convenience), but the idea is that this resulting output would be combined (stdout + stderr), depending on adding such a field to Container too since it's been in the backlog for a long time.

Sep 02 '24 14:09 helderco

Here's my proposal:

extend type Container {
   "Start a long-running service from this container"
   start(
     "Command-line arguments to execute"
     args: [String!]!
     ): Service!
}

No ambiguity, no reliance on either entrypoint or previous withExec. It's what we should have done from the start.

Sep 03 '24 17:09 shykes

Unless anyone comments & disagrees, I think that we should go ahead with this. I would give it until Wednesday this week, and if there is no push-back, I'm for making the behaviour consistent.

If we get strong consensus by then, great, but we shouldn't rush it. After all this issue blames rushing a decision for 0.12 as the root cause of the problem...

Sep 03 '24 17:09 shykes

Here's my proposal:
extend type Container {
   "Start a long-running service from this container"
   start(
     "Command-line arguments to execute"
     args: [String!]!
     ): Service!
}
No ambiguity, no reliance on either entrypoint or previous withExec. It's what we should have done from the start.

I think the problem is that for a lot of images that people want to run as a service with (registry, databases, web server, etc...), they've already been built in the larger docker ecosystem with a ENTRYPOINT (for setting up the container) and CMD (for the daemon), so users would have to pull those values from the Container to combine into start(args).

In my mind this is very common and makes sense to have asService always use those instead of last withExec. But I think we need to at least remove the fallback to the default command because that's the major inconsistency here (because it depends on whether there's any withExec steps or not).

In that case, users that want to execute the default command from the image would need to add the same empty withExec we started requiring in v0.12.0 for stdout and stderr:

svc, err := ctr.WithExec([]string{}, dagger.ContainerWithExecOpts{UseEntrypoint: true}).AsService(ctx)

Sep 04 '24 18:09 helderco

We discussed this live with @helderco @sipsma @vito @jedevc.

We have consensus on a first part of the design, and almost consensus on a second part.

Part 1

This first part is ready to be implemented asap:

Principle: embrace "default args" (aka CMD) as the correct way to persist a default command in a container
Principle: tolerate "entrypoint" as an unwanted feature that you sometimes have to be compatible with via an optional flag.
When creating a service, it should be possible to specify the command args directly, or omit them and get a reasonable default
Per the first principle: a reasonable default is simply to use the container's default args. Not entrypoint, and not the last withExec.
Separately, per the second principle, an optional flag will allow prepending the args with the container's entrypoint, if present. This is independent of how the args were constructed
The result is the following API:

extend type Container {
   "Start a long-running service from this container"
   AsService(
     "Command-line arguments to execute. If not set, use the container's default args"
     args: [String!],
     "If the container has an entrypoint, prepend it to the args"
     useEntrypoint: Bool,
     ): Service!

Part 2

This second part should happen soon, but is not as urgent. We don't have full consensus yet, but it seemed pretty close.

New function: Container.start. Same as asService except the arguments are required
Container.asService becomes sugar on top of start. But sugar is bad for you, so we deprecate it
We remove Service.start and Service.stop to avoid confusion of having 2 different start
To compensate for the loss of the escape hatch Service.start, we design a solution for each known use case:
Replacement for the first use case of service.Start: we add a flag to Container.start to control the "grace period", ie: when you want to prevent the service being stopped/started in the same session. FIXME: work out the specifics
Replacement for the second use case of service.Start: we create a new Network type which contains an array of services. All the services in the network can be brought up in a coordinated way, and their endpoints exposed as a whole (for the docker-compose up use case).
If we still don't feel comfortable completely removing the escape hatch, maybe we rename Service.start and Service.stop to Service.manualStart and Service.manualStop?

Copying @kpenfound to include in the discussion of this second part

Oct 31 '24 20:10 shykes

I am working on the first part, and I wanted to ask a clarification in terms of DX for the users.

When creating a service, it should be possible to specify the command args directly, or omit them and get a reasonable default

Does that mean we can make args []string optional to AsService api (by adding a json tag default:"[]")?

If we make it optional, user can continue to say container.AsService(), and add args when needed container.AsService(dagger.ContainerAsServiceOpts{Args: []string{"run", "/main"}).

but if we make it mandatory, user will have to say container.AsService([]string{}) to get the default behavior.

Nov 06 '24 05:11 rajatjindal

Yes the args should be optional (in graphql syntax: args: [String!]

Nov 06 '24 05:11 shykes

Thanks @shykes. As I am making more changes, I came across another scenario. e.g. today we start dev engine as follows:

ctr := ctr.
		WithMountedCache("/var/lib/dagger", c.CacheVolume("dagger-dev-engine-state-"+identity.NewID())).
		WithExposedPort(1234, dagger.ContainerWithExposedPortOpts{Protocol: dagger.NetworkProtocolTcp}).
		WithExec([]string{
			"--addr", "tcp://0.0.0.0:1234",
			"--addr", "unix:///var/run/buildkit/buildkitd.sock",
			// avoid network conflicts with other tests
			"--network-name", deviceName,
			"--network-cidr", cidr,
		}, dagger.ContainerWithExecOpts{
			UseEntrypoint:            true,
			InsecureRootCapabilities: true,
		})

## and then in a separate function call

ctr.AsService()

I am suspecting as we make the change, we may have to adjust the api to account for this and add the additional flags such as InsecureRootCapabilities to AsService api. so it would look like following after the change:

ctr := ctr.
		WithMountedCache("/var/lib/dagger", c.CacheVolume("dagger-dev-engine-state-"+identity.NewID())).
		WithExposedPort(1234, dagger.ContainerWithExposedPortOpts{Protocol: dagger.NetworkProtocolTcp}).
				WithDefaultArgs([]string{
			"--addr", "tcp://0.0.0.0:1234",
			"--addr", "unix:///var/run/buildkit/buildkitd.sock",
			// avoid network conflicts with other tests
			"--network-name", deviceName,
			"--network-cidr", cidr,
		})

	// keeping WithExec here to allow for testcases that do not need
	// a service to continue to work like they were earlier.
	return ctr.WithExec([]string{}, dagger.ContainerWithExecOpts{
		UseEntrypoint:            true,
		InsecureRootCapabilities: true,
	})

## and then in a separate function call

ctr.AsService(dagger.ContainerAsServiceOpts{
	UseEntrypoint:            true,
	InsecureRootCapabilities: true,
})

similarly we may need to add support for other options as well which are supported in WithExec. what do you think?

Nov 06 '24 08:11 rajatjindal

@rajatjindal that is a great point. Yes, I guess we will InsecureRootCapabilities, ExperimentalPrivilegedNesting and maybe other arguments to withExec? Similar situation to Terminal.

I'm starting to feel like we have an API mess to cleanup... Lots of duplication between "execute a command and return"; "execute an interactive shell and return"; "execute a long-running service"; Maybe we should talk about resolving that @dagger/maintainers ?

Nov 06 '24 16:11 shykes

@rajatjindal one part of your example snippet I don't understand:

	// keeping WithExec here to allow for testcases that do not need
	// a service to continue to work like they were earlier.
	return ctr.WithExec([]string{}, dagger.ContainerWithExecOpts{
		UseEntrypoint:            true,
		InsecureRootCapabilities: true,
	})

Can you explain this part? In my mind this would be removed, to simplify the caller code.

Nov 06 '24 16:11 shykes

@rajatjindal one part of your example snippet I don't understand:
	// keeping WithExec here to allow for testcases that do not need
	// a service to continue to work like they were earlier.
	return ctr.WithExec([]string{}, dagger.ContainerWithExecOpts{
		UseEntrypoint:            true,
		InsecureRootCapabilities: true,
	})
Can you explain this part? In my mind this would be removed, to simplify the caller code.

I was looking at this testcase (and similar testcases in engine_test): https://github.com/dagger/dagger/blob/3956d894f84131467889e6377e1ec3f68fc0256c/core/integration/engine_test.go#L99

In this example we start the engine entrypoint script (without using AsService), wait for a few seconds and send SIGTERM. I've not really tried this with new AsService api yet but its one of the todo's for me.

Nov 06 '24 17:11 rajatjindal