Add detach timeout for service commands
Fixes #663
Allow --detach to accept a duration in addition to a boolean value. If the timeout is hit then it will detach and exit non-zero, but the operation will continue.
Codecov Report
Merging #665 into master will increase coverage by
0.49%. The diff coverage is60.81%.
@@ Coverage Diff @@
## master #665 +/- ##
==========================================
+ Coverage 53.46% 53.96% +0.49%
==========================================
Files 218 218
Lines 14642 14683 +41
==========================================
+ Hits 7829 7923 +94
+ Misses 6327 6265 -62
- Partials 486 495 +9
I'm very uncomfortable with the new semantics of --detach. It is boolean and requires an argument at the same time. AFAIK, we do not have this kind of option up to now.
With this change, --detach is no longer valid syntax. The user has to specify --detach=true instead. While the --opt=true syntax works for all boolean options, this is the first place where it is actually required.
I suggest not to overload --detach and instead add a dedicated timeout option, e.g. --detach-timeout=5s. The new option could also replace --detach when used with a value of --detach-timeout=0s.
I also fear that this kind of mixed-type options might get us in trouble at some point in time, e.g. when the option parsing lib has to be changed for some reason or the completions should be generated.
With this change, --detach is no longer valid syntax.
Hmm, that is a problem. I'll see about fixing that.
I suggest not to overload --detach and instead add a dedicated timeout option, e.g. --detach-timeout=5s
I don't like having two flags to control a single behaviour. It means that every place it's used needs to validate the two flags don't conflict, and it's a lot more to explain to a user.
I also fear that this kind of mixed-type options might get us in trouble at some point
It's really not any different from any other custom flag.Value type, which we use extensively, so this is not new.
Thanks for catching the regression with --detach ! I've fixed it and added a test for this case.
I'm still opposed to this syntax change.
We also have docker run --detach, where --detach is a boolean. So from the POV of consistence, it is a surprise that the same option here can take arguments.
--detach is a good name for a boolean option, but it is not obvious what --detach 12s should mean. If it's a timeout, the option should include that in its name.
We are coding two different concerns (and data types) into one string argument, which I think is evil. As an example, I will not be able to support this syntax in bash completion. Bash completion makes a hard distinction between boolean options (with no arguments) and options that take arguments. There is no concept of optional arguments for an option. Optional arguments would make it impossible to safely distinguish between optional option arguments and command arguments that follow such an option.
We also have docker run --detach, where --detach is a boolean. So from the POV of consistence, it is a surprise that the same option here can take arguments.
docker run --detach can also take arguments (true, false), this flag just happens to accept more arguments. It also behaves differently, so the arguments only make sense for this case.
but it is not obvious what --detach 12s should mean. If it's a timeout, the option should include that in its name.
If we were starting from scratch we could name this something else, but the flag already exists. Having two flags for the same behaviour is not an improvement. We could have it support --detach after=12s or --detach timeout=10s if you think that is an improvement.
I really don't think --detach 12s is confusing. It is easily explained in the help text.
We are coding two different concerns
The concern is the same, when the process should exit. Previously the behaviour only supported 2 cases so a boolean was appropriate. Now it supports 3, so we need to expand the support values.
Bash completion makes a hard distinction between boolean options (with no arguments) and options that take arguments. There is no concept of optional arguments for an option. Optional arguments would make it impossible to safely distinguish between optional option arguments and command arguments that follow such an option.
The -detach flag always accepted arguments, --detach=false was actually suggested to users in a warning message, so if bash completion never supported this we are not any worse off. Optional arguments are not the most common, but I have seen them used, and our flag parsing library has support for them, so it would be unfortunate not to make use of them when it's appropriate.
Why can't the bash completion just assume it always takes an argument? This seems fine to me. It doesn't need to support optional arguments.
I really don't think --detach 12s is confusing. It is easily explained in the help text.
A good name does not require you to look into the help text.
The -detach flag always accepted arguments
Bash completion for boolean options generally completes only the option that actually changes behaviour: --option or --option=false, depending on the default value. It never completes --option=true because that does not correspond with the way most users would type the the command (given the choice between --option and --option=true, most users would use the shorter form). This results in shorter command lines that are easier to comprehend.
Why can't the bash completion just assume it always takes an argument?
The fact that you are suggesting to change the completion UX is a clear sign that we have a breaking change here.
Besides, there is no good completion for a combined boolean|duration value because the durations are not a limited set. I could pre-select some values and complete true|false|1s|2s|5s|10s, but this would give the user the false impression that only these numeric values are allowed. In doing so, I would also loose the information which of the values is the default. The user would have to look it up in the help text, which would break the UX of Bash completion.
I'd like to get more opinions here, WDYT @thaJeztah @tianon?
Can we make this a global option that can be set in the CLI's config.json? Having a configuration for this would also;
- help people that want to keep the pre-17.10 "asynchronous" behaviour
- allow specifying a default timeout
I'm a bit in doubt if (as a user) I would be specifying --detach <some duration> when running these commands. I'd rather "set and forget" these options
(Just some initial thoughts - haven't considered consequences, and if this would cover all use-cases)
I agree that just by looking at the presence of a --detach flag, I would think "I'm running detached", not "I'm running attached, but will be detached after x seconds"
@albers I'd like to see if I can summarize your concerns.
- Backwards compatibility
You correctly pointed out that the first version of this had a bug which broke --detach. This was fixed, so the change is now 100% backwards compatible. Every previously supported value for --detach is still supported.
The fact that you are suggesting to change the completion UX is a clear sign that we have a breaking change here.
I don't see this as a breaking change. Every previously supported value is still supported, so there is no way a user can be broken. Even someone who is using an old version of the completion with a newer client will still be fine. Shell completion needs to be expanded to support more values, but that is not a breaking change.
- Incompatible with shell completion
I don't see how this could possibly be true, since we have plenty of flags that can accept any string value. This flag should be treated the same way, accept any value, don't assume anything about the value. It shouldn't be treated as a boolean flag anymore.
- UX
This is a valid concern, and we should definitely discuss this further. There are a few options:
- continue to use the
--detachflag and expand the supported values (this PR) - add a new
--detach-timeoutflag which can't be set unless--detachis also set - others?
I personally think having two flags for this is worse. It's more to type, and more to explain. There is a single behaviour "when to detach", and that can be expressed with a single flag.
I also don't think that --detach=12s is hard to understand. It reads "detach [after] 12s". We could be even more explicit here by accepting values like --detach=timeout=12s or --detach=after=12s. We aready have plenty of places (env, labels, etc) that accept values just like this. If this fixes the problem I can make that change.
Can we make this a global option
Maybe, but I think that is a separate issue. I don't think it is appropriate to configure this as only a global value. If I'm scripting something I want to wait x time before failing. The expected time for an update/scale could also vary quite a bit from one service to another. I might want to wait only a few seconds for something with a couple replicas, but wait much longer for something that has many. If it only varied by user then a global option might make sense. It needs to be a flag first, config options can come later (if they are necessary).
I agree that just by looking at the presence of a --detach flag, ...
Right, but the flag isn't just --detach now, it's --detach=value. --detach continues to do what you would expect it to do, and what it did previously (detach immediately).
I don't see this as a breaking change. Every previously supported value is still supported, so there is no way a user can be broken.
This is true as long as you treat the option values as strings and only consider code that produces command lines.
For a library that consumes shell code lines, it is not longer safe to assume that --detach can be represented as a boolean value. Imagine some kind of parser for Docker commands gathered from a log file. This lib would actually be broken when the first non-boolean value is seen.
I admit that this is a rather theoretical usecase and probably out of scope for compatibility considerations.
We could be even more explicit here by accepting values like --detach=timeout=12s or --detach=after=12s.
This is much better because I can construct a really helpful shell completion for it: true|false|after=.
The only drawback is that I can no longer express the default value, but I'm fine with this.
Sounds good! This now supports after=<duration>
There is a regression with --detach (no arg form). It defaults to false but should default to true.
The UX is inconsistent. The help message says:
$ docker service create --help | grep detach
-d, --detach bool|duration[=false] Exit after timeout, or immediately if set to true
But that syntax is not legal:
$ docker service create --detach after=10s busybox top
Error response from daemon: rpc error: code = InvalidArgument desc = ContainerSpec: "after=10s" is not a valid repository/tag
Only --detach=after=10s works.
This is unexpected and inconsistent with the usual semantics where --option value and --option=value are equivalent for all non-boolean options.
This is due to the special treatment of boolean values in Docker's argument parsing: --option=true is legal whereas --option true is not. There is a distinction between boolean and non-boolean options, and mixing both types is problematic.
I do not see how this can be resolved. We either have to drop support for the no-arg short form or for the two word syntax variant in order to keep parsing of the command line deterministic.
There is a regression with --detach (no arg form). It defaults to false but should default to true.
Oops, not sure what I was thinking. Fixed the flag, and the test to expect true.
Thanks for calling this out again.
Hi Any update on this ?
Thanx
rebased
Hi there
Is there any update on this ? Today some CI/CD deployments are based on service create/update and this can be useful for them
Thanx
Hi
Any update ??
Here's a workaround. A timeout binary is present in both alpine and ubuntu based images. If the update is timed out you get an error code in $? and stderr prints Terminated.
$ timeout 60 docker service update ...
What else is needed to help get this merged? Doing a service update and having it loop indefinitely after a failure doesn't seem like a good experience - instead, it should clearly exit to indicate a deployment failure.
Using timeout is certainly just a workaround for something that should be supported natively by Docker.
@vieux !! How are you?!
Yeah, I'll be cleaning up older PR's; I might want to have a look at some to see if there's still value and/or things to adopt.