toolbox
fix: use FQDN with `toolbox` prefix
Issue: https://github.com/containers/toolbox/issues/969
Setting the hostname to `toolbox` causes timeouts whenever anything tries to resolve the name of the machine - for example `sudo` does this.
This change makes it so the FQDN is set to `${container_name}.${hostname}` as recommended in the linked issue.
After this change commands can properly resolve the local FQDN.
I removed the symlink to `/run/host/etc/hosts` because podman already copies that information in, and then we can use `--add-host` to add a mapping to localhost for the container - this way calling `ping $(hostname)` does what is expected.
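As a rough sketch of the idea above (not the code in this PR): the FQDN is the container name joined to the host's name, and the same name is mapped to `127.0.0.1` through podman's `--hostname` and `--add-host` options. The helper names `container_fqdn` and `podman_hostname_args` are made up for illustration, and the sketch only prints the options instead of calling podman.

```shell
#!/bin/sh
# Hypothetical helper: join container name and host name into an FQDN.
container_fqdn() {
    printf '%s.%s\n' "$1" "$2"
}

# Hypothetical helper: print the podman options that would set the FQDN
# as the container hostname and map it to localhost in /etc/hosts.
podman_hostname_args() {
    fqdn=$(container_fqdn "$1" "$2")
    printf '%s %s %s %s:127.0.0.1\n' --hostname "$fqdn" --add-host "$fqdn"
}

# Example values taken from the sample output in this PR.
podman_hostname_args test1 canzuk.hq.akdev.xyz
```

With a mapping like this in place, `ping $(hostname)` inside the container resolves to the loopback address instead of waiting on DNS.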
Pending PRs:
- https://github.com/containers/toolbox/pull/1007 - this adds a new flag to allow setting the hostname of the toolbox. I think this should just follow the `container-name.hostname` convention, otherwise it seems confusing.
- https://github.com/containers/toolbox/pull/771/files - similar, but sets the hostname of the container to be equal to the container name. The name is still unresolvable, however.
- https://github.com/containers/toolbox/pull/383/files - same as above, but tries to sanitize the container name.
- https://github.com/containers/toolbox/pull/573/files - obsolete, PR against the bash toolbox.

None of these PRs address my issue with delays due to unresolvable hostnames, so this one tries to do that.
Sample Output
[akdev@canzuk toolbox]$ ./build/src/toolbox create -i docker.io/akdev1l/ubuntu-toolbox:22.04 test1
Created container: test1
Enter with: toolbox enter test1
[akdev@canzuk toolbox]$ ./build/src/toolbox enter test1
⬢[akdev@test1 toolbox]$ hostname
test1.canzuk.hq.akdev.xyz
⬢[akdev@test1 toolbox]$ cat /etc/hosts
127.0.0.1 test1.canzuk.hq.akdev.xyz
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4 toolbox
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
10.0.1.100 host.containers.internal
Build succeeded.
:heavy_check_mark: unit-test SUCCESS in 6m 54s :heavy_check_mark: system-test-fedora-rawhide SUCCESS in 16m 52s :heavy_check_mark: system-test-fedora-36 SUCCESS in 9m 51s :heavy_check_mark: system-test-fedora-35 SUCCESS in 10m 05s
This seems to trigger a minor bug: when exiting the toolbox, if I pressed `ctrl+c` at the prompt then toolbox falsely prints out an empty `Error:` message.
I'll have to dig into that.
Build succeeded.
:heavy_check_mark: unit-test SUCCESS in 6m 57s :heavy_check_mark: system-test-fedora-rawhide SUCCESS in 11m 04s :heavy_check_mark: system-test-fedora-36 SUCCESS in 9m 57s :heavy_check_mark: system-test-fedora-35 SUCCESS in 10m 10s
Build failed.
:x: unit-test NODE_FAILURE in 0s :heavy_check_mark: system-test-fedora-rawhide SUCCESS in 13m 16s :heavy_check_mark: system-test-fedora-36 SUCCESS in 13m 16s :heavy_check_mark: system-test-fedora-35 SUCCESS in 13m 51s
sorry for the noise, the bug I hit seems to be because toolbox exits with return code 130 (SIGTERM) whenever `ctrl+c` is pressed - seriously wondering how my changes triggered this
capsh --caps= -- -c 'exec "$@"' /bin/sh /bin/bash -l
why is this so convoluted ... it would seem this is equivalent:
capsh --caps= -- -c 'exec /bin/bash -l'
hmm, from further experimentation it seems that bash just does this and toolbox should just ignore the `130` error code
toolbox without modification does this:
[akdev@canzuk toolbox]$ toolbox enter test10
⬢[akdev@test10 toolbox]$ ^C
⬢[akdev@test10 toolbox]$ ^C
⬢[akdev@test10 toolbox]$ ^C
⬢[akdev@test10 toolbox]$
logout
[akdev@canzuk toolbox]$ echo $?
0
[akdev@canzuk toolbox]$ bash
[akdev@canzuk toolbox]$ ^C
[akdev@canzuk toolbox]$ ^C
[akdev@canzuk toolbox]$
exit
[akdev@canzuk toolbox]$ echo $?
130
imo this should keep the exit code of the last command executed, so it should be `130` at the end to match the behaviour of bash - we could probably simplify those `capsh` shenanigans, looks like there's at least one unnecessary fork there
Build failed.
:x: unit-test FAILURE in 7m 15s :x: system-test-fedora-rawhide FAILURE in 16m 10s :x: system-test-fedora-36 FAILURE in 10m 29s :x: system-test-fedora-35 FAILURE in 10m 34s
Build failed.
:heavy_check_mark: unit-test SUCCESS in 7m 07s :x: system-test-fedora-rawhide FAILURE in 15m 36s :x: system-test-fedora-36 FAILURE in 10m 27s :x: system-test-fedora-35 FAILURE in 10m 41s
Build failed.
:heavy_check_mark: unit-test SUCCESS in 7m 03s :x: system-test-fedora-rawhide FAILURE in 15m 52s :x: system-test-fedora-36 FAILURE in 10m 13s :x: system-test-fedora-35 FAILURE in 10m 30s
Build failed.
:heavy_check_mark: unit-test SUCCESS in 7m 10s :x: system-test-fedora-rawhide FAILURE in 16m 00s :x: system-test-fedora-36 FAILURE in 10m 19s :x: system-test-fedora-35 FAILURE in 10m 54s
I have continued development of toolbox on my own as I don't really see this being merged in a reasonable time frame.
my fork is at: https://github.com/akdev1l/toolbox/tree/akdev
I have fixed some long-standing issues and added some features (particularly I have enabled a static build and containerized `toolbox` itself, resolved the DNS issues and added basic `export` support) - feel free to ping me if there's any interest in that.
Otherwise I will leave this PR to die - feel free to close.
> sorry for the noise the bug I hit seems to be because toolbox exits with return code 130 (SIGTERM) whenever `ctrl+c` is pressed - seriously wondering how my changes triggered this

I don't think your changes introduced that. :)
A few months ago, Toolbx started propagating the exit code of the command, which triggered this. You wouldn't have encountered this behaviour before that.
From the `bash(1)` manual:
When a command terminates on a fatal signal N,
bash uses the value of 128+N as the exit status.
`Ctrl+c` is `SIGINT` (not `SIGTERM`) and its numerical value is 2:
$ kill -L
1) SIGHUP 2) SIGINT ...
Hence, 130 (= 128 + 2) as the exit code.
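The arithmetic from the manual excerpt can be sketched in a few lines of shell. The function names `signal_exit_status` and `exit_status_signal` are illustrative only and are not part of bash or Toolbx.

```shell
#!/bin/sh
# Hypothetical helper: exit status the shell reports for a command
# that was terminated by signal number N (128 + N).
signal_exit_status() {
    echo $((128 + $1))
}

# Hypothetical helper: recover the signal number from such an exit status;
# prints 0 when the status does not indicate death by signal.
exit_status_signal() {
    if [ "$1" -gt 128 ]; then
        echo $(($1 - 128))
    else
        echo 0
    fi
}

signal_exit_status 2     # SIGINT  -> 130
signal_exit_status 15    # SIGTERM -> 143
exit_status_signal 130   # -> 2
```

So an exit code of 130 points at `SIGINT`, while `SIGTERM` would have produced 143.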
> Setting the hostname to `toolbox` causes timeouts whenever anything tries to resolve the name of the machine - for example `sudo` does this.

I am curious. I have never experienced delays when using `sudo(8)` inside a Toolbx container, but maybe you are doing something that I never do. Could you please describe this in a bit more detail?

> I have enabled a static build

The way things stand today, we are unlikely to build statically or disable CGO. See: https://github.com/containers/toolbox/issues/832
Of course, this may change if our realities change or if new information comes to light.
@debarshiray hi! thanks for your comments - I'll clean this up and just deal with the fqdn change
you gave me some historical background on toolbx and I appreciate that.
> I am curious. I have never experienced delays when using `sudo(8)` inside a Toolbx container, but maybe you are doing something that I never do. Could you please describe this in a bit more detail?

this happened to me because I build/distribute my own toolbx images (https://github.com/akdev1l/toolbox-images) - there is a requirement that isn't specified in the documentation, which is having `nss-myhostname` installed and enabled in `/etc/nsswitch.conf` in the toolbox image. Without that, `sudo` tries to resolve the container hostname and fails. (Using the FQDN as originally intended in this PR solves this issue too, as the name is resolvable; that was the original motivation for this change.)
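For image authors who want to check this requirement, here is a minimal sketch that looks for `myhostname` on the `hosts:` line of an nsswitch.conf-style file. The function name `has_myhostname` and the `NSSWITCH_CONF` override variable are assumptions made for this example. The check only inspects the configuration; it does not verify that the NSS module itself is installed.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the "hosts:" line of the given
# nsswitch.conf-style file lists the myhostname module.
has_myhostname() {
    awk '$1 == "hosts:" { for (i = 2; i <= NF; i++) if ($i == "myhostname") found = 1 }
         END { exit !found }' "$1"
}

# NSSWITCH_CONF is an illustrative override, defaulting to the real file.
conf=${NSSWITCH_CONF:-/etc/nsswitch.conf}
if has_myhostname "$conf"; then
    echo "myhostname is enabled in $conf"
else
    echo "myhostname is missing from $conf"
fi
```

Running `getent hosts "$(hostname)"` inside the container is a complementary check, since it exercises the actual lookup path.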
> this happened to me because I build/distribute my own toolbx images (https://github.com/akdev1l/toolbox-images) - there is a requirement that isn't specified in the documentation which is having `nss-myhostname` installed and enabled in `/etc/nsswitch.conf` in the toolbox image. Without that `sudo` tries to resolve the container hostname and fails. (using the fqdn as originally intended in this PR solves this issue too as the name is resolvable, hence that was original drive for this change)

nod
@HarryMichal explained that to me later in real life. My apologies, I forgot to mention that here.
@debarshiray I cleaned this up; this should be the minimal change required for a proper FQDN - looks like it passes the tests
I still think we should change the prefix from "toolbox" to the name of the toolbox. Bash prompts seem to show the first component of the FQDN (with this change there won't be any user-visible prompt changes, as we have hardcoded the first component to `toolbox` for now).
[akdev@toronto toolbox]$ echo $PS1
[\u@\h \W]\$
so this would solve the issue of distinguishing the containers very elegantly imo (no changes required on the images, no big changes required on toolbox, no custom scripts nor custom prompts for bash or other shells)
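To illustrate the prompt behaviour: bash's `\h` escape expands to the hostname up to the first `.`, which can be mimicked with a parameter expansion. The function name `short_hostname` is made up for this sketch, and the second hostname is a hypothetical example of the hardcoded `toolbox` prefix.

```shell
#!/bin/sh
# Hypothetical helper: mimic bash's \h prompt escape by stripping
# everything from the first '.' onwards.
short_hostname() {
    printf '%s\n' "${1%%.*}"
}

short_hostname test1.canzuk.hq.akdev.xyz     # prints: test1
short_hostname toolbox.canzuk.hq.akdev.xyz   # prints: toolbox
```

With the container name as the first component, each toolbox would show its own name in the default prompt. With the hardcoded prefix, they all show `toolbox`.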
Build succeeded.
:heavy_check_mark: unit-test SUCCESS in 8m 14s :heavy_check_mark: unit-test-migration-path-for-coreos-toolbox SUCCESS in 8m 26s :heavy_check_mark: system-test-fedora-rawhide SUCCESS in 32m 01s :heavy_check_mark: system-test-fedora-36 SUCCESS in 11m 08s :heavy_check_mark: system-test-fedora-35 SUCCESS in 12m 08s
Build succeeded.
:heavy_check_mark: unit-test SUCCESS in 8m 14s :heavy_check_mark: unit-test-migration-path-for-coreos-toolbox SUCCESS in 8m 14s :heavy_check_mark: system-test-fedora-rawhide SUCCESS in 31m 57s :heavy_check_mark: system-test-fedora-36 SUCCESS in 11m 00s :heavy_check_mark: system-test-fedora-35 SUCCESS in 11m 52s
https://github.com/containers/toolbox/issues/98#issuecomment-1422038509
@debarshiray Does this PR need anything else before merging? This PR fixes a quite annoying problem #1059 when using GUI apps inside toolbox when running KDE on the host. Having the host's host name as the domain for the toolbox container's host name fixes that issue.
Hey @akdev1l, sorry for the delay on this PR. I am reviewing everything to merge it as soon as possible.
I just wanted to ask a quick question. At the beginning of the issue, in the initial explanation, you say that the expected container hostname is `${container_name}.${hostname}`. However, after reviewing the code I realised that it is actually `toolbox.${hostname}`, am I right?
> At the beginning of the issue, in the initial explanation, you say that the expected container hostname is the following `${container_name}.${hostname}`. However, after reviewing the code I realised that it is actually `toolbox.${hostname}`, am I right?

As someone who uses multiple toolboxes (one for regular development work, and a "grab-bag" one for other stuff), I would prefer the container name. (But it's not a big deal, honestly, since I can always change the prompt from within the toolbox itself.)
I also wanted to reproduce those timeouts you were talking about. Could you please let me know which commands led you to those timeouts so that I can reproduce them on my machine?
> I also wanted to reproduce those timeouts you were talking about. Could you please let me know which commands led you to those timeouts so that I can reproduce them on my machine?

Personally I encountered them mostly when using JetBrains IDEs, e.g. IntelliJ, CLion, from within toolbox on a Fedora Kinoite (KDE Plasma) system, but it should be possible to reproduce with other GUI apps on a KDE system, because the problems seem to be with kwin trying to do some DNS resolution in relation to putting the host name in the title bar of the windows (e.g. `xterm <@toolbox>`). Sorry, I don't have a more exact recipe for this though.
+1 for using the container name for the hostname. Distrobox already does this I believe. With multiple toolboxes it makes much more sense.
I just realized something: With the Foot terminal (and others), I can emit OSC 7 on directory change to have new terminals spawn in that directory. However, if you look at the linked code snippet:
printf \e\]7\;file://%s%s\e\\ $hostname (string escape --style=url $PWD)
You can see that this depends on the correct hostname (so that it doesn't erroneously catch directory changes over SSH connections, for example).
So, the toolbox not having the host's hostname breaks at least that bit of functionality.
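For readers not using fish, a rough POSIX shell equivalent of the snippet above is sketched here. The function name `osc7_sequence` is hypothetical. Unlike the fish version, this sketch does not URL-encode the path, so it is only correct for paths without special characters.

```shell
#!/bin/sh
# Hypothetical helper: build an OSC 7 sequence announcing the current
# directory as file://HOST/PATH, terminated by ESC \ (string terminator).
# Assumption: the path needs no URL-encoding.
osc7_sequence() {
    printf '\033]7;file://%s%s\033\\' "$1" "$2"
}

# Emit it for the current host and directory.
osc7_sequence "$(hostname)" "$PWD"
```

The terminal compares the host part with its own hostname, which is why a container reporting a different hostname breaks the feature.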