
fix: use FQDN with `toolbox` prefix

Open akdev1l opened this issue 2 years ago • 27 comments

Issue: https://github.com/containers/toolbox/issues/969

Setting the hostname to toolbox causes timeouts whenever something tries to resolve the machine's name (sudo, for example, does this).

This change sets the FQDN to ${container_name}.${hostname}, as recommended in the linked issue.

After this change, commands can properly resolve the local FQDN.

I removed the symlink to /run/host/etc/hosts because podman already copies that information into the container. We can then use --add-host to map the container's FQDN to 127.0.0.1 inside the container, so that ping $(hostname) does what is expected.

Pending PRs:

  1. https://github.com/containers/toolbox/pull/1007 - adds a new flag to set the hostname of the toolbox. I think the hostname should simply follow the container-name.hostname convention; otherwise it seems confusing.
  2. https://github.com/containers/toolbox/pull/771/files - similar, but sets the container's hostname to the container name; the name is still unresolvable, however.
  3. https://github.com/containers/toolbox/pull/383/files - same as above, but tries to sanitize the container name.
  4. https://github.com/containers/toolbox/pull/573/files - obsolete; a PR against the bash version of toolbox.

None of these PRs addresses my problem with delays caused by unresolvable hostnames, so this one tries to.

Sample Output

[akdev@canzuk toolbox]$ ./build/src/toolbox create -i docker.io/akdev1l/ubuntu-toolbox:22.04 test1
Created container: test1
Enter with: toolbox enter test1
[akdev@canzuk toolbox]$ ./build/src/toolbox enter test1
⬢[akdev@test1 toolbox]$ hostname
test1.canzuk.hq.akdev.xyz
⬢[akdev@test1 toolbox]$ cat /etc/hosts
127.0.0.1	test1.canzuk.hq.akdev.xyz
127.0.0.1	localhost localhost.localdomain localhost4 localhost4.localdomain4 toolbox
::1	localhost localhost.localdomain localhost6 localhost6.localdomain6
10.0.1.100	host.containers.internal

akdev1l, Aug 17 '22

Build succeeded.

✔ unit-test SUCCESS in 6m 54s
✔ system-test-fedora-rawhide SUCCESS in 16m 52s
✔ system-test-fedora-36 SUCCESS in 9m 51s
✔ system-test-fedora-35 SUCCESS in 10m 05s

This seems to trigger a minor bug: when I exit the toolbox after having pressed Ctrl+C at the prompt, toolbox falsely prints an empty "Error:" message.

I'll have to dig into that

akdev1l, Aug 17 '22

Build succeeded.

✔ unit-test SUCCESS in 6m 57s
✔ system-test-fedora-rawhide SUCCESS in 11m 04s
✔ system-test-fedora-36 SUCCESS in 9m 57s
✔ system-test-fedora-35 SUCCESS in 10m 10s

Build failed.

❌ unit-test NODE_FAILURE in 0s
✔ system-test-fedora-rawhide SUCCESS in 13m 16s
✔ system-test-fedora-36 SUCCESS in 13m 16s
✔ system-test-fedora-35 SUCCESS in 13m 51s

Sorry for the noise. The bug I hit seems to be because toolbox exits with return code 130 (SIGTERM) whenever Ctrl+C is pressed; I'm seriously wondering how my changes triggered this.

akdev1l, Aug 19 '22

capsh --caps= -- -c 'exec "$@"' /bin/sh /bin/bash -l

Why is this so convoluted? It would seem this is equivalent:

capsh --caps= -- -c 'exec /bin/bash -l'
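The longer form works because of how `-c` assigns its operands: the first operand after the command string becomes `$0` (here `/bin/sh`, which is then just a name used in error messages) and the remaining ones become the positional parameters, so `exec "$@"` runs `/bin/bash -l` with each argument passed through verbatim, and `exec` replaces the wrapper shell rather than forking a child. Presumably this is so that the shell path, which is only known at runtime, is passed as separate arguments instead of being spliced into the command string. The Go sketch below reproduces that argument passing with plain /bin/sh in place of capsh (an assumption, since capsh may not be installed); `runViaExecWrapper` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
)

// runViaExecWrapper runs argv through the same wrapper shape as the capsh
// command above: sh -c 'exec "$@"' /bin/sh argv...
// The operand right after the command string ("/bin/sh") becomes $0, and
// argv becomes the positional parameters, which "$@" expands to verbatim.
func runViaExecWrapper(argv ...string) (string, error) {
	args := append([]string{"-c", `exec "$@"`, "/bin/sh"}, argv...)
	out, err := exec.Command("/bin/sh", args...).Output()
	return string(out), err
}

func main() {
	// "a b" stays a single argument because "$@" preserves word boundaries.
	out, err := runViaExecWrapper("printf", "%s|", "a b", "c")
	if err != nil {
		fmt.Fprintln(os.Stderr, "wrapper failed:", err)
		os.Exit(1)
	}
	fmt.Println(out) // a b|c|
}
```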

Hmm, from further experimentation it seems that bash simply does this, and toolbox should just ignore the 130 exit code.

toolbox without modification does this:

[akdev@canzuk toolbox]$ toolbox enter test10
⬢[akdev@test10 toolbox]$ ^C
⬢[akdev@test10 toolbox]$ ^C
⬢[akdev@test10 toolbox]$ ^C
⬢[akdev@test10 toolbox]$
logout
[akdev@canzuk toolbox]$ echo $?
0
[akdev@canzuk toolbox]$ bash
[akdev@canzuk toolbox]$ ^C
[akdev@canzuk toolbox]$ ^C
[akdev@canzuk toolbox]$
exit
[akdev@canzuk toolbox]$ echo $?
130

IMO this should keep the exit code of the last command executed, so it should be 130 at the end to match the behaviour of bash. We could probably also simplify those capsh shenanigans; there seems to be at least one unnecessary fork there.

akdev1l, Aug 19 '22

Build failed.

❌ unit-test FAILURE in 7m 15s
❌ system-test-fedora-rawhide FAILURE in 16m 10s
❌ system-test-fedora-36 FAILURE in 10m 29s
❌ system-test-fedora-35 FAILURE in 10m 34s

Build failed.

✔ unit-test SUCCESS in 7m 07s
❌ system-test-fedora-rawhide FAILURE in 15m 36s
❌ system-test-fedora-36 FAILURE in 10m 27s
❌ system-test-fedora-35 FAILURE in 10m 41s

Build failed.

✔ unit-test SUCCESS in 7m 03s
❌ system-test-fedora-rawhide FAILURE in 15m 52s
❌ system-test-fedora-36 FAILURE in 10m 13s
❌ system-test-fedora-35 FAILURE in 10m 30s

Build failed.

✔ unit-test SUCCESS in 7m 10s
❌ system-test-fedora-rawhide FAILURE in 16m 00s
❌ system-test-fedora-36 FAILURE in 10m 19s
❌ system-test-fedora-35 FAILURE in 10m 54s

I have continued developing toolbox on my own, as I don't expect to see this merged in a reasonable time frame.

My fork is at: https://github.com/akdev1l/toolbox/tree/akdev

I have fixed some long-standing issues and added some features (in particular, I have enabled a static build, containerized toolbox itself, resolved the DNS issues and added basic export support). Feel free to ping me if there's any interest in that.

Otherwise, I will leave this PR to die; feel free to close it.

akdev1l, Aug 25 '22

Sorry for the noise. The bug I hit seems to be because toolbox exits with return code 130 (SIGTERM) whenever Ctrl+C is pressed; I'm seriously wondering how my changes triggered this.

I don't think your changes introduced that. :)

A few months ago, Toolbx started propagating the exit code of the command, which triggered this. You wouldn't have encountered this behaviour before that.

From the bash(1) manual:

When a command terminates on a fatal signal N,
bash uses the value of 128+N as the exit status.

Ctrl+C is SIGINT (not SIGTERM), and its numerical value is 2:

$ kill -L
 1) SIGHUP	 2) SIGINT ...

Hence, 130 (= 128 + 2) as the exit code.

debarshiray, Nov 17 '22

Setting the hostname to toolbox causes timeouts whenever something tries to resolve the machine's name (sudo, for example, does this).

I am curious. I have never experienced delays when using sudo(8) inside a Toolbx container, but maybe you are doing something that I never do. Could you please describe this in a bit more detail?

debarshiray, Nov 17 '22

I have enabled a static build

The way things stand today, we are unlikely to build statically or disable CGO. See: https://github.com/containers/toolbox/issues/832

Of course, this may change if our realities change or if new information comes to light.

debarshiray, Nov 18 '22

@debarshiray hi! Thanks for your comments. I'll clean this up and deal only with the FQDN change.

You gave me some historical background on Toolbx, and I appreciate that.

I am curious. I have never experienced delays when using sudo(8) inside a Toolbx container, but maybe you are doing something that I never do. Could you please describe this in a bit more detail?

This happened to me because I build and distribute my own Toolbx images (https://github.com/akdev1l/toolbox-images). There is an undocumented requirement: nss-myhostname must be installed in the toolbox image and enabled in its /etc/nsswitch.conf. Without it, sudo tries to resolve the container's hostname and fails. (Using the FQDN, as originally intended in this PR, also solves this issue because the name becomes resolvable; that was the original motivation for this change.)

akdev1l, Nov 26 '22

This happened to me because I build and distribute my own Toolbx images (https://github.com/akdev1l/toolbox-images). There is an undocumented requirement: nss-myhostname must be installed in the toolbox image and enabled in its /etc/nsswitch.conf. Without it, sudo tries to resolve the container's hostname and fails. (Using the FQDN, as originally intended in this PR, also solves this issue because the name becomes resolvable; that was the original motivation for this change.)

nod

@HarryMichal explained that to me later in real life. My apologies, I forgot to mention that here.

debarshiray, Nov 29 '22

@debarshiray I cleaned this up; this should be the minimal change required for a proper FQDN, and it looks like it passes the tests.

I still think we should change the prefix from "toolbox" to the name of the toolbox. Bash prompts seem to show the first component of the FQDN (with this change there won't be any user-visible prompt changes, since the first component is hardcoded to toolbox for now):

[akdev@toronto toolbox]$ echo $PS1
[\u@\h \W]\$

So this would solve the problem of distinguishing containers very elegantly, IMO (no changes required in the images, no big changes required in toolbox, and no custom scripts or custom prompts for bash or other shells).
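The argument relies on bash's `\h` prompt escape, which expands to the host name up to the first `.` (while `\H` expands to the full name). A small Go sketch of that truncation, using a hypothetical `promptHost` helper and host names from this thread as inputs, shows why only the prefix of the FQDN would appear in the prompt:

```go
package main

import (
	"fmt"
	"strings"
)

// promptHost mimics bash's \h prompt escape: the host name up to the first
// '.', or the whole name if it contains no dot.
func promptHost(hostname string) string {
	if i := strings.IndexByte(hostname, '.'); i >= 0 {
		return hostname[:i]
	}
	return hostname
}

func main() {
	fmt.Println(promptHost("toolbox.toronto"))           // toolbox
	fmt.Println(promptHost("test1.canzuk.hq.akdev.xyz")) // test1
}
```

So a toolbox.${hostname} FQDN keeps the existing "toolbox" prompt, whereas a ${container_name}.${hostname} FQDN would show the container name.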

akdev1l, Dec 05 '22

Build succeeded.

✔ unit-test SUCCESS in 8m 14s
✔ unit-test-migration-path-for-coreos-toolbox SUCCESS in 8m 26s
✔ system-test-fedora-rawhide SUCCESS in 32m 01s
✔ system-test-fedora-36 SUCCESS in 11m 08s
✔ system-test-fedora-35 SUCCESS in 12m 08s

Build succeeded.

✔ unit-test SUCCESS in 8m 14s
✔ unit-test-migration-path-for-coreos-toolbox SUCCESS in 8m 14s
✔ system-test-fedora-rawhide SUCCESS in 31m 57s
✔ system-test-fedora-36 SUCCESS in 11m 00s
✔ system-test-fedora-35 SUCCESS in 11m 52s

https://github.com/containers/toolbox/issues/98#issuecomment-1422038509

jkemp814, Feb 08 '23

@debarshiray Does this PR need anything else before merging? It fixes a quite annoying problem (#1059) when using GUI apps inside toolbox with KDE running on the host. Using the host's hostname as the domain of the toolbox container's hostname fixes that issue.

jnohlgard, Apr 13 '23

Hey @akdev1l, sorry for the delay on this PR. I am reviewing everything to merge it as soon as possible.

I just wanted to ask a quick question. At the beginning of the issue, in the initial explanation, you say that the expected container hostname is ${container_name}.${hostname}. However, after reviewing the code, I realised that it is actually toolbox.${hostname}. Am I right?

nievesmontero, Jul 05 '23

At the beginning of the issue, in the initial explanation, you say that the expected container hostname is ${container_name}.${hostname}. However, after reviewing the code, I realised that it is actually toolbox.${hostname}. Am I right?

As someone who uses multiple toolboxes (one for regular development work, and a "grab-bag" one for other stuff), I would prefer the container name. (But it's not a big deal, honestly, since I can always change the prompt from within the toolbox itself.)

runiq, Jul 05 '23

I also wanted to reproduce those timeouts you were talking about. Could you please let me know which commands led you to those timeouts so that I can reproduce them on my machine?

nievesmontero, Jul 05 '23

I also wanted to reproduce those timeouts you were talking about. Could you please let me know which commands led you to those timeouts so that I can reproduce them on my machine?

Personally, I encountered them mostly when using JetBrains IDEs (e.g. IntelliJ, CLion) from within toolbox on a Fedora Kinoite (KDE Plasma) system. It should be possible to reproduce them with other GUI apps on a KDE system, because the problems seem to come from KWin doing some DNS resolution when putting the host name in the windows' title bars ("xterm <@toolbox>"). Sorry, I don't have a more exact recipe for this.

jnohlgard, Jul 10 '23

+1 for using the container name for the hostname. Distrobox already does this I believe. With multiple toolboxes it makes much more sense.

juhp, Aug 25 '23

I just realized something: with the Foot terminal (and others), I can emit OSC 7 on every directory change so that new terminals spawn in that directory. However, look at the linked code snippet:

printf \e\]7\;file://%s%s\e\\ $hostname (string escape --style=url $PWD)

You can see that this depends on the correct hostname (so that it doesn't erroneously catch directory changes over SSH connections, for example).

So, the toolbox not having the host's hostname breaks at least that bit of functionality.

runiq, Oct 05 '23