[v0.21] custom DNS nameservers require IP addresses
Contributing guidelines and issue reporting guide
- [x] I've read the contributing guidelines and wholeheartedly agree. I've also read the issue reporting guide.
Well-formed report checklist
- [x] I have found a bug and the documentation does not mention anything about my problem
- [x] I have found a bug and there are no open or closed issues related to my problem
- [x] I have provided version/information about my environment and done my best to provide a reproducer
Description of bug
After upgrading from buildkitd 0.20.0 to 0.21.0, some of our buildkitd installations always fail to build:
```console
$ buildctl ... \
    --frontend=gateway.v0 \
    --opt source=docker-registry.wikimedia.org/repos/releng/blubber/buildkit:v0.23.0 \
    --opt filename=.pipeline/blubber.yaml \
    --opt target=test \
    --local context=. \
    --local dockerfile=.
2025-05-29 15:05:23,279 Using build frontend docker-registry.wikimedia.org/repos/releng/blubber/buildkit:v0.23.0
#1 resolve image config for docker-image://docker-registry.wikimedia.org/repos/releng/blubber/buildkit:v0.23.0
#1 DONE 0.2s
#2 docker-image://docker-registry.wikimedia.org/repos/releng/blubber/buildkit:v0.23.0@sha256:6b1535a39497bb6c5e0a733595721a91cee33dba99ab59d8323d077665073a53
#2 resolve docker-registry.wikimedia.org/repos/releng/blubber/buildkit:v0.23.0@sha256:6b1535a39497bb6c5e0a733595721a91cee33dba99ab59d8323d077665073a53 0.0s done
#2 CACHED
error: failed to solve: ParseAddr("ns-recursor.openstack.eqiad1.wikimediacloud.org"): unexpected character (at "ns-recursor.openstack.eqiad1.wikimediacloud.org")
```
The problem goes away when buildkitd is downgraded to 0.20.0.
We have two different clusters of buildkitd instances. Both clusters were upgraded to 0.21.0 at the same time; one cluster builds fine, and the other has this problem.
Notes:
- `ParseAddr` is expecting an IP address string, not a domain name.
- buildkitd.toml has:
```toml
[dns]
nameservers = ["ns-recursor.openstack.eqiad1.wikimediacloud.org"]
```
Example Job log: https://gitlab.wikimedia.org/dancy/deleteme/-/jobs/522422
Version information
This started happening with buildkitd 0.21.0 and persists in 0.22.0. Version 0.20.0 does not exhibit this behavior.
If someone can provide advice on how to collect a full stack trace (from buildkitd, not buildctl) when this error occurs, I'll try it out.
I ended up finding the following in buildkitd's config file (not sure why I didn't look there first):
```toml
[dns]
nameservers = ["ns-recursor.openstack.eqiad1.wikimediacloud.org"]
```
Oops, I didn't mean to close this issue. Anyway, there is a change of behavior between the aforementioned buildkitd versions. I don't know if that's something y'all want to fix. In the meantime I'll make sure we use an IP address string.
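For anyone hitting the same thing, the workaround is to put the resolver's IP address in buildkitd.toml instead of its hostname. A sketch of the corrected config; the address below is a placeholder, not the real recursor IP:

```toml
[dns]
# Placeholder address; substitute the actual IP of your DNS recursor here.
nameservers = ["192.0.2.53"]
```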
Thanks @thaJeztah
@thaJeztah Seems to be a regression from https://github.com/moby/moby/commit/00bd916203d01831bea2173ead6cd6736b53a877#diff-a8ba0929fdc2848f37b485f438bcd8923d63d07db6979f108aabc611aa8707f4R7-R131
cc @robmry
A hostname can't be a nameserver; a nameserver address is needed before a name can be resolved into an address. So I guess DNS from build containers running on that host just didn't work before? (Or an IP address got picked up from somewhere else?)
Failing on misconfiguration seems better than silently ignoring it. But, perhaps it could be reported when validating the config file, so the error message can be clearer about where the problem is?
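A rough sketch of what that config-time validation could look like (a hypothetical helper, not BuildKit's actual config loader): check each `dns.nameservers` entry with `netip.ParseAddr` and name the offending key in the error.

```go
// Hypothetical validation helper, not BuildKit's actual config loader:
// reject non-IP nameserver entries up front, with an error that points at
// the config key instead of surfacing later as a bare ParseAddr failure.
package main

import (
	"fmt"
	"net/netip"
)

func validateNameservers(nameservers []string) error {
	for _, ns := range nameservers {
		if _, err := netip.ParseAddr(ns); err != nil {
			return fmt.Errorf("buildkitd.toml: dns.nameservers entry %q is not an IP address (hostnames are not supported): %w", ns, err)
		}
	}
	return nil
}

func main() {
	err := validateNameservers([]string{"ns-recursor.openstack.eqiad1.wikimediacloud.org"})
	fmt.Println(err)
}
```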
(Or, perhaps the hostname should be resolved by buildkit, to get an address to use in the container's resolv.conf?)
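If resolving were preferred instead, a sketch of that alternative (again hypothetical, using the daemon host's resolver at startup so that only IP addresses reach the container's resolv.conf):

```go
// Hypothetical alternative: resolve a hostname nameserver on the daemon
// host at startup, so only IP addresses are written into the build
// container's resolv.conf.
package main

import (
	"context"
	"fmt"
	"net"
	"net/netip"
)

func resolveNameserver(ctx context.Context, ns string) ([]netip.Addr, error) {
	// Already an IP literal: use it as-is.
	if addr, err := netip.ParseAddr(ns); err == nil {
		return []netip.Addr{addr}, nil
	}
	// Otherwise, resolve the hostname with the host's resolver.
	return net.DefaultResolver.LookupNetIP(ctx, "ip", ns)
}

func main() {
	addrs, err := resolveNameserver(context.Background(), "one.one.one.one")
	fmt.Println(addrs, err)
}
```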
(Catching up) Ah! I see the issue now (didn't read in-depth when I reopened). Yes, it indeed looks like before it would silently ignore the invalid configuration. So either BuildKit is missing a validation step, or, if it was intentional to allow a domain name to be specified, I guess BuildKit should somehow resolve the domain before passing it to the code that writes the resolv.conf 🤔
(We should probably look at whether we can make the error message from the resolvconf code more informative, though.)
I would have expected it to resolve it first in the daemon scope, as this was reported as a regression. I don't see where that would happen in https://github.com/moby/buildkit/issues/6001#issuecomment-2923597403 though. If it never worked, then of course there is no need to add a special feature for it.
@dancysoft can you confirm if this is a regression or invalid conf receiving an error?
Sorry for the late reply. I was off for a week.
I re-tested everything today with the following buildkitd.toml:
```toml
# Use CNI to isolate each build container network namespace
networkMode = "cni"
# Pre-allocate a pool of network namespaces
cniPoolSize = 20

[dns]
nameservers = ["one.one.one.one"]
```
Buildkitd is started like so:
```sh
docker network create deleteme

version=v0.20.0

docker run -d --name buildkitd --privileged \
  -v ./buildkitd.toml:/etc/buildkit/buildkitd.toml:ro \
  -p 1234:1234 \
  --network deleteme \
  moby/buildkit:$version \
  --addr tcp://0.0.0.0:1234 \
  --config /etc/buildkit/buildkitd.toml
```
This configuration works fine when version=v0.20.0. I can successfully build buildkitd with it using the following command:
(The current directory is a git clone of the buildkit repo)
```sh
docker run --rm -it \
  --network deleteme \
  --entrypoint buildctl \
  -v .:/src:ro \
  moby/buildkit:v0.20.0 \
  --addr tcp://buildkitd:1234 build \
  --frontend dockerfile.v0 \
  --local context=/src \
  --local dockerfile=src \
  --progress=plain
```
If I change to version=v0.21.0 and restart the buildkitd container, the build fails:
```console
dancy@base:~/src/wmf/buildkit$ ./test-build
#1 [internal] load build definition from Dockerfile
#1 transferring dockerfile: 18.32kB done
#1 DONE 0.1s
#2 resolve image config for docker-image://docker.io/docker/dockerfile-upstream:master
#2 DONE 1.3s
#3 docker-image://docker.io/docker/dockerfile-upstream:master@sha256:7a6acb5d355f1fdfa63b5930b6a03c1370ebd425c50d6c6c0861004fe4e247d6
#3 resolve docker.io/docker/dockerfile-upstream:master@sha256:7a6acb5d355f1fdfa63b5930b6a03c1370ebd425c50d6c6c0861004fe4e247d6 0.0s done
#3 sha256:93ad73e33b81ab605ab21198d4fe790d80d17e766ad941ecc527d61b9e22252d 0B / 14.08MB 0.2s
#3 sha256:93ad73e33b81ab605ab21198d4fe790d80d17e766ad941ecc527d61b9e22252d 6.29MB / 14.08MB 0.3s
#3 sha256:93ad73e33b81ab605ab21198d4fe790d80d17e766ad941ecc527d61b9e22252d 14.08MB / 14.08MB 0.4s done
#3 extracting sha256:93ad73e33b81ab605ab21198d4fe790d80d17e766ad941ecc527d61b9e22252d 0.1s done
#3 DONE 0.6s
Dockerfile:1
--------------------
   1 | >>> # syntax=docker/dockerfile-upstream:master
   2 |
   3 |     ARG RUNC_VERSION=v1.2.5
--------------------
error: failed to solve: ParseAddr("one.one.one.one"): unexpected character (at "one.one.one.one")
```
Thanks @dancysoft - the error is definitely new, and its message will be more helpful in the next release (https://github.com/moby/moby/pull/50124).
But, we think a non-IP-address nameserver would have been silently ignored in older releases (not treated as a hostname and resolved, or anything like that). So, the build container wouldn't have been using the expected nameserver.
> Thanks @dancysoft - the error is definitely new, and its message will be more helpful in the next release (moby/moby#50124).
>
> But, we think a non-IP-address nameserver would have been silently ignored in older releases (not treated as a hostname and resolved, or anything like that). So, the build container wouldn't have been using the expected nameserver.
I see. And I presume the following (previously overlooked) buildkitd log message is evidence of this?
time="2025-06-10T19:00:55Z" level=info msg="No non-localhost DNS nameservers are left in resolv.conf. Using default external servers"
Ah, yes - exactly! Thank you.