The kubeconfig environment variable is sometimes not correctly referenced
Contributing guidelines
- [X] I've read the contributing guidelines and wholeheartedly agree
I've found a bug and checked that ...
- [X] ... the documentation does not mention anything about my problem
- [X] ... there are no open or closed issues that are related to my problem
Description
I need to create architecture-specific builders in two separate clusters, and this is how I do it:
KUBECONFIG=/k8s-config-x86 buildx create --name builder-965284f6-b605-4a5c-87a6-d962909a43bc --node amd64-1783674487699423233-9652 '--platform=linux/amd64' --driver kubernetes --driver-opt 'namespace=buildx-builder' --use
KUBECONFIG=/k8s-config-arm buildx create --append --name builder-965284f6-b605-4a5c-87a6-d962909a43bc --node arm64-1783674487699423233-9652 '--platform=linux/arm64' --driver kubernetes --driver-opt 'namespace=buildx-builder' --use
This approach usually works as expected, but occasionally an ARM pod is launched in the x86 cluster, which suggests that the specified KUBECONFIG was not actually used. I'm not sure whether this is a bug or a problem with how I'm using it.
Expected behaviour
Each BuildKit pod should be launched on the cluster that matches its architecture.
Actual behaviour
Occasionally an ARM BuildKit pod is launched on the x86 cluster.
Buildx version
github.com/docker/buildx v0.7.0 f0026081a7496ca28b597a9006616201d838fea8
Docker info
no
Builders list
none
Configuration
none
Build logs
No response
Additional info
No response
Commands look correct to me. cc @AkihiroSuda
[screenshot: x86 cluster]
@tonistiigi I suspect that because the pod is not ready yet, running buildx build boots the pod, and the pod launched at that point can end up on the wrong cluster. Could there be an issue with how the kubeconfig file is referenced in this path? Before creating the builder, I use kubectl to create the corresponding BuildKit deployment, as follows:
KUBECONFIG=/k8s-config-x86 kubectl apply -f amd64-1.yaml -n buildx-builder --wait --timeout=600s
KUBECONFIG=/k8s-config-x86 buildx create --name builder-e72e8a7a-9e24-4b50-873f-fc305b7e62cb --node amd64-1788042013837361154-e72e --platform=linux/amd64 --driver kubernetes --driver-opt namespace=buildx-builder --use
KUBECONFIG=/k8s-config-arm kubectl apply -f arm64-1.yaml -n buildx-builder --wait --timeout=600s
KUBECONFIG=/k8s-config-arm buildx create --append --name builder-e72e8a7a-9e24-4b50-873f-fc305b7e62cb --node arm64-1788042013837361154-e72e --platform=linux/arm64 --driver kubernetes --driver-opt namespace=buildx-builder --use
buildx build --builder=builder-e72e8a7a-9e24-4b50-873f-fc305b7e62cb \
--platform=linux/amd64,linux/arm64 \
-t test.image.cn/test:1.0 \
-f ./Dockerfile .
"I have reproduced the issue and attempted to resolve it using this approach. Currently, I have verified that the problem has been fixed." @tonistiigi @crazy-max @AkihiroSuda
func ConfigFromEndpoint(endpointName string, s store.Reader) (clientcmd.ClientConfig, error) {
	if strings.HasPrefix(endpointName, "kubernetes://") {
		u, _ := url.Parse(endpointName)
		// Kubeconfig path taken from the endpoint URL.
		kubeconfig := u.Query().Get("kubeconfig")
		if kubeconfig != "" {
			_ = os.Setenv(clientcmd.RecommendedConfigPathEnvVar, kubeconfig)
		}
		rules := clientcmd.NewDefaultClientConfigLoadingRules()
		// If the environment variable no longer matches the kubeconfig from the
		// endpoint, another node with a different kubeconfig has overwritten it
		// in the meantime (multiple kubeconfig files in use). In that case, load
		// the configuration directly from the file instead of the environment.
		if os.Getenv(clientcmd.RecommendedConfigPathEnvVar) != kubeconfig {
			logrus.Debug("using kube config from file")
			rules.ExplicitPath = kubeconfig
		}
		apiConfig, err := rules.Load()
		if err != nil {
			return nil, err
		}
		return clientcmd.NewDefaultClientConfig(*apiConfig, &clientcmd.ConfigOverrides{}), nil
	}
	return ConfigFromContext(endpointName, s)
}
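For comparison, a minimal sketch of an alternative (illustration only; the name configFromEndpointNoEnv is hypothetical and this is not necessarily how it was fixed upstream) that never writes the process-wide KUBECONFIG variable, so another node cannot overwrite it:

func configFromEndpointNoEnv(endpointName string, s store.Reader) (clientcmd.ClientConfig, error) {
	if strings.HasPrefix(endpointName, "kubernetes://") {
		u, err := url.Parse(endpointName)
		if err != nil {
			return nil, err
		}
		rules := clientcmd.NewDefaultClientConfigLoadingRules()
		if kubeconfig := u.Query().Get("kubeconfig"); kubeconfig != "" {
			// Use the file referenced by the endpoint directly; no os.Setenv,
			// so concurrent loads of other nodes cannot interfere.
			rules.ExplicitPath = kubeconfig
		}
		apiConfig, err := rules.Load()
		if err != nil {
			return nil, err
		}
		return clientcmd.NewDefaultClientConfig(*apiConfig, &clientcmd.ConfigOverrides{}), nil
	}
	return ConfigFromContext(endpointName, s)
}

Scoping the path to the loading rules instead of the process environment keeps each node's configuration independent.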
github.com/docker/buildx v0.7.0 f002608
Quite an old release, do you repro on latest stable v0.14.0 as well?
Can you also show the output of docker buildx inspect <name>? It should display the kubeconfig used in the Endpoint field for each node, like: https://github.com/docker/buildx/actions/runs/9079420760/job/24948667867#step:9:29
Name: buildx-test-4c972a3f9d369614b40f28a281790c7e
Driver: kubernetes
Last Activity: 2024-05-14 12:36:40 +0000 UTC
Nodes:
Name: buildx-test-4c972a3f9d369614b40f28a281790c7e0
Endpoint: kubernetes:///buildx-test-4c972a3f9d369614b40f28a281790c7e?deployment=buildkit-4c2ed3ed-970f-4f3d-a6df-a4fcbab4d5cf-d9d73&kubeconfig=%2Ftmp%2Finstall-k3s-action%2Fkubeconfig.yaml
Driver Options: image="moby/buildkit:buildx-stable-1" qemu.install="true"
Status: running
BuildKit daemon flags: --allow-insecure-entitlement=network.host
BuildKit version: v0.13.2
Platforms: linux/amd64*
"Yes, the issue persists in version 0.14 as well."
"The reason for this issue is that when using multiple kubeconfig files, the environment variable settings may be overwritten by subsequent settings, resulting in an incorrect ClientConfig being returned."
/ # buildx inspect builder-test-xqptat
Name: builder-test-xqptat
Driver: kubernetes
Last Activity: 2024-05-20 06:08:11 +0000 UTC
Nodes:
Name: builder-test-amd64-xqptat
Endpoint: kubernetes:///builder-test-xqptat?deployment=builder-test-amd64-xqptat&kubeconfig=%2Fk8s-config-x86
Driver Options: image="buildkit:0.9.0-c" namespace="buildx-builder" nodeselector="pipeline=buildx-builder"
Status: running
BuildKit daemon flags: --allow-insecure-entitlement=network.host
Platforms: linux/amd64*, linux/386
Labels:
org.mobyproject.buildkit.worker.executor: oci
org.mobyproject.buildkit.worker.hostname: builder-test-amd64-xqptat-744d795976-kt6s6
org.mobyproject.buildkit.worker.snapshotter: overlayfs
GC Policy rule#0:
All: false
Filters: type==source.local,type==exec.cachemount,type==source.git.checkout
Keep Duration: 48h0m0s
Keep Bytes: 488.3MiB
GC Policy rule#1:
All: false
Keep Duration: 1440h0m0s
Keep Bytes: 138.8GiB
GC Policy rule#2:
All: false
Keep Bytes: 138.8GiB
GC Policy rule#3:
All: true
Keep Bytes: 138.8GiB
Name: builder-test-arm64-xqptat
Endpoint: kubernetes:///builder-test-xqptat?deployment=builder-test-arm64-xqptat&kubeconfig=%2Fk8s-config-arm
Driver Options: image="buildkit:0.9.0-c-arm64" namespace="buildx-builder" nodeselector="buildx-builder=arm64"
Status: running
BuildKit daemon flags: --allow-insecure-entitlement=network.host
Platforms: linux/arm64*
Labels:
org.mobyproject.buildkit.worker.executor: oci
org.mobyproject.buildkit.worker.hostname: builder-test-arm64-xqptat-678467865-rgscd
org.mobyproject.buildkit.worker.snapshotter: overlayfs
Looking at the output it seems to take into account the right config:
Name: builder-test-amd64-xqptat
Endpoint: kubernetes:///builder-test-xqptat?deployment=builder-test-amd64-xqptat&kubeconfig=%2Fk8s-config-x86
kubeconfig=%2Fk8s-config-x86
Name: builder-test-arm64-xqptat
Endpoint: kubernetes:///builder-test-xqptat?deployment=builder-test-arm64-xqptat&kubeconfig=%2Fk8s-config-arm
kubeconfig=%2Fk8s-config-arm
I will make some tests on my side and keep you posted.
Yes, the problem is not there. In the function from my screenshot above, the kubeconfig is read from the environment variable every time. When there are multiple kubeconfig files and the nodes are handled in a concurrent loop, the wrong value can be picked up.
@gitfxx Can you try with changes from https://github.com/docker/buildx/pull/2497?
OK, I will try. Thanks.
@crazy-max I tested, and it's fixed.