dra-example-driver

Add support for Podman

Open · empovit opened this pull request 1 year ago • 20 comments

Add podman as an option for building images and running on Kind

empovit avatar Aug 07 '24 12:08 empovit
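
In other words, the goal is for the existing demo flow to run with Podman in place of Docker, roughly as in the sketch below (CONTAINER_TOOL is the variable discussed later in this thread; its exact use here is an assumption, not the final scripts):

# Run the demo cluster setup with Podman instead of Docker (illustrative)
CONTAINER_TOOL=podman ./demo/create-cluster.sh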

CLA Signed

The committers listed above are authorized under a signed CLA.

  • :white_check_mark: login: empovit / name: Vitaly E. (151463a6784b1192c02d37db60df345e06260ae3)

Welcome @empovit!

It looks like this is your first PR to kubernetes-sigs/dra-example-driver 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes-sigs/dra-example-driver has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. :smiley:

k8s-ci-robot avatar Aug 07 '24 12:08 k8s-ci-robot

@empovit what do you mean by:

NOTE: Currently, the dra-example-driver-kubeletplugin pod crashes if Kind runs as non-root/non-sudo.

Is this true for both docker and podman or just podman? How to reproduce on docker?

klueska avatar Aug 09 '24 10:08 klueska

@empovit what do you mean by:

NOTE: Currently, the dra-example-driver-kubeletplugin pod crashes if Kind runs as non-root/non-sudo.

Is this true for both docker and podman or just podman? How to reproduce on docker?

@klueska I've described the issue in https://kubernetes.slack.com/archives/C0409NGC1TK/p1721720213996169

$ kubectl logs -n dra-example-driver dra-example-driver-kubeletplugin-q5khp
Defaulted container "plugin" out of: plugin, init (init)
E0723 07:23:56.322197       1 runtime.go:79] Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference)
goroutine 1 [running]:
k8s.io/apimachinery/pkg/util/runtime.logPanic({0x18258e0, 0x2a0aed0})
        /build/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:75 +0x85
k8s.io/apimachinery/pkg/util/runtime.HandleCrash({0x0, 0x0, 0xc0000061c0?})
        /build/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:49 +0x6b
panic({0x18258e0?, 0x2a0aed0?})
        /usr/local/go/src/runtime/panic.go:770 +0x132
github.com/fsnotify/fsnotify.(*Watcher).isClosed(...)
        /build/vendor/github.com/fsnotify/fsnotify/backend_inotify.go:176
github.com/fsnotify/fsnotify.(*Watcher).Add(0x0, {0xc000048009?, 0xc00051b4c8?})
        /build/vendor/github.com/fsnotify/fsnotify/backend_inotify.go:240 +0x57
github.com/container-orchestrated-devices/container-device-interface/pkg/cdi.(*watch).update(0xc0001112d0, 0xc0001d6900, {0x0, 0x0, 0x51b720?})
        /build/vendor/github.com/container-orchestrated-devices/container-device-interface/pkg/cdi/cache.go:563 +0xd1
github.com/container-orchestrated-devices/container-device-interface/pkg/cdi.(*Cache).refreshIfRequired(0xc000023180, 0x8?)
        /build/vendor/github.com/container-orchestrated-devices/container-device-interface/pkg/cdi/cache.go:208 +0x38
github.com/container-orchestrated-devices/container-device-interface/pkg/cdi.(*Cache).Refresh(0xc000023180)
        /build/vendor/github.com/container-orchestrated-devices/container-device-interface/pkg/cdi/cache.go:123 +0xa6
main.NewCDIHandler(0x0?)
        /build/cmd/dra-example-kubeletplugin/cdi.go:46 +0x9e
main.NewDeviceState(0xc00051bbd0)
        /build/cmd/dra-example-kubeletplugin/state.go:66 +0x76
main.NewDriver.func1()
        /build/cmd/dra-example-kubeletplugin/driver.go:53 +0x85
k8s.io/client-go/util/retry.OnError.func1()
        /build/vendor/k8s.io/client-go/util/retry/util.go:51 +0x30
k8s.io/apimachinery/pkg/util/wait.runConditionWithCrashProtection(0x4c9379?)
        /build/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:145 +0x3e
k8s.io/apimachinery/pkg/util/wait.ExponentialBackoff({0x989680, 0x3ff0000000000000, 0x3fb999999999999a, 0x5, 0x0}, 0xc00051b920)
        /build/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:461 +0x5a
k8s.io/client-go/util/retry.OnError({0x989680, 0x3ff0000000000000, 0x3fb999999999999a, 0x5, 0x0}, 0x0?, 0xc0002ad6f8?)
        /build/vendor/k8s.io/client-go/util/retry/util.go:50 +0xa5
k8s.io/client-go/util/retry.RetryOnConflict(...)
        /build/vendor/k8s.io/client-go/util/retry/util.go:104
main.NewDriver({0x1d09b90, 0x2aa1e40}, 0xc00051bbd0)
        /build/cmd/dra-example-kubeletplugin/driver.go:42 +0x170
main.StartPlugin({0x1d09b90, 0x2aa1e40}, 0xc00051bbd0)
        /build/cmd/dra-example-kubeletplugin/main.go:139 +0x18b
main.newApp.func2(0xc0002abdc0?)
        /build/cmd/dra-example-kubeletplugin/main.go:113 +0x12e
github.com/urfave/cli/v2.(*Command).Run(0xc0003514a0, 0xc0002abdc0, {0xc000110070, 0x1, 0x1})
        /build/vendor/github.com/urfave/cli/v2/command.go:274 +0x93f
github.com/urfave/cli/v2.(*App).RunContext(0xc000032000, {0x1d09b90, 0x2aa1e40}, {0xc000110070, 0x1, 0x1})
        /build/vendor/github.com/urfave/cli/v2/app.go:332 +0x566
github.com/urfave/cli/v2.(*App).Run(...)
        /build/vendor/github.com/urfave/cli/v2/app.go:309
main.main()
        /build/cmd/dra-example-kubeletplugin/main.go:60 +0x3f
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
        panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x38 pc=0x518e77]

goroutine 1 [running]:
k8s.io/apimachinery/pkg/util/runtime.HandleCrash({0x0, 0x0, 0xc0000061c0?})
        /build/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:56 +0xcd
panic({0x18258e0?, 0x2a0aed0?})
        /usr/local/go/src/runtime/panic.go:770 +0x132
github.com/fsnotify/fsnotify.(*Watcher).isClosed(...)
        /build/vendor/github.com/fsnotify/fsnotify/backend_inotify.go:176
github.com/fsnotify/fsnotify.(*Watcher).Add(0x0, {0xc000048009?, 0xc00051b4c8?})
        /build/vendor/github.com/fsnotify/fsnotify/backend_inotify.go:240 +0x57
github.com/container-orchestrated-devices/container-device-interface/pkg/cdi.(*watch).update(0xc0001112d0, 0xc0001d6900, {0x0, 0x0, 0x51b720?})
        /build/vendor/github.com/container-orchestrated-devices/container-device-interface/pkg/cdi/cache.go:563 +0xd1
github.com/container-orchestrated-devices/container-device-interface/pkg/cdi.(*Cache).refreshIfRequired(0xc000023180, 0x8?)
        /build/vendor/github.com/container-orchestrated-devices/container-device-interface/pkg/cdi/cache.go:208 +0x38
github.com/container-orchestrated-devices/container-device-interface/pkg/cdi.(*Cache).Refresh(0xc000023180)
        /build/vendor/github.com/container-orchestrated-devices/container-device-interface/pkg/cdi/cache.go:123 +0xa6
main.NewCDIHandler(0x0?)
        /build/cmd/dra-example-kubeletplugin/cdi.go:46 +0x9e
main.NewDeviceState(0xc00051bbd0)
        /build/cmd/dra-example-kubeletplugin/state.go:66 +0x76
main.NewDriver.func1()
        /build/cmd/dra-example-kubeletplugin/driver.go:53 +0x85
k8s.io/client-go/util/retry.OnError.func1()
        /build/vendor/k8s.io/client-go/util/retry/util.go:51 +0x30
k8s.io/apimachinery/pkg/util/wait.runConditionWithCrashProtection(0x4c9379?)
        /build/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:145 +0x3e
k8s.io/apimachinery/pkg/util/wait.ExponentialBackoff({0x989680, 0x3ff0000000000000, 0x3fb999999999999a, 0x5, 0x0}, 0xc00051b920)
        /build/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:461 +0x5a
k8s.io/client-go/util/retry.OnError({0x989680, 0x3ff0000000000000, 0x3fb999999999999a, 0x5, 0x0}, 0x0?, 0xc0002ad6f8?)
        /build/vendor/k8s.io/client-go/util/retry/util.go:50 +0xa5
k8s.io/client-go/util/retry.RetryOnConflict(...)
        /build/vendor/k8s.io/client-go/util/retry/util.go:104
main.NewDriver({0x1d09b90, 0x2aa1e40}, 0xc00051bbd0)
        /build/cmd/dra-example-kubeletplugin/driver.go:42 +0x170
main.StartPlugin({0x1d09b90, 0x2aa1e40}, 0xc00051bbd0)
        /build/cmd/dra-example-kubeletplugin/main.go:139 +0x18b
main.newApp.func2(0xc0002abdc0?)
        /build/cmd/dra-example-kubeletplugin/main.go:113 +0x12e
github.com/urfave/cli/v2.(*Command).Run(0xc0003514a0, 0xc0002abdc0, {0xc000110070, 0x1, 0x1})
        /build/vendor/github.com/urfave/cli/v2/command.go:274 +0x93f
github.com/urfave/cli/v2.(*App).RunContext(0xc000032000, {0x1d09b90, 0x2aa1e40}, {0xc000110070, 0x1, 0x1})
        /build/vendor/github.com/urfave/cli/v2/app.go:332 +0x566
github.com/urfave/cli/v2.(*App).Run(...)
        /build/vendor/github.com/urfave/cli/v2/app.go:309
main.main()
        /build/cmd/dra-example-kubeletplugin/main.go:60 +0x3f

I didn't test with rootless Docker, as to my knowledge it isn't commonly used. IMO the problem would be best handled as a GitHub issue, but I can't open one without adding basic Podman support first.

Tested combinations

  • Works with Docker on Ubuntu as a regular user
  • Works with Podman on RHEL as root user
  • Works with Podman on Fedora using sudo
  • Crashes with Podman on Fedora as a regular user

empovit avatar Aug 09 '24 14:08 empovit

@empovit I think that's a bug in the example driver. Please create an issue for it and I will have a look at it on Monday. I think we need to update the way we write the CDI specifications.

I have created #52 which may address some of what you're seeing here.

elezar avatar Aug 09 '24 15:08 elezar

https://github.com/kubernetes-sigs/dra-example-driver/pull/52 has been merged. Please let me know if it has resolved the issue.

klueska avatar Aug 10 '24 08:08 klueska

Hmm, #52 fixes the issue on 1.30, but other changes break Podman support :( The following can't run because Kind's image build has Docker hard-coded:

KUBE_GIT_VERSION=v1.30.0 BUILD_KIND_IMAGE=true KIND_K8S_TAG=v1.31.0-rc.1 ./demo/create-cluster.sh

empovit avatar Aug 14 '24 08:08 empovit
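
The BUILD_KIND_IMAGE path presumably wraps kind's own node-image build, which at the time supported only Docker; the underlying step looks roughly like this (a sketch -- the exact invocation used by the script is an assumption):

# Build a kind node image from Kubernetes sources; this step required Docker at the time
kind build node-image --image kindest/node:v1.31.0-rc.1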

Glad to hear that the repo works as-is at least. Let's dig into the build failures and decide whether we can address them here or whether we need to propose changes to kind.

elezar avatar Aug 14 '24 11:08 elezar

So the issue is with building the image itself -- maybe I can / should change things to pull a prebuilt image hosted in the GitHub registry until an official kind image comes out.

klueska avatar Aug 14 '24 11:08 klueska

@elezar my understanding is that it's Kind/K8s policy to use only Docker, so proposing changes to Kind won't work.

@klueska I think that would be great. Alternatively, we can mention as a limitation that Podman works only with 1.30 until a 1.31 image is available.

empovit avatar Aug 14 '24 11:08 empovit

Actually -- it looks like they already published a 1.31.0 image yesterday. Let me just open a PR to update everything to that now.

https://hub.docker.com/layers/kindest/node/v1.31.0/images/sha256-919a65376fd11b67df05caa2e60802ad5de2fca250c9fe0c55b0dce5c9591af3?context=explore

klueska avatar Aug 14 '24 12:08 klueska
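
For reference, consuming a published node image in kind directly is just (a generic kind invocation, not this repo's script):

kind create cluster --image kindest/node:v1.31.0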

Done, please review: https://github.com/kubernetes-sigs/dra-example-driver/pull/53

klueska avatar Aug 14 '24 12:08 klueska

Thank you @klueska! I've rebased the PR on top of main and updated the README to include Podman. PTAL

empovit avatar Aug 14 '24 14:08 empovit

I don't see anything in the PR that tells kind itself to use podman.

I expected to see something like KIND_EXPERIMENTAL_PROVIDER=podman in the scripts somewhere...

klueska avatar Aug 14 '24 14:08 klueska

I don't see anything in the PR that tells kind itself to use podman.

I expected to see something like KIND_EXPERIMENTAL_PROVIDER=podman in the scripts somewhere...

It's auto-detected: https://kind.sigs.k8s.io/docs/user/quick-start/

kind can auto-detect whether docker, podman, or nerdctl is installed and will choose the available one. If you want to turn off the auto-detection, use the environment variable KIND_EXPERIMENTAL_PROVIDER=docker, KIND_EXPERIMENTAL_PROVIDER=podman, or KIND_EXPERIMENTAL_PROVIDER=nerdctl to select the runtime.

empovit avatar Aug 14 '24 15:08 empovit
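
For example, forcing a specific runtime instead of relying on auto-detection looks like this (generic kind usage, not specific to this repo):

KIND_EXPERIMENTAL_PROVIDER=podman kind create cluster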

OK, so I can override it with KIND_EXPERIMENTAL_PROVIDER if I want, but otherwise it will be autodetected?

klueska avatar Aug 14 '24 15:08 klueska

Correct.

empovit avatar Aug 14 '24 15:08 empovit

Can we explicitly set KIND_EXPERIMENTAL_PROVIDER to the value of CONTAINER_TOOL, and instead autodetect CONTAINER_TOOL in this environment?

klueska avatar Aug 14 '24 15:08 klueska
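
A minimal sketch of what that could look like in the demo scripts (assuming bash; the detection order and exact variable handling are illustrative, not the final implementation):

# Respect an explicitly set CONTAINER_TOOL; otherwise pick whichever CLI is on the PATH.
if [ -z "${CONTAINER_TOOL:-}" ]; then
  if command -v docker >/dev/null 2>&1; then
    CONTAINER_TOOL=docker
  elif command -v podman >/dev/null 2>&1; then
    CONTAINER_TOOL=podman
  fi
fi
# Point kind at the same tool rather than letting it auto-detect on its own.
export KIND_EXPERIMENTAL_PROVIDER="${CONTAINER_TOOL}"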

Done. PTAL

empovit avatar Aug 16 '24 13:08 empovit

@empovit thanks for all the iterations on this. I've tested it on my side with both podman and docker and everything seems to work as expected.

/lgtm
/approve

klueska avatar Aug 21 '24 09:08 klueska

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: empovit, klueska

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment.
Approvers can cancel approval by writing /approve cancel in a comment.

k8s-ci-robot avatar Aug 21 '24 09:08 k8s-ci-robot