Sean P. Kelly
My understanding of the bug is that there's nothing particular about `1.25.0` that causes this race condition -- I suspect it has been possible since the device plugin was introduced,...
Sorry for the delay here. The race condition was so rare in my testing environment that it was challenging to prove that we had actually resolved the issue. After more thorough...
https://github.com/bottlerocket-os/bottlerocket-core-kit/pull/228 is merged, which should resolve this in an upcoming Bottlerocket release!

---

@bcressey has done some great work looking into why `nvidia-k8s-device-plugin` restarts can sometimes lead to GPU resources...
After digging in, I found that I mistakenly failed to tag my fix into the `1.26.2` release branch, and thus it was not released until Bottlerocket `1.27.0`. Apologies for the mix-up....
I have attempted to replicate this behavior on `1.27.0` over hundreds of instance launches (on `g4dn.xlarge` and `g5.xlarge`) but have thus far been unsuccessful. Here's the Karpenter setup that I...
Thanks @cogentist-yann. I'll run my tests with the specific AMI you've mentioned as well. Otherwise, I *believe* this issue is resolved, but I'll leave it open for a while in...
After some further discussion, I think two separate settings may be less ambiguous:

* The existing boolean `ignore-waves`
* A new setting, `delay-waves` or `bake-time`.

The semantics would be as...
I'll take a look. Do you mind sharing which version of Brupop this is using?
This configuration:

```
NAME        STATE                      VERSION   TARGET STATE         TARGET VERSION   CRASH COUNT
$HOSTNAME   StagedAndPerformedUpdate   1.20.2    RebootedIntoUpdate   1.20.3           0
```

means that your host "staged" the update. It's installed to a...
The interface that Brupop uses to interact with PDBs is that it makes an eviction request to the Kubernetes API, then that API responds specially depending on the state of...
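To make the eviction flow a bit more concrete, here is a rough Python sketch of how a caller might interpret the HTTP status that the Kubernetes Eviction API returns. This is not Brupop's actual code (Brupop is written in Rust); the names `EvictionOutcome`, `classify_eviction_response`, and `should_retry` are hypothetical and exist only for illustration. The status-code meanings follow the Kubernetes documentation for API-initiated eviction: a success response means the eviction was allowed, `429 Too Many Requests` means a PodDisruptionBudget currently blocks it, and `500` typically indicates a misconfiguration such as multiple PDBs selecting the same pod.

```python
from enum import Enum


class EvictionOutcome(Enum):
    """Hypothetical outcomes for a single eviction request."""
    EVICTED = "evicted"
    BLOCKED_BY_PDB = "blocked_by_pdb"
    MISCONFIGURED = "misconfigured"
    UNEXPECTED = "unexpected"


def classify_eviction_response(status_code: int) -> EvictionOutcome:
    """Map the Eviction API's HTTP status to an outcome.

    Assumption: only the status code is inspected; a real client
    would also look at the response body and handle network errors.
    """
    if 200 <= status_code < 300:
        # The API accepted the eviction; the pod will be deleted.
        return EvictionOutcome.EVICTED
    if status_code == 429:
        # A PodDisruptionBudget does not currently allow this eviction.
        return EvictionOutcome.BLOCKED_BY_PDB
    if status_code == 500:
        # Usually a misconfiguration, e.g. several PDBs match the pod.
        return EvictionOutcome.MISCONFIGURED
    return EvictionOutcome.UNEXPECTED


def should_retry(outcome: EvictionOutcome) -> bool:
    """Only a PDB block is expected to clear up on its own later."""
    return outcome is EvictionOutcome.BLOCKED_BY_PDB


if __name__ == "__main__":
    for code in (201, 429, 500):
        outcome = classify_eviction_response(code)
        print(code, outcome.value, "retry" if should_retry(outcome) else "no retry")
```

The point of the sketch is that the caller never reads the PDB directly: it asks for the eviction and lets the API server decide, retrying later only when the response says the budget is the blocker.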