Markus Thömmes
Here's some more info on the container above. Note how the container is in STOPPED state (and its runsc and runsc-gofer processes are gone) but the sandbox with all the...
To not clutter things too much, here's the state of the "anecdotal" container I was talking about above: https://gist.github.com/markusthoemmes/7ebf064b44b1a182f552fbe1cc1b9150. I've tried to gather as much info as I could think...
Thanks @ayushr2 and @avagin for taking a look, it's greatly appreciated! Would the new hypothesis also explain the stuck pod mentioned in https://github.com/google/gvisor/issues/9834#issuecomment-1879040410? It seems like we're having both...
Today I compared the shutdown behavior of "normally behaving" pods and hanging pods to poke at this some more. Interestingly, the line that's missing in the...
Aaaaand one more datapoint in my quest to find the nugget of info that I'm looking for: The goroutine dump of the "leaked" shim: goroutine dump of leaked shim ```...
I've made some progress on this: I was able to reproduce this somewhat reliably (though not yet reliably on my dev clusters) by force-exceeding the ephemeral-storage limit in the respective pod. The...
Thanks @ayushr2. Here we go, I've gotten it to reproduce with the below PodSpec. It's been a little finicky so I left seemingly unrelated parts in it as well. FWIW,...
This is from an "unconfirmed" case (unconfirmed in that I can't actually see what the app itself is doing, but I can confirm that the symptom very much looks like...
Another case. This time "half confirmed" in that it's stuck in `Terminating` + stuck at 100% CPU (the case above was just stuck at 100% CPU). dump registers ``` 0x00007ff4c60bdea9...
And because third time's the charm and this time I'm really quite confident that no actual load is running in the container because the logs state that it's supposedly shut...