Drew Erny
Actually, we don't really need tests. There's not enough meat-and-potatoes in the actual code changes.
I'm taking a look at this issue now.
@s4ke I believe there are several issues afoot here: 1. There is some error causing volumes to attempt to be scheduled to invalid nodes. 2. There is some error resulting...
So, the open questions I have right now about the linked issue: The Volume is getting scheduled to a node which is outside of its availability constraint. This is odd....
That's exactly the issue I had in mind.
OK, for starters, I have figured out one problem. This is where we convert the gRPC response into Docker API objects: https://github.com/moby/moby/blob/b3843992fc12536908fea2fea3ece05725b1e613/daemon/cluster/convert/node.go#L59-L70 And this is the Docker API object in...
I think we'd need to see Docker daemon logs on the problematic nodes from around the time of the error to investigate this. I know that's a difficult ask because...
I see what the bug here is. I'm going to add it to my personal TODO list, because I think it should be an easy fix. The problem is definitely...
Do you know if restarting the worker causes the logs to start working again? I'm not suggesting this as a workaround because restarting workers in prod is nontrivial, but if...
I think it's likely that the CA rotation is only making logs work again as a side effect. Probably something about updating the node. If other things work and logs...