kcp
bug: Possible to schedule a workspace onto a just-deleted Shard
Describe the bug
Due to a timing issue, it is sometimes possible to schedule a Workspace onto a just-deleted Shard.
Steps To Reproduce
- Delete a Shard
- Immediately create a Workspace
- Workspace is sometimes (rarely) scheduled to the just-deleted Shard
Expected Behaviour
Workspace is not scheduled to just-deleted Shard
Additional Context
This was originally seen in #2603. That test deletes the root shard, creates a workspace, and expects the workspace to remain unscheduled until a new shard is created. The failure happens (rarely) when the shard informer's cache is out of date: the shard has already been deleted from etcd, but the informer hasn't observed the deletion yet, so the workspace reconciler goes ahead and schedules onto the deleted shard.
Do we need to add a live GET after we choose a shard from the cache?
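A rough sketch of what the live GET could look like, to make the idea concrete. This is not the actual kcp reconciler code: `ShardCache`, `LiveShardGetter`, `pickShard`, and `ErrNotFound` are hypothetical names made up for illustration, and the in-order iteration is an assumption (the real scheduler may choose differently). Real code would probably check the error with `apierrors.IsNotFound` from `k8s.io/apimachinery` instead of a local sentinel.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrNotFound is a hypothetical stand-in for an API "not found" error.
var ErrNotFound = errors.New("shard not found")

// ShardCache is a hypothetical view of the informer cache; it may be stale.
type ShardCache interface {
	List() []string
}

// LiveShardGetter is a hypothetical uncached client that reads from the API server.
type LiveShardGetter interface {
	Get(name string) error
}

// pickShard walks the cached candidates and confirms each one with a live GET.
// Candidates that no longer exist are skipped. ok is false when no cached shard
// could be confirmed, meaning the workspace should stay unscheduled.
func pickShard(cache ShardCache, live LiveShardGetter) (name string, ok bool, err error) {
	for _, candidate := range cache.List() {
		getErr := live.Get(candidate)
		if getErr == nil {
			return candidate, true, nil
		}
		if errors.Is(getErr, ErrNotFound) {
			continue // the cache was stale for this shard; try the next one
		}
		return "", false, fmt.Errorf("confirming shard %q: %w", candidate, getErr)
	}
	return "", false, nil
}

// staleCache is a fake cache that returns a fixed list of shard names.
type staleCache []string

func (c staleCache) List() []string { return c }

// liveShards is a fake API server holding the shards that really exist.
type liveShards map[string]bool

func (l liveShards) Get(name string) error {
	if l[name] {
		return nil
	}
	return ErrNotFound
}

func main() {
	// The cache still lists "root", but it has already been deleted.
	name, ok, err := pickShard(staleCache{"root"}, liveShards{})
	fmt.Println(name, ok, err)
}
```

Note that a live GET only narrows the window: the shard can still be deleted between the GET and the write that schedules the workspace, so it does not fully close the race on its own.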
That and/or a way to prevent the deletion of a shard if it has workspaces scheduled to it
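For the deletion-guard idea, the check itself could be as small as the sketch below. The `Workspace` struct and `checkShardDeletable` are hypothetical and only illustrate the rule; where such a check would run in kcp (for example behind a finalizer or an admission check) is left open here.

```go
package main

import (
	"fmt"
	"strings"
)

// Workspace is a hypothetical, trimmed-down view of a workspace:
// just its name and the shard it is scheduled to ("" if unscheduled).
type Workspace struct {
	Name  string
	Shard string
}

// checkShardDeletable returns an error naming the workspaces that are still
// scheduled to the shard, or nil if none are and deletion may proceed.
func checkShardDeletable(shard string, workspaces []Workspace) error {
	var blocking []string
	for _, ws := range workspaces {
		if ws.Shard == shard {
			blocking = append(blocking, ws.Name)
		}
	}
	if len(blocking) > 0 {
		return fmt.Errorf("shard %q still has workspaces scheduled to it: %s",
			shard, strings.Join(blocking, ", "))
	}
	return nil
}

func main() {
	workspaces := []Workspace{
		{Name: "team-a", Shard: "root"},
		{Name: "team-b", Shard: ""},
	}
	fmt.Println(checkShardDeletable("root", workspaces))
	fmt.Println(checkShardDeletable("beta", workspaces))
}
```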
Issues go stale after 90d of inactivity.
After a further 30 days, they will turn rotten.
Mark the issue as fresh with /remove-lifecycle stale.
If this issue is safe to close now please do so with /close.
/lifecycle stale
Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
If this issue is safe to close now please do so with /close.
/lifecycle rotten
Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.
/close
@kcp-ci-bot: Closing this issue.