bug: Possible to schedule a workspace onto a just-deleted Shard

Open ncdc opened this issue 2 years ago • 3 comments

Describe the bug

Due to a timing issue, a Workspace can occasionally be scheduled onto a Shard that has just been deleted.

Steps To Reproduce

  1. Delete a Shard
  2. Immediately create a Workspace
  3. In rare cases, the Workspace is scheduled onto the just-deleted Shard

Expected Behaviour

The Workspace is not scheduled onto the just-deleted Shard.

Additional Context

This was originally seen in #2603. That test deletes the root shard, creates a workspace, and expects the workspace to remain unscheduled until a new shard is created. The test fails (rarely) when the shard informer's cache is out of date. The shard has legitimately been deleted from etcd, but the informer has not yet observed the deletion. As a result, the workspace reconciler still finds the shard in its cache and schedules the workspace onto it.
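
The race can be modeled in a few lines. The following is a minimal Go sketch, not kcp code. `informerCache`, `event`, `sync`, and `pickShard` are hypothetical names, and the "etcd" map is a toy stand-in. It shows how a cache that has not yet processed a queued delete event still hands out the deleted shard.

```go
package main

import "fmt"

// event is a hypothetical watch event delivered to the (toy) informer.
type event struct {
	deleted bool
	shard   string
}

// informerCache is a toy stand-in for an informer's local store: it only
// reflects etcd after queued watch events have been processed by sync.
type informerCache struct {
	shards  map[string]bool
	pending []event
}

// sync applies all queued watch events to the cache.
func (c *informerCache) sync() {
	for _, e := range c.pending {
		if e.deleted {
			delete(c.shards, e.shard)
		} else {
			c.shards[e.shard] = true
		}
	}
	c.pending = nil
}

// pickShard returns any shard the cache currently knows about, or "" if none.
func (c *informerCache) pickShard() string {
	for name := range c.shards {
		return name
	}
	return ""
}

func main() {
	etcd := map[string]bool{"root": true}
	cache := &informerCache{shards: map[string]bool{"root": true}}

	// Step 1: the shard is deleted from etcd; the delete event is queued
	// but the informer has not processed it yet.
	delete(etcd, "root")
	cache.pending = append(cache.pending, event{deleted: true, shard: "root"})

	// Step 2: a workspace is created and the reconciler consults the cache,
	// which is stale and still returns "root".
	fmt.Printf("scheduled onto: %q\n", cache.pickShard())

	// Once the informer catches up, the shard disappears from the cache.
	cache.sync()
	fmt.Printf("after sync: %q\n", cache.pickShard())
}
```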

ncdc, Jan 26 '23 20:01

Do we need to add a live GET after we choose a shard from the cache?
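
A minimal Go sketch of that idea follows, under stated assumptions. `liveGetter`, `GetShard`, `fakeAPI`, `errNotFound`, and `chooseShard` are all hypothetical; none of them is a kcp or client-go API. The function picks the first cached candidate that an uncached read confirms still exists. If no candidate survives, it leaves the workspace unscheduled so the reconciler can requeue it.

```go
package main

import (
	"errors"
	"fmt"
)

// errNotFound mimics a NotFound error from a live API read (hypothetical).
var errNotFound = errors.New("not found")

// liveGetter is a hypothetical interface standing in for a direct,
// uncached GET against the API server.
type liveGetter interface {
	GetShard(name string) error
}

// fakeAPI is an in-memory stand-in for the API server / etcd.
type fakeAPI struct{ shards map[string]bool }

func (f fakeAPI) GetShard(name string) error {
	if !f.shards[name] {
		return errNotFound
	}
	return nil
}

// chooseShard returns the first cached candidate confirmed by a live GET.
// Candidates that are already gone are skipped; if none survive, it returns
// "" with no error so the workspace stays unscheduled and can be requeued.
func chooseShard(cached []string, api liveGetter) (string, error) {
	for _, name := range cached {
		err := api.GetShard(name)
		if errors.Is(err, errNotFound) {
			continue // cache is stale: the shard was already deleted
		}
		if err != nil {
			return "", err
		}
		return name, nil
	}
	return "", nil
}

func main() {
	api := fakeAPI{shards: map[string]bool{}} // "root" already deleted
	cached := []string{"root"}                // but the informer still lists it

	shard, err := chooseShard(cached, api)
	fmt.Printf("shard=%q err=%v\n", shard, err)
}
```

A live GET narrows the race window but does not close it. The shard can still be deleted between the GET and the write that records the scheduling decision.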

stevekuznetsov, Jan 27 '23 13:01

That, and/or a way to prevent deleting a shard that still has workspaces scheduled to it.
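
The deletion guard could look roughly like the following in-memory Go sketch. `registry`, `schedule`, and `deleteShard` are hypothetical names, not kcp APIs. The sketch refuses to delete a shard while any workspace references it, in the spirit of an admission check or finalizer. The mutex is a simplifying assumption: it serializes scheduling and deletion, which a real distributed control plane does not get for free.

```go
package main

import (
	"fmt"
	"sync"
)

// registry is a hypothetical in-memory model of shards and the workspaces
// scheduled onto them. The mutex serializes scheduling and deletion.
type registry struct {
	mu         sync.Mutex
	shards     map[string]bool
	workspaces map[string]string // workspace name -> shard name
}

func newRegistry(shards ...string) *registry {
	r := &registry{shards: map[string]bool{}, workspaces: map[string]string{}}
	for _, s := range shards {
		r.shards[s] = true
	}
	return r
}

// deleteShard refuses to delete a shard that still has workspaces
// scheduled to it.
func (r *registry) deleteShard(name string) error {
	r.mu.Lock()
	defer r.mu.Unlock()
	for ws, shard := range r.workspaces {
		if shard == name {
			return fmt.Errorf("shard %q still has workspace %q scheduled", name, ws)
		}
	}
	delete(r.shards, name)
	return nil
}

// schedule assigns a workspace to a shard only if the shard currently exists.
func (r *registry) schedule(ws, shard string) error {
	r.mu.Lock()
	defer r.mu.Unlock()
	if !r.shards[shard] {
		return fmt.Errorf("shard %q does not exist", shard)
	}
	r.workspaces[ws] = shard
	return nil
}

func main() {
	r := newRegistry("root")
	fmt.Println(r.schedule("ws1", "root")) // succeeds: shard exists
	fmt.Println(r.deleteShard("root"))     // refused: ws1 is still scheduled
}
```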

ncdc, Jan 27 '23 13:01

Issues go stale after 90d of inactivity. After a further 30 days, they will turn rotten. Mark the issue as fresh with /remove-lifecycle stale.

If this issue is safe to close now, please do so with /close.

/lifecycle stale

kcp-ci-bot, Apr 15 '24 20:04

Stale issues rot after 30d of inactivity. Mark the issue as fresh with /remove-lifecycle rotten. Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now, please do so with /close.

/lifecycle rotten

kcp-ci-bot, May 15 '24 20:05

Rotten issues close after 30d of inactivity. Reopen the issue with /reopen. Mark the issue as fresh with /remove-lifecycle rotten.

/close

kcp-ci-bot, Jun 14 '24 20:06

@kcp-ci-bot: Closing this issue.

kcp-ci-bot, Jun 14 '24 20:06