Deleting a queue with `kubectl` fails with "only queue with state `Closed` can be deleted"

Open • prokher opened this issue 2 years ago • 31 comments

What happened:

An attempt to delete a queue with kubectl delete ends with the following error:

Error from server: error when deleting "ex2-volcano.yaml": admission webhook "validatequeue.volcano.sh" denied the request: only queue with state `Closed` can be deleted, queue `one-q` state is `Open`
Error from server (NotFound): error when deleting "ex2-volcano.yaml": jobs.batch.volcano.sh "j1-vj" not found

What you expected to happen:

I expect the queue to be deleted.

How to reproduce it (as minimally and precisely as possible):

Just create a new queue and try to delete it with kubectl delete.

Anything else we need to know?:

I realize I can close the queue first with vcctl, but I need a pure Kubernetes solution without extra CLI tools. Eventually I am going to create and delete queues through the Kubernetes API, so I cannot rely on external CLI utilities. I tried to edit the queue with kubectl edit, but unfortunately I could not close it this way.

Environment:

  • Volcano Version: 1.3.0
  • Kubernetes version (use kubectl version):
    Client Version: version.Info{Major:"1", Minor:"22", GitVersion:"v1.22.0", GitCommit:"c2b5237ccd9c0f1d600d3072634ca66cefdf272f", GitTreeState:"clean", BuildDate:"2021-08-04T17:56:19Z", GoVersion:"go1.16.6", Compiler:"gc", Platform:"darwin/amd64"}
    Server Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.2", GitCommit:"092fbfbf53427de67cac1e9fa54aaa09a28371d7", GitTreeState:"clean", BuildDate:"2021-06-16T12:53:14Z", GoVersion:"go1.16.5", Compiler:"gc", Platform:"linux/amd64"}
    

prokher, Aug 19 '21 19:08

By design, only queues in the Closed state can be deleted; a closed queue no longer accepts newly submitted jobs. The queue is a concept inherited from traditional HPC scheduling architectures and is widely used to share and divide cluster resources among departments or groups. The default Kubernetes scheduler targets common scenarios, so it has no notion of a queue.

Thor-wl, Aug 20 '21 03:08
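The deletion rule Thor-wl describes, together with the webhook message from the original report, can be modelled in a few lines. This is a minimal Python sketch of the check, not Volcano's actual webhook (which is written in Go); the `Queue` class and the `validate_queue_delete` function are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Queue:
    """Hypothetical, minimal stand-in for a Volcano Queue object."""
    name: str
    state: str  # value of status.state, e.g. "Open" or "Closed"


def validate_queue_delete(queue: Queue) -> Tuple[bool, str]:
    """Return (allowed, message) for a DELETE request on `queue`."""
    if queue.state == "Closed":
        return True, ""
    # Same wording as the error reported by validatequeue.volcano.sh in this issue.
    return False, (
        "only queue with state `Closed` can be deleted, "
        f"queue `{queue.name}` state is `{queue.state}`"
    )


if __name__ == "__main__":
    allowed, message = validate_queue_delete(Queue(name="one-q", state="Open"))
    print(allowed, message)
```

Under this rule the only way to delete a queue is to move it to Closed first, which is exactly the step the reporter cannot perform with kubectl alone.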

@Thor-wl, thank you, that is clear. The issue is that I do not know how to close a queue with kubectl without using vcctl. I assume that if I can do it with kubectl, I will also be able to close a queue through the Kubernetes API.

prokher, Aug 20 '21 07:08

For now, users can close a queue only with vcctl. A queue is not a native Kubernetes concept, so there is no way to do that with the Kubernetes CLI.

Thor-wl, Aug 20 '21 08:08

@Thor-wl, understood. Can I expect that it will become possible to close a queue with kubectl edit? I realize that a queue is not a native Kubernetes concept, but I believe it is added as a custom resource (through a CRD), so kubectl edit can modify it. I thought this was the Kubernetes way of doing things: the user modifies a Kubernetes object (e.g., through kubectl edit), and the operator responsible for that object does its best to move it closer to the desired state.

prokher, Aug 20 '21 09:08

@prokher Yes. Perhaps it is hard to push this idea directly to the Kubernetes community, but it is possible to develop a CLI plugin to support it.

Thor-wl, Aug 24 '21 03:08

Actually, you can apply a Command CRD provided by Volcano to do that; this is how vcctl does it. But I believe the community currently has very little documentation of this feature.

shinytang6, Aug 24 '21 11:08

@Thor-wl, honestly, I do not see why you are talking about pushing ideas to the Kubernetes community. Although that may be true, I was talking about something completely different.

I had in mind editing a queue instance with kubectl edit and modifying its YAML so that the queue eventually becomes closed. I could then delete it with kubectl delete. For example, there could be an enabled property; setting it to false would cause the queue to close.

@shinytang6, applying a Command CRD with kubectl may work as a workaround. Would you mind pointing me to an example?

prokher, Aug 25 '21 17:08

@prokher There are no relevant examples in the community right now; I can help add some later. You can try something like this:

apiVersion: bus.volcano.sh/v1alpha1
kind: Command
metadata:
  name: close-queue
action: CloseQueue
target:
  apiVersion: scheduling.volcano.sh/v1beta1
  kind: Queue
  name: xxx
  uid: xxx

shinytang6, Aug 26 '21 07:08
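Since the goal is to avoid vcctl and talk to the Kubernetes API directly, the same Command can also be built programmatically. The Python sketch below uses only the standard library and returns the REST path and JSON body for a POST request. The `default` namespace, the `close-<queue>` naming scheme and the `commands` plural in the path are assumptions not confirmed in this thread, and `build_close_queue_command` is a hypothetical helper.

```python
import json
from typing import Any, Dict, Tuple


def build_close_queue_command(
    queue_name: str, queue_uid: str, namespace: str = "default"
) -> Tuple[str, Dict[str, Any]]:
    """Return (REST path, body) for a Command asking Volcano to close a queue.

    The body mirrors the Command YAML shared in this thread; the namespace
    default is an assumption.
    """
    body = {
        "apiVersion": "bus.volcano.sh/v1alpha1",
        "kind": "Command",
        # Hypothetical naming scheme; any unique name should do.
        "metadata": {"name": f"close-{queue_name}", "namespace": namespace},
        "action": "CloseQueue",
        "target": {
            "apiVersion": "scheduling.volcano.sh/v1beta1",
            "kind": "Queue",
            "name": queue_name,
            "uid": queue_uid,
        },
    }
    # Generic path for a namespaced custom resource:
    # /apis/<group>/<version>/namespaces/<namespace>/<plural>
    path = f"/apis/bus.volcano.sh/v1alpha1/namespaces/{namespace}/commands"
    return path, body


if __name__ == "__main__":
    path, body = build_close_queue_command("one-q", "<queue-uid>")
    print("POST", path)
    print(json.dumps(body, indent=2))
```

The queue's UID, which the target needs, can be read with kubectl get queue one-q -o jsonpath='{.metadata.uid}'.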

@shinytang6 Thank you for the recipe, it indeed worked! Now I can create a queue, submit jobs, and delete the queue using only kubectl.

I would still recommend adding a field so that a queue can be closed with kubectl edit <queue>. So I am keeping the ticket open; it is up to the maintainers to decide whether to close it or keep it open until this is implemented.

prokher, Aug 27 '21 10:08

Hello 👋 Looks like there was no activity on this issue for last 90 days. Do you mind updating us on the status? Is this still reproducible or needed? If yes, just comment on this PR or push a commit. Thanks! 🤗 If there will be no activity for 60 days, this issue will be closed (we can always reopen an issue if we need!).

stale[bot], Nov 25 '21 18:11

The issue is still here, AFAIK.

prokher, Nov 26 '21 10:11

Emm, so what's the target solution/behaviour?

k82cn, Nov 26 '21 11:11

I would expect to be able to delete a queue with kubectl delete. If it has to be closed first, that is OK, but then I would expect to close it with kubectl edit and delete it with kubectl delete afterwards.

prokher, Nov 26 '21 11:11

If so, let's have a kubectl plugin for volcano :)

k82cn, Nov 26 '21 23:11

Hello 👋 Looks like there was no activity on this issue for last 90 days. Do you mind updating us on the status? Is this still reproducible or needed? If yes, just comment on this PR or push a commit. Thanks! 🤗 If there will be no activity for 60 days, this issue will be closed (we can always reopen an issue if we need!).

stale[bot], Feb 27 '22 01:02

It is still needed

prokher, Feb 28 '22 22:02

I agree with that. It would be friendlier for users operating these objects.

Thor-wl, Mar 01 '22 01:03

/assign @hwdef

hwdef, Mar 01 '22 01:03

What if we rename vcctl to kubectl-volcano and use it as a kubectl plugin?

Then we could close a queue with

kubectl volcano queue operate -a close -n test

and then delete it with

kubectl volcano queue delete -n test

If this is too cumbersome, we can simplify the commands:

kubectl volcano close queue --name test
kubectl volcano delete queue --name test

hwdef, Mar 01 '22 02:03
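hwdef's proposal relies on kubectl's plugin mechanism: kubectl can run any executable on the PATH whose name starts with kubectl-. As far as we understand kubectl's behaviour, it builds the candidate name from the leading non-flag arguments (a dash inside an argument becomes an underscore in the file name) and tries the longest match first, passing the remaining arguments to the plugin. The Python sketch below models that lookup to show how kubectl volcano queue operate -a close -n test would reach a kubectl-volcano binary. `resolve_plugin` is a hypothetical name, and the rule that built-in kubectl commands take precedence over plugins is not modelled.

```python
import shutil
from typing import Callable, List, Optional, Tuple


def resolve_plugin(
    args: List[str],
    lookup: Callable[[str], Optional[str]] = shutil.which,
) -> Optional[Tuple[str, List[str]]]:
    """Return (executable path, arguments passed to it), or None if no plugin matches."""
    name_parts: List[str] = []
    for arg in args:
        if arg.startswith("-"):
            break  # flags end the part of the command line that names the plugin
        name_parts.append(arg.replace("-", "_"))
    # Longest candidate first: kubectl-volcano-queue-operate, kubectl-volcano-queue, ...
    for i in range(len(name_parts), 0, -1):
        path = lookup("kubectl-" + "-".join(name_parts[:i]))
        if path is not None:
            return path, args[i:]
    return None


if __name__ == "__main__":
    # Pretend that only kubectl-volcano is installed.
    installed = {"kubectl-volcano": "/usr/local/bin/kubectl-volcano"}
    print(resolve_plugin(["volcano", "queue", "operate", "-a", "close", "-n", "test"], installed.get))
    # prints ('/usr/local/bin/kubectl-volcano', ['queue', 'operate', '-a', 'close', '-n', 'test'])
```

With a single kubectl-volcano binary, the plugin receives queue operate -a close -n test, which is the same argument list vcctl queue operate would get today.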

@prokher @k82cn @Thor-wl

hwdef, Mar 01 '22 02:03

Perhaps we can bring the different designs to the community meeting this week. It is an interesting issue; let's get more voices from the community.

Thor-wl, Mar 01 '22 03:03

In our scenario, we will operate Volcano from a node that does not have vcctl installed. Moreover, some parts of the system will work through the Kubernetes REST API, which also knows nothing about vcctl. I already shared my considerations above:

> I would expect I can delete a queue through kubectl delete. If it is necessary to close it first, OK, I would expect to use kubectl edit and then delete it with kubectl delete.

Kubernetes seems to follow a declarative approach to describing its objects/resources. IMHO, to be declarative, the queue should have a boolean property, open, which a client can switch to false with a regular kubectl edit.

prokher, Mar 01 '22 10:03
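To make the declarative idea concrete: with a boolean field such as the proposed spec.open (a field suggested in this thread, not one Volcano 1.3.0 provides), closing a queue would be a JSON merge patch (RFC 7386) that changes only that field and leaves the rest of the object untouched. The Python sketch below applies such a patch; `merge_patch` is our own illustrative helper, not part of any Kubernetes client library.

```python
import copy
import json
from typing import Any


def merge_patch(target: Any, patch: Any) -> Any:
    """Apply an RFC 7386 JSON merge patch; neither argument is modified."""
    if not isinstance(patch, dict):
        return copy.deepcopy(patch)  # a non-object patch replaces the target outright
    result = copy.deepcopy(target) if isinstance(target, dict) else {}
    for key, value in patch.items():
        if value is None:
            result.pop(key, None)  # null removes the key
        else:
            result[key] = merge_patch(result.get(key), value)
    return result


if __name__ == "__main__":
    queue = {
        "apiVersion": "scheduling.volcano.sh/v1beta1",
        "kind": "Queue",
        "metadata": {"name": "one-q"},
        "spec": {"weight": 1, "open": True},  # "open" is the hypothetical field
    }
    patched = merge_patch(queue, {"spec": {"open": False}})
    print(json.dumps(patched["spec"]))  # prints {"weight": 1, "open": false}
```

If such a field existed, the same change could be sent with kubectl patch queue one-q --type merge -p '{"spec":{"open":false}}', or made by editing the field in kubectl edit; the queue's controller would then be responsible for moving the queue to the closed state.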

@prokher I see. You expect to manage the queue through kubectl only. We'll discuss this at our community meeting.

hwdef, Mar 02 '22 01:03

Hello 👋 Looks like there was no activity on this issue for last 90 days. Do you mind updating us on the status? Is this still reproducible or needed? If yes, just comment on this PR or push a commit. Thanks! 🤗 If there will be no activity for 60 days, this issue will be closed (we can always reopen an issue if we need!).

stale[bot], May 31 '22 03:05

I still need this.

prokher, May 31 '22 08:05

The current situation is that a queue can be deleted even if it is not closed, but the vcjobs in the queue continue to exist. We need to improve this, and I have two options in mind:

  1. Cascade the deletion of vcjobs when the queue is deleted, regardless of the state of the vcjobs in the queue, analogous to deleting a namespace.
  2. Add a new field to the queue, such as spec.isOpen, and switch the queue between the open and closed states by modifying this field, while restoring the previous webhook logic: a queue is not allowed to be deleted unless it is closed.

What do you think? @prokher @Thor-wl @k82cn @william-wang @shinytang6

hwdef, Jun 17 '22 06:06
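The two options can be compared with a small model. In this Python sketch, `Queue.is_open` stands for the proposed spec.isOpen field, and the intermediate Closing state (reported while jobs still reference a closed queue) is our assumption about how option 2 might behave, not a confirmed design. `VcJob`, `reconcile_state` and `jobs_to_cascade` are hypothetical names. Under option 2, the restored webhook check would only allow deletion once the state reaches Closed.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class VcJob:
    name: str
    queue: str  # name of the queue the job was submitted to


@dataclass
class Queue:
    name: str
    is_open: bool = True  # stands in for the proposed spec.isOpen field


def reconcile_state(queue: Queue, jobs: List[VcJob]) -> str:
    """Option 2: derive the queue state from the user-editable field."""
    if queue.is_open:
        return "Open"
    # Assumption: a closed queue reports "Closing" until its jobs are gone.
    remaining = [job for job in jobs if job.queue == queue.name]
    return "Closing" if remaining else "Closed"


def jobs_to_cascade(queue: Queue, jobs: List[VcJob]) -> List[VcJob]:
    """Option 1: every job in the queue is deleted with it, whatever its state."""
    return [job for job in jobs if job.queue == queue.name]


if __name__ == "__main__":
    jobs = [VcJob("j1-vj", "one-q"), VcJob("other-job", "default")]
    q = Queue("one-q", is_open=False)
    print(reconcile_state(q, jobs))  # prints Closing
    print([job.name for job in jobs_to_cascade(q, jobs)])  # prints ['j1-vj']
```

The trade-off is visible in the model: option 1 removes jobs implicitly when the queue goes away, while option 2 keeps deletion explicit and lets kubectl edit (or any API client) drive the queue to Closed first.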

@hwdef, IMHO this seems reasonable, thank you for the update.

prokher, Jun 17 '22 12:06

Which of these two options do you prefer?

hwdef, Jun 17 '22 13:06

Hello 👋 Looks like there was no activity on this issue for last 90 days. Do you mind updating us on the status? Is this still reproducible or needed? If yes, just comment on this PR or push a commit. Thanks! 🤗 If there will be no activity for 60 days, this issue will be closed (we can always reopen an issue if we need!).

stale[bot], Sep 20 '22 19:09

Actually, either of the two options is quite OK from my point of view.

prokher, Sep 20 '22 21:09