Undeploy Workflow

Open datoslabs opened this issue 6 years ago • 34 comments

Hi,

I would like to request an undeploy or suspend workflow feature. I noticed that "Remove deployed workflows #2506" has been closed with the comment that there is currently no plan to implement a feature to remove or undeploy workflows; however, I believe I have a valid use case and would urge you to reconsider adding this feature. I deployed workflows with a timer start event but without a timeout boundary event; as a result, new workflow instances are being spawned indefinitely. I tried to stop this by deploying a new version of the same workflow, changing the timer start event to a regular start event. Unfortunately, this did not work, as it is treated as a different workflow even though the workflow id is the same (i.e. I ended up with two workflows with the same id but different version numbers, and the original workflow with the timer start event continues to spawn new instances). As you can imagine, this could lead to problems in production.

Regards,

Eric

datoslabs avatar Aug 01 '19 16:08 datoslabs

Hi Eric,

Thank you for raising this. Is it correct that you need to undeploy workflows because deploying a new version doesn't stop the previous version from creating new instances?

This sounds like a bug. When a new workflow with the same BPMN process id is deployed then the timer of the previous start event should be canceled.

I tested it with Zeebe 0.20.0 and it works as expected. Can you please describe the steps to reproduce the issue, or provide a minimal reproducer?

saig0 avatar Aug 02 '19 05:08 saig0

Hi, there are other cases for deleting deployed workflows.

a. We created some customer-specific workflows, but a customer stopped using our services. There are at least two reasons for deleting these workflows in this case:

  1. Free up storage - we don't need to keep this customer's workflows.
  2. The customer requested that we delete all of their data from our service. We have to comply with GDPR requirements. I think this one is critical.

b. We implemented auto-generation of workflows, but each one is a one-time workflow. We don't need to keep it after it has been executed.

c. A certain workflow is updated frequently (by a user or by an automatic process), which creates a lot of versions that are not actually used.

sergeylebed avatar Sep 18 '19 15:09 sergeylebed

Here is another aspect to the undeploy scenario, and this will require a behavioural change in Operate:

  1. I deployed a workflow with the human-readable name "User signup".
  2. I changed the workflow id in a later iteration, then redeployed.

Outcome: I now have two workflows with the same identifier in Operate (see screenshot).

Screen Shot 2019-09-23 at 11 09 05 am

I want to get rid of one of them. If I were to delete (undeploy) one of these in Zeebe, the undeploy event could be processed by Operate to remove it from the UI, but then I would lose access to historical data.

So undeployed workflows may need to be moved to an "Archived" (or similarly named) view in Operate.

jwulf avatar Sep 23 '19 01:09 jwulf

I have a further motivation for this. I am running integration tests that deploy workflows with templated unique ids, task types, and message names - to avoid state collisions between consecutive and parallel test runs.

This means that every test run I do creates a whole bunch of additional workflow definitions. And there is no way to delete them, which makes the cluster pretty much unusable for anything else.

It's not a big deal when running against dockerized ephemeral brokers in CI, but my Camunda Cloud cluster is a mess.....

jwulf avatar Mar 09 '20 15:03 jwulf

In addition to messiness and all the other very valid points made above, security is a concern here, especially in microservice orchestration scenarios.

Let's say that the business wants to discontinue a business process that has been modeled as a workflow definition. Without the ability to remove or at least disable workflow definitions, there is no way to guarantee that this process doesn't continue to be run.

DougAtPeddle avatar Mar 18 '20 22:03 DougAtPeddle

Well, strictly speaking: you could deploy a noop workflow with the same process id - that would stop anything from running, but you would not detect an "undeployed" workflow on the client with an exception.
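A toy model may help show why this workaround stops new executions without removing anything. The sketch below is not Zeebe code: `ToyRegistry`, `deploy`, and `create_instance` are hypothetical names, and the only behavior it assumes is that creating an instance by process id alone resolves to the latest deployed version.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ToyRegistry:
    """Hypothetical stand-in for the broker's store of workflow definitions."""

    versions: Dict[str, List[str]] = field(default_factory=dict)

    def deploy(self, process_id: str, model: str) -> int:
        """Store a new version under the given id and return its 1-based version number."""
        self.versions.setdefault(process_id, []).append(model)
        return len(self.versions[process_id])

    def create_instance(self, process_id: str) -> str:
        """Creating by id alone picks the latest deployed version (assumed behavior)."""
        return self.versions[process_id][-1]


registry = ToyRegistry()
registry.deploy("order", "real-workflow")
registry.deploy("order", "noop-workflow")  # the workaround: shadow the old version
latest = registry.create_instance("order")
```

In this model `latest` is the no-op model, yet version 1 is still stored, so the workaround neither frees storage nor removes the old definition.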

jwulf avatar Mar 19 '20 07:03 jwulf

Solid idea, Josh. Thank you.

Thinking through this a bit more, there's also the "Executable" flag in the BPMN workflow definition. I'd need to test to confirm, but deploying a new workflow version with "Executable" unchecked should prevent unwanted subsequent execution of "retired" workflows.

DougAtPeddle avatar Mar 19 '20 14:03 DougAtPeddle

Nope. Deploying a new workflow version with "Executable" unchecked results in the following deployment error:

ERROR: Must contain at least one executable process


DougAtPeddle avatar Apr 20 '20 18:04 DougAtPeddle

I hope this will get implemented.

Even if not in the near future, please keep this issue open.

Another request is to list workflows using zbctl, but that is another issue.

ceefour avatar Apr 21 '20 17:04 ceefour

We are considering Zeebe for a future system.

Part of the system will be a black box to the user, with workflows that do not change often. However, we also require part of the system to have workflows that the user is able to tweak before starting, to make sure that they match how they need to proceed for that particular instance. At the moment there is a model that assumes the normal case will be that the user makes some change to the workflow.

If we have many users a minute and a multi year system life span we assume that it will be important to be able to undeploy workflows. We want to protect both scalability and durability. The system should not fall over, slow down, timeout or otherwise degrade due to growing data structures. It should also avoid fragmentation type issues with however it implements deletion.

allanbarklie avatar Jun 09 '20 15:06 allanbarklie

Extract from https://docs.zeebe.io/basics/exporters.html#considerations:

Once data is not needed by Zeebe itself anymore, it will query its exporters to know if it can be safely deleted, and if so, will permanently erase it, thereby reducing disk usage.

This does not seem to be true for workflows. The last version of a workflow will always be kept, as I see no reason for Zeebe to delete it. I think that deploying an empty workflow is not clean with respect to Zeebe's event log.

Sending a "Delete Workflow", "Disable Workflow", or "Undeploy Workflow" event is far cleaner with respect to the Zeebe philosophy. It reflects the user's desire to stop using this workflow, and this desire appears in the log history.

vtexier avatar Jul 27 '20 07:07 vtexier

This should really be considered for this year's roadmap. It would complete the Zeebe experience for many, and it's a highly requested feature.

avocraft avatar Aug 08 '20 23:08 avocraft

@avocraft @vtexier and all: this feature is not planned for this quarter, but we will re-evaluate early next year. The team's current priority is stability and performance for Camunda Cloud. We highly appreciate the feedback, and if you are trying out Camunda Cloud, please let us know what you think about it.

salaboy avatar Oct 16 '20 15:10 salaboy

It's "early next year" - any updates?

psteinroe avatar Mar 29 '21 14:03 psteinroe

@steinroe let me check if this made the planning for Q2 and I will get back to you

salaboy avatar Mar 29 '21 14:03 salaboy

@steinroe unfortunately this is not coming soon in Q2, but I will make sure that I raise this for Q3.

salaboy avatar Mar 29 '21 15:03 salaboy

I have a proposal to implement this, which is far from complete but might be a start. First thing to note: I think it makes more sense to talk about removing a process model or undeploying a process, not a deployment.

Be aware this is just a proposal and I’m happy for any feedback.

The removal can probably be distributed via the deployment partition, similar to the creation. The deployment partition is in charge of making sure that all partitions have received the deletion; if no ack is received, it needs to redistribute it.
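The ack-and-redistribute idea can be sketched roughly as follows. This is only an illustration of the retry loop, not Zeebe's distribution code: `distribute_deletion`, the `send` callback, and the `max_rounds` cap are all hypothetical, and a real implementation would presumably retry on a timer rather than in a tight loop.

```python
from typing import Callable, Set


def distribute_deletion(
    partitions: Set[int],
    send: Callable[[int], bool],
    max_rounds: int = 10,
) -> Set[int]:
    """Send the deletion to every partition until each one has acknowledged.

    `send(partition)` returns True when that partition acknowledged.
    Returns the partitions that still have not acknowledged after `max_rounds`.
    """
    pending = set(partitions)
    for _ in range(max_rounds):
        if not pending:
            break
        # Redistribute only to partitions that have not acknowledged yet.
        pending = {p for p in pending if not send(p)}
    return pending
```

An empty result means every partition confirmed the deletion; a non-empty result lists the partitions that would need further redistribution.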

One issue we have seen with process deletion is the handling of existing process instances that belong to that deployment. One idea came to mind recently when I read about the quiet period of Netty's shutdownGracefully method: https://netty.io/4.1/api/io/netty/util/concurrent/GlobalEventExecutor.html#shutdownGracefully-long-long-java.util.concurrent.TimeUnit- .

We could accept a new command like this:


message DeleteProcess {
  // unique identifier of a specific process definition
  string bpmnProcessId = 1;
  // the assigned process version
  int32 version = 2;
  // the unique identifier of this process
  int64 processKey = 3;
  // the quiet period or await time until the process can be deleted - can be zero
  // if zero, expects a non-zero timeout
  int64 quietPeriod = 4;
  // the maximum timeout until the process is forced to be deleted - can be zero
  // if zero, it is immediately forced
  // if -1 (unlimited), expects a non-zero quiet period
  int64 timeout = 5;
}

The user can decide whether it wants to delete the process via bpmn process id and version or process key, similar to process instance creation. We could also discuss whether it makes more sense to delete processes completely via bpmn process id, without version.

The deletion would respect the quiet period and delete the process only if no instances are created or executed during this time. If the max timeout is reached, the deletion is forced. This means that all existing instances need to be canceled before the process is deleted. Each partition needs to do the cancellation itself.

  • The quiet period can be zero, then the deletion is immediately forced.
  • The timeout can be zero or negative, then the max timeout is unlimited, which needs a quiet period.

We could also further discuss whether we want to reject any new instance creations already after receiving the deletion command, which might make sense. Then we would only allow further executions until the max timeout is reached.

Happy to discuss this further. :)

ChrisKujawa avatar Aug 28 '21 21:08 ChrisKujawa

Just to clarify:

The quiet period is reset whenever a process instance is created; but what does "executed" mean? Do you mean any processing that happens to it (e.g. a Job.COMPLETE command, or any ProcessInstance command)?

Maybe let's try an example to make sure we're on the same page. I have a quietPeriod=5s, and a timeout=60s. So after sending the delete command, if after 5 seconds nothing happened to any instances of that process and no new instances were created, the existing instances are canceled, and then the process is deleted. If a new instance was created after 4s, then the quiet period is reset, and I have to wait another 5s - meaning only after 9s are all instances canceled, and the process deleted. And if there's always stuff happening, say, every 2s, then after 60s regardless all instances are canceled and the process deleted. Correct?
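The three cases in this example can be checked with a small calculation. The function below is only a sketch of the semantics as described here, not part of any Zeebe API: `deletion_time` and its parameters are hypothetical names, and times are seconds counted from the moment the delete command is received.

```python
from typing import Iterable


def deletion_time(quiet_period: float, timeout: float, activity: Iterable[float]) -> float:
    """Return when the process is deleted, in seconds after the delete command.

    `activity` holds the times at which an instance was created or processed.
    Each activity before the current deadline resets the quiet period; the
    timeout caps the total wait regardless of activity.
    """
    quiet_deadline = quiet_period
    for t in sorted(activity):
        if t >= quiet_deadline or t >= timeout:
            break  # deletion has already happened by the time of this activity
        quiet_deadline = t + quiet_period  # activity resets the quiet period
    return min(quiet_deadline, timeout)


no_activity = deletion_time(5, 60, [])
one_creation = deletion_time(5, 60, [4])
constant_activity = deletion_time(5, 60, range(2, 100, 2))
```

With quietPeriod=5s and timeout=60s, these evaluate to 5, 9, and 60 seconds respectively, matching the three cases described above.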

Sounds good to me, but I would challenge if we need to include the condition about new instances. If I schedule deletion of a process, I would think as a user it makes sense to immediately reject new instances, but give a grace period to existing ones to complete before forcefully terminating them.

EDIT: oops, saw that your last paragraph says exactly that we could challenge that :sweat_smile:

npepinpe avatar Sep 13 '21 14:09 npepinpe

Hi everyone,

are there any updates regarding this feature? Our company is also running into issues and maintenance overhead in managing old processes which cannot be undeployed.

We would appreciate it if the feature were included in the planning process.

PhilippS93 avatar Dec 02 '21 11:12 PhilippS93

Hi guys,

Any news on this? I mean, it should at least be possible to deactivate workflows somehow. I've created 2 workflows whose start events listen to the same message, but I'd like only one to fire, since the other one is outdated.

I'll deploy a dummy workflow for the time being, but that's not very elegant...

Thanks

morphace avatar Feb 24 '22 08:02 morphace

Unfortunately not, it's still on the back burner for now.

npepinpe avatar Feb 24 '22 10:02 npepinpe

cc @remcowesterhoud

pihme avatar May 31 '22 09:05 pihme

Hi there, in Q3 we'll be working on #9576 to release Delete Process and Decision Definition feature in Zeebe and Operate with 8.1.

We have started exploring and designing the process definition deletion topic. We are running a research project to understand end users' approach to the matter, their context, and their needs, so that the UI will be friendly and supportive. For this we are looking for participants among community members who have relevant experience with, or a need for, the feature. We would appreciate your help by:

  • having a 30min Zoom call with us

@datoslabs @sergeylebed @DougAtPeddle @ceefour @allanbarklie @vtexier @avocraft @PhilippS93 @morphace would you be open to giving us a bit of context from your side during a quick Zoom session?

Thanks in advance :handshake: CC: @EvgeniyaUX

aleksander-dytko avatar Jul 13 '22 10:07 aleksander-dytko

Great news and initiative! But I am not using Zeebe anymore in my project. So it will be without me.

vtexier avatar Jul 13 '22 11:07 vtexier

Hi. Thanks for getting in touch. Our usage is still in the investigative/proof of concept stage - I have been championing the technology but our outcomes are still uncertain. So far we have been using Zeebe self hosted but again future usage may vary.

I'm just about to go on holiday so I thought I would summarize our undeploy use case for you.

It is likely that we will have some template workflows that form the basis of workflows that users, automated systems, or a hybrid of both will run. We are looking at possibilities for those workflows to sometimes be tweaked or optimised before execution. This might be manual or automated. Therefore there is potential for a build-up of deployed workflows that will never be executed again. So the desire would be to be able to clean up these no-longer-needed workflows. Motivations include general concerns about resource/performance type issues and safety issues based on the current workflow definitions being in some way within a large number of old workflow definitions. We would imagine an API similar to that for deploying that would be able to undeploy.

allanbarklie avatar Jul 14 '22 16:07 allanbarklie

@allanbarklie thanks for providing us with the background!

Currently, we've assumed that with deletion of the version of process definition, all the process instances of this version of the process are deleted, along with called process instances and tasks. Would it fit into your expectation?

aleksander-dytko avatar Jul 15 '22 08:07 aleksander-dytko

@aleksander-dytko, I'm ok to have a session discussing our use cases. Basically speaking they are described in my initial reply above https://github.com/camunda/zeebe/issues/2908#issuecomment-532748378.

sergeylebed avatar Jul 15 '22 10:07 sergeylebed

@aleksander-dytko I think the deletion pattern you describe would be ok, I don't think it would cause issues for our use case. I think it is reasonable for our use case to assume any undeploy would be after such workflows had completed. Additionally any scope that has the capability to undeploy could for example also be expected to provide unique names for any definitions it owns - therefore avoiding race conditions or unexpected deletion issues elsewhere. I agree that ensuring directly launched workflows don't get orphaned also makes sense. Cleaning up the whole process tree sounds sensible.

Am I correct to assume that other workflows started by start messages fired from an instance of the deleted deployment would not be stopped as they would be outside of the deleted process scope? That would also make sense to me.

This design may also have a useful use case as a belts and braces clean up operation for instances. Similar to deleting a namespace in K8s.

allanbarklie avatar Jul 15 '22 10:07 allanbarklie

@allanbarklie Thanks for the validation!

Am I correct to assume that other workflows started by start messages fired from an instance of the deleted deployment would not be stopped as they would be outside of the deleted process scope? That would also make sense to me.

I'm not sure if I understand that fully. Do you mean if the running process instances of the deleted process definition would be deleted? If yes, the running process instances will be terminated and deleted, as there will be no process definition to execute/display.

aleksander-dytko avatar Jul 15 '22 11:07 aleksander-dytko

I think what @allanbarklie means is the case where there are 2 process definitions of which one contains a message start event. The second definition could then start new instances by sending messages.

E.g. image

In this case, deleting definition 1 will not result in cancelling any instances of definition 2, unlike what would happen when definition 2 was started by a call activity.
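The difference between the two cases can be illustrated with a small model of the instance tree. This is a sketch under assumptions, not the planned implementation: `Instance`, its `parent` field, and `instances_to_cancel` are hypothetical, and the model assumes that only call activities create a parent link while message start events do not.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Instance:
    key: int
    definition: str
    # Set only when the instance was started by a call activity (assumption).
    parent: Optional[int] = None


def instances_to_cancel(deleted_definition: str, instances: List[Instance]) -> List[int]:
    """Return the keys of instances cancelled when `deleted_definition` is deleted."""
    cancelled = {i.key for i in instances if i.definition == deleted_definition}
    changed = True
    while changed:  # follow call-activity links down the process tree
        changed = False
        for i in instances:
            if i.parent in cancelled and i.key not in cancelled:
                cancelled.add(i.key)
                changed = True
    return sorted(cancelled)


instances = [
    Instance(key=1, definition="definition-1"),
    Instance(key=2, definition="definition-2", parent=1),  # started by a call activity
    Instance(key=3, definition="definition-2"),            # started by a message
]
to_cancel = instances_to_cancel("definition-1", instances)
```

Here `to_cancel` contains instances 1 and 2 only: the message-started instance 3 has no parent link and is left running.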

remcowesterhoud avatar Jul 15 '22 11:07 remcowesterhoud