
Workflow status not marked as completed on listing page

Open vishal079 opened this issue 9 months ago • 7 comments

Describe the bug

I have started a workflow with a random duration of 10 to 15 days. It checks after a random number of days and performs the required process. I have also added an end date and a condition that checks whether the current date is later than the end date; if so, the workflow is marked as completed.

The workflow behaves as defined and completes. It shows as 'Completed' on the detail screen, but on the listing screen it still shows as 'Running'.

Details

Conductor version: 3.11.1
Persistence implementation: Postgres, Elastic Search
Queue implementation: Postgres
Lock: Zookeeper

Expected behavior

It should show 'Completed' on both the listing and details pages.

Screenshots

Image

Image

vishal079 avatar Apr 08 '25 13:04 vishal079

Hi @vishal079 will check on this.

v1r3n avatar Apr 08 '25 22:04 v1r3n

Any Update?

vishal079 avatar Jun 02 '25 05:06 vishal079

+1 Same problem here.

I have the same problem at the workflow AND task level when searching.

In the search, lots of workflows say they are "IN PROGRESS", but when you click through to the details page it says "COMPLETED".

Even when no workflows are running anymore, the search still shows >10000 TASKS in progress, which doesn't make sense at all.

The workflow is pretty big, has some fork joins on multiple levels and runs up to 20 times in parallel. The more parallel workflows, the worse the problem seems to get.

Edit: Screenshots

Image

As you can see, the "TERMINATE" tasks get stuck, which maybe also causes the workflows to not complete?

I am not sure. But this is getting really frustrating for me.

Image

hexxone avatar Jun 06 '25 15:06 hexxone

+1 Same problem. Very frustrating. Wanted to build a safe shutdown for my whole system that waits for all workflows to complete before shutting down. Can't do it because of this bug.

@v1r3n any info?

tditz avatar Jun 06 '25 15:06 tditz

@vishal079 @hexxone @tditz Sorry for the delay. I've brought this up with our conductor dev team.

jeffbulltech avatar Jun 06 '25 17:06 jeffbulltech

Oh, also: manually trying to terminate the stuck workflows causes an error:

Cannot terminate a COMPLETED workflow.

Here is the related stack-trace from conductor server:

18580451 [http-nio-8080-exec-23] ERROR com.netflix.conductor.service.WorkflowBulkService [] - bulk terminate exception, workflowId 03e70757-3154-44c8-b456-14ebd7b97e7a, message: Cannot terminate a COMPLETED workflow.
com.netflix.conductor.core.exception.ConflictException: Cannot terminate a COMPLETED workflow.
    at com.netflix.conductor.core.execution.WorkflowExecutor.terminateWorkflow(WorkflowExecutor.java:577) ~[conductor-core.jar!/:?]
    at com.netflix.conductor.service.WorkflowBulkServiceImpl.terminate(WorkflowBulkServiceImpl.java:154) [conductor-core.jar!/:?]
    at jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:?]
    at jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77) ~[?:?]
    at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:?]
    at java.lang.reflect.Method.invoke(Method.java:569) ~[?:?]
    at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:343) [conductor-es7-persistence.jar!/:?]
    at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:196) [conductor-es7-persistence.jar!/:?]
    at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:163) [conductor-es7-persistence.jar!/:?]
    at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.proceed(CglibAopProxy.java:751) [conductor-es7-persistence.jar!/:?]
    at org.springframework.validation.beanvalidation.MethodValidationInterceptor.invoke(MethodValidationInterceptor.java:141) [conductor-es7-persistence.jar!/:?]
    at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:184) [conductor-es7-persistence.jar!/:?]
    at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.proceed(CglibAopProxy.java:751) [conductor-es7-persistence.jar!/:?]
    at org.springframework.aop.framework.CglibAopProxy$DynamicAdvisedInterceptor.intercept(CglibAopProxy.java:703) [conductor-es7-persistence.jar!/:?]
    at com.netflix.conductor.service.WorkflowBulkServiceImpl$$SpringCGLIB$$0.terminate() [conductor-core.jar!/:?]
    at com.netflix.conductor.rest.controllers.WorkflowBulkResource.terminate(WorkflowBulkResource.java:112) [conductor-rest.jar!/:?]
    at jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:?]
    at jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77) ~[?:?]
    at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:?]
    at java.lang.reflect.Method.invoke(Method.java:569) ~[?:?]
    at org.springframework.web.method.support.InvocableHandlerMethod.doInvoke(InvocableHandlerMethod.java:205) [spring-web-6.0.12.jar!/:6.0.12]
    at org.springframework.web.method.support.InvocableHandlerMethod.invokeForRequest(InvocableHandlerMethod.java:150) [spring-web-6.0.12.jar!/:6.0.12]
    at org.springframework.web.servlet.mvc.method.annotation.ServletInvocableHandlerMethod.invokeAndHandle(ServletInvocableHandlerMethod.java:118) [spring-webmvc-6.0.12.jar!/:6.0.12]
    at org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.invokeHandlerMethod(RequestMappingHandlerAdapter.java:884) [spring-webmvc-6.0.12.jar!/:6.0.12]
    at org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.handleInternal(RequestMappingHandlerAdapter.java:797) [spring-webmvc-6.0.12.jar!/:6.0.12]
    at org.springframework.web.servlet.mvc.method.AbstractHandlerMethodAdapter.handle(AbstractHandlerMethodAdapter.java:87) [spring-webmvc-6.0.12.jar!/:6.0.12]
    at org.springframework.web.servlet.DispatcherServlet.doDispatch(DispatcherServlet.java:1081) [spring-webmvc-6.0.12.jar!/:6.0.12]
    at org.springframework.web.servlet.DispatcherServlet.doService(DispatcherServlet.java:974) [spring-webmvc-6.0.12.jar!/:6.0.12]
    at org.springframework.web.servlet.FrameworkServlet.processRequest(FrameworkServlet.java:1011) [spring-webmvc-6.0.12.jar!/:6.0.12]
    at org.springframework.web.servlet.FrameworkServlet.doPost(FrameworkServlet.java:914) [spring-webmvc-6.0.12.jar!/:6.0.12]
    at jakarta.servlet.http.HttpServlet.service(HttpServlet.java:590) [tomcat-embed-core-10.1.13.jar!/:?]
    at org.springframework.web.servlet.FrameworkServlet.service(FrameworkServlet.java:885) [spring-webmvc-6.0.12.jar!/:6.0.12]
    at jakarta.servlet.http.HttpServlet.service(HttpServlet.java:658) [tomcat-embed-core-10.1.13.jar!/:?]
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:205) [tomcat-embed-core-10.1.13.jar!/:?]
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:149) [tomcat-embed-core-10.1.13.jar!/:?]
    at org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:51) [tomcat-embed-websocket-10.1.13.jar!/:?]
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:174) [tomcat-embed-core-10.1.13.jar!/:?]
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:149) [tomcat-embed-core-10.1.13.jar!/:?]
    at org.springframework.web.filter.RequestContextFilter.doFilterInternal(RequestContextFilter.java:100) [spring-web-6.0.12.jar!/:6.0.12]
    at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:116) [spring-web-6.0.12.jar!/:6.0.12]
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:174) [tomcat-embed-core-10.1.13.jar!/:?]
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:149) [tomcat-embed-core-10.1.13.jar!/:?]
    at org.springframework.web.filter.FormContentFilter.doFilterInternal(FormContentFilter.java:93) [spring-web-6.0.12.jar!/:6.0.12]
    at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:116) [spring-web-6.0.12.jar!/:6.0.12]
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:174) [tomcat-embed-core-10.1.13.jar!/:?]
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:149) [tomcat-embed-core-10.1.13.jar!/:?]
    at org.springframework.web.filter.ServerHttpObservationFilter.doFilterInternal(ServerHttpObservationFilter.java:109) [spring-web-6.0.12.jar!/:6.0.12]
    at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:116) [spring-web-6.0.12.jar!/:6.0.12]
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:174) [tomcat-embed-core-10.1.13.jar!/:?]
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:149) [tomcat-embed-core-10.1.13.jar!/:?]
    at org.springframework.web.filter.CharacterEncodingFilter.doFilterInternal(CharacterEncodingFilter.java:201) [spring-web-6.0.12.jar!/:6.0.12]
    at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:116) [spring-web-6.0.12.jar!/:6.0.12]
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:174) [tomcat-embed-core-10.1.13.jar!/:?]
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:149) [tomcat-embed-core-10.1.13.jar!/:?]
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:167) [tomcat-embed-core-10.1.13.jar!/:?]
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:90) [tomcat-embed-core-10.1.13.jar!/:?]
    at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:482) [tomcat-embed-core-10.1.13.jar!/:?]
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:115) [tomcat-embed-core-10.1.13.jar!/:?]
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:93) [tomcat-embed-core-10.1.13.jar!/:?]
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:74) [tomcat-embed-core-10.1.13.jar!/:?]
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:341) [tomcat-embed-core-10.1.13.jar!/:?]
    at org.apache.coyote.http11.Http11Processor.service(Http11Processor.java:391) [tomcat-embed-core-10.1.13.jar!/:?]
    at org.apache.coyote.AbstractProcessorLight.process(AbstractProcessorLight.java:63) [tomcat-embed-core-10.1.13.jar!/:?]
    at org.apache.coyote.AbstractProtocol$ConnectionHandler.process(AbstractProtocol.java:894) [tomcat-embed-core-10.1.13.jar!/:?]
    at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1740) [tomcat-embed-core-10.1.13.jar!/:?]
    at org.apache.tomcat.util.net.SocketProcessorBase.run(SocketProcessorBase.java:52) [tomcat-embed-core-10.1.13.jar!/:?]
    at org.apache.tomcat.util.threads.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1191) [tomcat-embed-core-10.1.13.jar!/:?]
    at org.apache.tomcat.util.threads.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:659) [tomcat-embed-core-10.1.13.jar!/:?]
    at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61) [tomcat-embed-core-10.1.13.jar!/:?]
    at java.lang.Thread.run(Thread.java:840) [?:?]

Maybe this helps? And thanks for bringing it up 👍

hexxone avatar Jun 06 '25 18:06 hexxone

I noticed that Elasticsearch is probably the cause of this. When you search in the Web UI, it queries ES. However, when you open a workflow's details, it apparently reads from the persistence layer (PostgreSQL) instead. If the two stores disagree, you get the discrepancy seen here.

For some reason, an "add index request" from Conductor to ES sometimes fails. Then, the workflow status doesn't update, and the request doesn't retry, even though it's actually updated in the persistence layer (PostgreSQL in my case).
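For anyone who wants to check whether they are hit by the same index/store mismatch, here is a minimal sketch that compares what the search index reports with what the workflow endpoint returns. It assumes the Conductor REST API is reachable at `http://localhost:8080/api` (adjust `BASE_URL`), and that the search query string `status IN (RUNNING)` is accepted by your version; the function names are my own and not part of Conductor.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "http://localhost:8080/api"  # assumption: adjust to your server


def fetch_json(url):
    """GET a URL and decode the JSON body."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def search_running(base_url=BASE_URL, size=100):
    """Workflow summaries that the search index reports as RUNNING."""
    # Assumption: this query syntax is accepted by your Conductor version.
    params = urllib.parse.urlencode({"query": "status IN (RUNNING)", "size": size})
    return fetch_json(f"{base_url}/workflow/search?{params}").get("results", [])


def stored_status(workflow_id, base_url=BASE_URL):
    """Status of a single workflow as returned by the workflow endpoint."""
    wf_id = urllib.parse.quote(workflow_id)
    return fetch_json(f"{base_url}/workflow/{wf_id}?includeTasks=false")["status"]


def find_stale(summaries, get_status=stored_status):
    """Return (workflowId, indexed_status, actual_status) for every mismatch."""
    stale = []
    for summary in summaries:
        wf_id = summary["workflowId"]
        actual = get_status(wf_id)
        if actual != summary["status"]:
            stale.append((wf_id, summary["status"], actual))
    return stale


if __name__ == "__main__":
    for wf_id, indexed, actual in find_stale(search_running()):
        print(f"{wf_id}: index says {indexed}, store says {actual}")
```

Any workflow it prints is one where the index entry was never updated. `find_stale` takes the status lookup as a parameter so it can be tried against canned data without a running server.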

Turning on "async indexing" and "locking" seems to help with this problem. In almost 48 hours, there have been no more "Running" stuck workflows. However, we will have to see over a longer period of time to confirm this.
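For reference, a sketch of the settings I mean, as a config fragment. The property names below are how I understand them from the Conductor 3.x configuration and are not confirmed against 3.11.1, so please verify them for your version before relying on this.

```properties
# Assumption: property names as found in Conductor 3.x; verify for your version.

# Index workflows and tasks asynchronously instead of inline with execution.
conductor.app.asyncIndexingEnabled=true

# Take a lock around workflow state changes.
conductor.app.workflowExecutionLockEnabled=true

# Lock provider; zookeeper matches the setup in the original report.
conductor.workflow-execution-lock.type=zookeeper
```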

hexxone avatar Aug 17 '25 11:08 hexxone