Workflow Status not mark as completed on Listing page
Open
vishal079
opened this issue 9 months ago
•
7 comments
Describe the bug
I have started a workflow with a random duration of 10 to 15 days. It will check after a random number of days and perform the required process. Additionally, I have added an end date and a condition to check if the current date is greater than the end date; if so, the workflow will be marked as completed.
Now, as per the workflow definition, it works and the workflow is completed. It shows as 'Completed' on the detail screen, but on the listing screen, it still shows as 'Running.'
I have the same Problem on the Workflow AND Task level when searching.
In the Search, lots of workflows say they are "IN PROGRESS" but when you click on the details page it says "COMPLETED".
Even if there are no workflows running anymore, there are in the meantime still >10000 TASKS in Progress which doesnt make sense at all.
The workflow is pretty big, has some fork joins on multiple levels and runs up to 20 times in parallel.
The more parallel workflows, the worse the problem seems to get.
Edit: Screenshots
As you can see, the "TERMINATE" Tasks get stuck, which maybe also causes the Workflows to not complete ?
I am not sure. But this is getting really frustrating for me.
+1 Same problem. Very frustrating. Wanted to build a safe shutdown for my whole system that waits for all workflows to complete before shutting down. Can't do it because of this bug.
Oh also, when manually trying to Terminate the workflows which are stuck, it causes an Error:
Cannot terminate a COMPLETED workflow.
Here is the related stack-trace from conductor server:
18580451 [http-nio-8080-exec-23] ERROR com.netflix.conductor.service.WorkflowBulkService [] - bulk terminate exception, workflowId 03e70757-3154-44c8-b456-14ebd7b97e7a, message: Cannot terminate a COMPLETED workflow.
com.netflix.conductor.core.exception.ConflictException: Cannot terminate a COMPLETED workflow.
at com.netflix.conductor.core.execution.WorkflowExecutor.terminateWorkflow(WorkflowExecutor.java:577) ~[conductor-core.jar!/:?]
at com.netflix.conductor.service.WorkflowBulkServiceImpl.terminate(WorkflowBulkServiceImpl.java:154) [conductor-core.jar!/:?]
at jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:?]
at jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77) ~[?:?]
at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:?]
at java.lang.reflect.Method.invoke(Method.java:569) ~[?:?]
at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:343) [conductor-es7-persistence.jar!/:?]
at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:196) [conductor-es7-persistence.jar!/:?]
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:163) [conductor-es7-persistence.jar!/:?]
at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.proceed(CglibAopProxy.java:751) [conductor-es7-persistence.jar!/:?]
at org.springframework.validation.beanvalidation.MethodValidationInterceptor.invoke(MethodValidationInterceptor.java:141) [conductor-es7-persistence.jar!/:?]
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:184) [conductor-es7-persistence.jar!/:?]
at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.proceed(CglibAopProxy.java:751) [conductor-es7-persistence.jar!/:?]
at org.springframework.aop.framework.CglibAopProxy$DynamicAdvisedInterceptor.intercept(CglibAopProxy.java:703) [conductor-es7-persistence.jar!/:?]
at com.netflix.conductor.service.WorkflowBulkServiceImpl$$SpringCGLIB$$0.terminate() [conductor-core.jar!/:?]
at com.netflix.conductor.rest.controllers.WorkflowBulkResource.terminate(WorkflowBulkResource.java:112) [conductor-rest.jar!/:?]
at jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:?]
at jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77) ~[?:?]
at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:?]
at java.lang.reflect.Method.invoke(Method.java:569) ~[?:?]
at org.springframework.web.method.support.InvocableHandlerMethod.doInvoke(InvocableHandlerMethod.java:205) [spring-web-6.0.12.jar!/:6.0.12]
at org.springframework.web.method.support.InvocableHandlerMethod.invokeForRequest(InvocableHandlerMethod.java:150) [spring-web-6.0.12.jar!/:6.0.12]
at org.springframework.web.servlet.mvc.method.annotation.ServletInvocableHandlerMethod.invokeAndHandle(ServletInvocableHandlerMethod.java:118) [spring-webmvc-6.0.12.jar!/:6.0.12]
at org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.invokeHandlerMethod(RequestMappingHandlerAdapter.java:884) [spring-webmvc-6.0.12.jar!/:6.0.12]
at org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.handleInternal(RequestMappingHandlerAdapter.java:797) [spring-webmvc-6.0.12.jar!/:6.0.12]
at org.springframework.web.servlet.mvc.method.AbstractHandlerMethodAdapter.handle(AbstractHandlerMethodAdapter.java:87) [spring-webmvc-6.0.12.jar!/:6.0.12]
at org.springframework.web.servlet.DispatcherServlet.doDispatch(DispatcherServlet.java:1081) [spring-webmvc-6.0.12.jar!/:6.0.12]
at org.springframework.web.servlet.DispatcherServlet.doService(DispatcherServlet.java:974) [spring-webmvc-6.0.12.jar!/:6.0.12]
at org.springframework.web.servlet.FrameworkServlet.processRequest(FrameworkServlet.java:1011) [spring-webmvc-6.0.12.jar!/:6.0.12]
at org.springframework.web.servlet.FrameworkServlet.doPost(FrameworkServlet.java:914) [spring-webmvc-6.0.12.jar!/:6.0.12]
at jakarta.servlet.http.HttpServlet.service(HttpServlet.java:590) [tomcat-embed-core-10.1.13.jar!/:?]
at org.springframework.web.servlet.FrameworkServlet.service(FrameworkServlet.java:885) [spring-webmvc-6.0.12.jar!/:6.0.12]
at jakarta.servlet.http.HttpServlet.service(HttpServlet.java:658) [tomcat-embed-core-10.1.13.jar!/:?]
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:205) [tomcat-embed-core-10.1.13.jar!/:?]
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:149) [tomcat-embed-core-10.1.13.jar!/:?]
at org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:51) [tomcat-embed-websocket-10.1.13.jar!/:?]
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:174) [tomcat-embed-core-10.1.13.jar!/:?]
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:149) [tomcat-embed-core-10.1.13.jar!/:?]
at org.springframework.web.filter.RequestContextFilter.doFilterInternal(RequestContextFilter.java:100) [spring-web-6.0.12.jar!/:6.0.12]
at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:116) [spring-web-6.0.12.jar!/:6.0.12]
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:174) [tomcat-embed-core-10.1.13.jar!/:?]
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:149) [tomcat-embed-core-10.1.13.jar!/:?]
at org.springframework.web.filter.FormContentFilter.doFilterInternal(FormContentFilter.java:93) [spring-web-6.0.12.jar!/:6.0.12]
at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:116) [spring-web-6.0.12.jar!/:6.0.12]
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:174) [tomcat-embed-core-10.1.13.jar!/:?]
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:149) [tomcat-embed-core-10.1.13.jar!/:?]
at org.springframework.web.filter.ServerHttpObservationFilter.doFilterInternal(ServerHttpObservationFilter.java:109) [spring-web-6.0.12.jar!/:6.0.12]
at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:116) [spring-web-6.0.12.jar!/:6.0.12]
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:174) [tomcat-embed-core-10.1.13.jar!/:?]
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:149) [tomcat-embed-core-10.1.13.jar!/:?]
at org.springframework.web.filter.CharacterEncodingFilter.doFilterInternal(CharacterEncodingFilter.java:201) [spring-web-6.0.12.jar!/:6.0.12]
at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:116) [spring-web-6.0.12.jar!/:6.0.12]
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:174) [tomcat-embed-core-10.1.13.jar!/:?]
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:149) [tomcat-embed-core-10.1.13.jar!/:?]
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:167) [tomcat-embed-core-10.1.13.jar!/:?]
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:90) [tomcat-embed-core-10.1.13.jar!/:?]
at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:482) [tomcat-embed-core-10.1.13.jar!/:?]
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:115) [tomcat-embed-core-10.1.13.jar!/:?]
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:93) [tomcat-embed-core-10.1.13.jar!/:?]
at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:74) [tomcat-embed-core-10.1.13.jar!/:?]
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:341) [tomcat-embed-core-10.1.13.jar!/:?]
at org.apache.coyote.http11.Http11Processor.service(Http11Processor.java:391) [tomcat-embed-core-10.1.13.jar!/:?]
at org.apache.coyote.AbstractProcessorLight.process(AbstractProcessorLight.java:63) [tomcat-embed-core-10.1.13.jar!/:?]
at org.apache.coyote.AbstractProtocol$ConnectionHandler.process(AbstractProtocol.java:894) [tomcat-embed-core-10.1.13.jar!/:?]
at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1740) [tomcat-embed-core-10.1.13.jar!/:?]
at org.apache.tomcat.util.net.SocketProcessorBase.run(SocketProcessorBase.java:52) [tomcat-embed-core-10.1.13.jar!/:?]
at org.apache.tomcat.util.threads.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1191) [tomcat-embed-core-10.1.13.jar!/:?]
at org.apache.tomcat.util.threads.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:659) [tomcat-embed-core-10.1.13.jar!/:?]
at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61) [tomcat-embed-core-10.1.13.jar!/:?]
at java.lang.Thread.run(Thread.java:840) [?:?]
I noticed that Elasticsearch is probably the cause of this.
When you query in the Web UI, it uses ES to search. However, when you open a workflow's details, it apparently also uses the persistent layer (PostgreSQL) to show you the information. This can cause the discrepancy seen.
For some reason, an "add index request" from Conductor to ES sometimes fails.
Then, the workflow status doesn't update, and the request doesn't retry, even though it's actually updated in the persistence layer (PostgreSQL in my case).
Turning on "async indexing" and "locking" seems to help with this problem. In almost 48 hours, there have been no more "Running" stuck workflows. However, we will have to see over a longer period of time to confirm this.