X of X state processor threads are potentially stuck (processing longer than 60 seconds)

Open jansymphony opened this issue 2 years ago • 1 comments

org.springframework.dao.EmptyResultDataAccessException: Incorrect result size: expected 1, actual 0
	at org.springframework.dao.support.DataAccessUtils.nullableSingleResult(DataAccessUtils.java:97) ~[spring-tx-5.3.16.jar:5.3.16]
	at org.springframework.jdbc.core.JdbcTemplate.queryForObject(JdbcTemplate.java:880) ~[spring-jdbc-5.3.16.jar:5.3.16]
	at org.springframework.jdbc.core.JdbcTemplate.queryForObject(JdbcTemplate.java:906) ~[spring-jdbc-5.3.16.jar:5.3.16]
	at io.nflow.engine.internal.dao.WorkflowInstanceDao.updateWorkflowInstanceWithCTE(WorkflowInstanceDao.java:430) ~[nflow-engine-7.4.0.jar:na]
	at io.nflow.engine.internal.dao.WorkflowInstanceDao.updateWorkflowInstanceAfterExecution(WorkflowInstanceDao.java:326) ~[nflow-engine-7.4.0.jar:na]
	at io.nflow.engine.internal.dao.WorkflowInstanceDao$$FastClassBySpringCGLIB$$f1fc6e3.invoke(<generated>) ~[nflow-engine-7.4.0.jar:na]
	at org.springframework.cglib.proxy.MethodProxy.invoke(MethodProxy.java:218) ~[spring-core-5.3.16.jar:5.3.16]
	at org.springframework.aop.framework.CglibAopProxy$DynamicAdvisedInterceptor.intercept(CglibAopProxy.java:689) ~[spring-aop-5.3.16.jar:5.3.16]
	at io.nflow.engine.internal.dao.WorkflowInstanceDao$$EnhancerBySpringCGLIB$$f4fce4d3.updateWorkflowInstanceAfterExecution(<generated>) ~[nflow-engine-7.4.0.jar:na]
	at io.nflow.engine.internal.executor.WorkflowStateProcessor.persistWorkflowInstanceState(WorkflowStateProcessor.java:332) ~[nflow-engine-7.4.0.jar:na]
	at io.nflow.engine.internal.executor.WorkflowStateProcessor.saveWorkflowInstanceState(WorkflowStateProcessor.java:300) ~[nflow-engine-7.4.0.jar:na]
	at io.nflow.engine.internal.executor.WorkflowStateProcessor.runImpl(WorkflowStateProcessor.java:201) ~[nflow-engine-7.4.0.jar:na]
	at io.nflow.engine.internal.executor.WorkflowStateProcessor.run(WorkflowStateProcessor.java:129) ~[nflow-engine-7.4.0.jar:na]
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) ~[na:na]
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) ~[na:na]
	at java.base/java.lang.Thread.run(Thread.java:833) ~[na:na]

2022-03-15 13:52:20.826 ERROR 19087 --- [flow-executor-7] i.n.e.i.executor.WorkflowStateProcessor  : Failed to save workflow instance 428 new state, retrying after PT60S seconds.

This error repeats in a loop. This is the code that creates the loop:

    while (true) {
      try {
        return persistWorkflowInstanceState(execution, instance.stateVariables, actionBuilder, instanceBuilder);
      } catch (Exception ex) {
        if (shutdownRequested.get()) {
          logger.error(
              "Failed to save workflow instance {} new state, not retrying due to shutdown request. The state will be rerun on recovery.",
              instance.id, ex);
          // return the original instance since persisting failed
          return instance;
        }
        StateSaveExceptionHandling handling = stateSaveExceptionAnalyzer.analyzeSafely(ex, saveRetryCount++);
        if (handling.logStackTrace) {
          nflowLogger.log(logger, handling.logLevel, "Failed to save workflow instance {} new state, retrying after {} seconds.",
              new Object[] { instance.id, handling.retryDelay, ex });
        } else {
          nflowLogger.log(logger, handling.logLevel,
              "Failed to save workflow instance {} new state, retrying after {} seconds. Error: {}",
              new Object[] { instance.id, handling.retryDelay, ex.getMessage() });
        }
        sleepIgnoreInterrupted(handling.retryDelay.getStandardSeconds());
      }
    }

The record exists in the database, and if I start a new instance it is processed normally. Is there any way to break this loop, or to restart the dispatcher/executor?

jansymphony, Mar 15 '22 13:03

Thank you for the report. I will try to look into the root cause in the next few days. I'm afraid there is currently no way to break out of that loop. It is meant to keep retrying while the database is unavailable, so that changes made to workflow instances are not lost.
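
To illustrate the control flow being discussed, here is a minimal, self-contained sketch of a retry loop of the same shape. It is not nFlow code: the class `RetryLoopSketch`, the helper `saveWithRetry`, and its parameters are hypothetical names made up for this example, and the failing save is simulated with a plain exception rather than a real database call. It only shows that, with this structure, a save that fails every time can leave the loop in just one way, namely when the shutdown flag is set.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

public class RetryLoopSketch {

  // Hypothetical helper: keeps calling save until it succeeds.
  // If save fails and shutdown has been requested, gives up and returns the original value.
  public static <T> T saveWithRetry(Supplier<T> save, T original,
      AtomicBoolean shutdownRequested, long retryDelayMillis) {
    while (true) {
      try {
        return save.get();
      } catch (RuntimeException ex) {
        if (shutdownRequested.get()) {
          // Persisting failed and we are shutting down: return the unsaved original.
          return original;
        }
        try {
          Thread.sleep(retryDelayMillis);
        } catch (InterruptedException ignored) {
          // Interrupts are ignored here, so interrupting the thread does not end the loop.
        }
      }
    }
  }

  public static void main(String[] args) {
    AtomicBoolean shutdown = new AtomicBoolean(false);
    AtomicInteger attempts = new AtomicInteger();

    // Simulated save that fails on every call, like an update that keeps matching zero rows.
    Supplier<String> alwaysFails = () -> {
      // Request shutdown on the third attempt so that this demo terminates.
      if (attempts.incrementAndGet() == 3) {
        shutdown.set(true);
      }
      throw new IllegalStateException("Incorrect result size: expected 1, actual 0");
    };

    String result = saveWithRetry(alwaysFails, "original", shutdown, 10);
    System.out.println(result + " after " + attempts.get() + " attempts");
  }
}
```

In this sketch the demo ends only because the simulated save flips the shutdown flag itself. Without that, a permanently failing save would keep the thread in the loop indefinitely, which matches the "potentially stuck" warning in the issue title.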

It probably does not matter, but which version of PostgreSQL are you using?

gmokki, Mar 21 '22 10:03

Closing this. Please reopen if you can reproduce the problem or have new information about it.

efonsell, Mar 18 '23 17:03