cadence-java-client

Timeout exception on IWorkflowService#ResetWorkflowExecution

Open polyansky-syberry opened this issue 5 years ago • 12 comments

Code (modified from the samples):



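    import java.io.IOException;

    import com.uber.cadence.ResetWorkflowExecutionRequest;
    import com.uber.cadence.WorkflowExecution;
    import com.uber.cadence.serviceclient.IWorkflowService;
    import com.uber.cadence.serviceclient.WorkflowServiceTChannel;
    import org.apache.thrift.TException;
    import org.slf4j.LoggerFactory;

    public class RegisterDomain {

        public static void main(String[] args) throws TException, IOException {
            // Connect to the Cadence frontend; the ClientOptions timeouts are in milliseconds.
            IWorkflowService cadenceService = new WorkflowServiceTChannel(
                "127.0.0.1",
                7933,
                new WorkflowServiceTChannel.ClientOptions.Builder()
                    .setRpcTimeout(1_000_000L)
                    .setListArchivedWorkflowRpcTimeout(1_000_000_000L)
                    .setQueryRpcTimeout(1_000_000_000L)
                    .setRpcLongPollTimeout(1_000_000_000L)
                    .build()
            );
            System.out.println("---------------------------------------------------------------");
            System.out.println("Run for " + 4);

            // Reset the given run, using event 4 as the decision finish event.
            ResetWorkflowExecutionRequest request = new ResetWorkflowExecutionRequest();
            request.setWorkflowExecution(
                new WorkflowExecution()
                    .setWorkflowId("f5e392e2-20ed-4239-9633-65a352fbd202")
                    .setRunId("5115e281-f48b-4f51-a3de-f1b9880677a3")
            );
            request.setDomain("DOMAIN");
            request.setDecisionFinishEventId(4);
            try {
                cadenceService.ResetWorkflowExecution(request);
                System.out.println("Success");
            } catch (Exception e) {
                LoggerFactory.getLogger("Logger").error("Error", e);
            }

            System.exit(0);
        }
    }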

What I get:

09:06:20.822 [main] ERROR Logger - Error
org.apache.thrift.transport.TTransportException: timeout
	at com.uber.cadence.serviceclient.WorkflowServiceTChannel.throwOnRpcError(WorkflowServiceTChannel.java:546)
	at com.uber.cadence.serviceclient.WorkflowServiceTChannel.doRemoteCall(WorkflowServiceTChannel.java:519)
	at com.uber.cadence.serviceclient.WorkflowServiceTChannel.resetWorkflowExecution(WorkflowServiceTChannel.java:1597)
	at com.uber.cadence.serviceclient.WorkflowServiceTChannel.lambda$ResetWorkflowExecution$25(WorkflowServiceTChannel.java:1586)
	at com.uber.cadence.serviceclient.WorkflowServiceTChannel.measureRemoteCall(WorkflowServiceTChannel.java:569)
	at com.uber.cadence.serviceclient.WorkflowServiceTChannel.ResetWorkflowExecution(WorkflowServiceTChannel.java:1585)
	at com.uber.cadence.samples.common.RegisterDomain.main(RegisterDomain.java:65)

Through the CLI everything works. To anticipate questions: it is crucial for me to be able to rerun workflows programmatically, so that I can do it from within a Spring application.


It seems like the Cadence server stops processing because a timeout is not configured (the CLI has a --context_timeout option for that), but I'm not sure that's true.

Can you help me with that?

polyansky-syberry, Nov 05 '20 06:11

Docker Compose configuration:

version: '3.2'
services:
  cassandra:
    image: cassandra:3.11
    restart: unless-stopped
    networks:
      - cross-comms
    volumes:
      - type: volume
        source: mycassandrastore
        target: /var/lib/cassandra
    ports:
      - "${CASSANDRA_PORT}:${CASSANDRA_PORT}"
  statsd:
    image: graphiteapp/graphite-statsd
    restart: unless-stopped
    networks:
      - cross-comms
    ports:
      - "8080:80"
      - "2003:2003"
      - "8125:8125"
      - "8126:8126"
  cadence:
    image: ubercadence/server:master-auto-setup
    restart: unless-stopped
    networks:
      - cross-comms
    ports:
      - "${CADENCE_PORT}:${CADENCE_PORT}"
      - "7934:7934"
      - "7935:7935"
      - "7939:7939"
    environment:
      - "CASSANDRA_SEEDS=cassandra"
      - "STATSD_ENDPOINT=statsd:8125"
      - "DYNAMIC_CONFIG_FILE_PATH=config/dynamicconfig/development.yaml"
      - "CADENCE_CONTEXT_TIMEOUT=600"
    depends_on:
      - cassandra
      - statsd
  cadence-web:
    image: ubercadence/web:latest
    restart: unless-stopped
    networks:
      - cross-comms
    environment:
      - "CADENCE_TCHANNEL_PEERS=cadence:${CADENCE_PORT}"
    ports:
      - "${CADENCE_WEB_PORT}:${CADENCE_WEB_PORT}"
    depends_on:
      - cadence
  cadence-cli-shell:
    image: crux-cadence-cli-shell:latest
    restart: unless-stopped
    networks:
      - cross-comms
    environment:
      - "CADENCE_HOST=cadence"
      - "CADENCE_PORT=${CADENCE_PORT}"
      - "CADENCE_DOMAIN=${CADENCE_DOMAIN}"
    depends_on:
      - cadence
    volumes:
      - cadencedata:/var/lib/cadencedata

volumes:
  mycassandrastore:
  cadencedata:

networks:
  cross-comms:

polyansky-syberry, Nov 05 '20 06:11

cadence --domain DOMAIN --address host.docker.internal:7933 workflow reset -w f5e392e2-20ed-4239-9633-65a352fbd202 -r 5115e281-f48b-4f51-a3de-f1b9880677a3 --event_id 4 --reason "<Some string>"

Works fine

polyansky-syberry, Nov 05 '20 13:11

If I set the event ID to 5, it returns this error:

19:29:50.332 [main] ERROR Logger - Error
org.apache.thrift.TException: Rpc error:<ErrorResponse id=5 errorType=UnexpectedError message=cadence internal error, msg: nDCStateRebuilder unable to rebuild mutable state to event ID: 4, version: -24>
	at com.uber.cadence.serviceclient.WorkflowServiceTChannel.throwOnRpcError(WorkflowServiceTChannel.java:548)
	at com.uber.cadence.serviceclient.WorkflowServiceTChannel.doRemoteCall(WorkflowServiceTChannel.java:519)
	at com.uber.cadence.serviceclient.WorkflowServiceTChannel.resetWorkflowExecution(WorkflowServiceTChannel.java:1597)
	at com.uber.cadence.serviceclient.WorkflowServiceTChannel.lambda$ResetWorkflowExecution$25(WorkflowServiceTChannel.java:1586)
	at com.uber.cadence.serviceclient.WorkflowServiceTChannel.measureRemoteCall(WorkflowServiceTChannel.java:569)
	at com.uber.cadence.serviceclient.WorkflowServiceTChannel.ResetWorkflowExecution(WorkflowServiceTChannel.java:1585)
	at com.uber.cadence.samples.common.RegisterDomain.main(RegisterDomain.java:65)

polyansky-syberry, Nov 05 '20 16:11

Here is a screenshot of the event types and their IDs around events 4 and 5: [image]

polyansky-syberry, Nov 06 '20 06:11

@sokada1221 @meiliang86 @mfateev Guys, please help me with this.

polyansky-syberry, Nov 12 '20 16:11

@polyansky-syberry, sorry for the late response. Were you able to resolve the issue in the end? Basically, reset is only allowed at a decision task boundary (DecisionTaskCompleted/Failed/TimedOut events; in newer server versions, we also support Scheduled/Started).
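The boundary rule above can be illustrated with a minimal sketch that scans a history and collects the event IDs eligible as `decisionFinishEventId` under the older rule (DecisionTaskCompleted/Failed/TimedOut). `ResetPoints`, its `Kind` enum and its `Event` class are hypothetical simplifications invented for this example, not the client's API; newer servers may also accept Scheduled/Started events, which this sketch deliberately ignores.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: Kind and Event are hypothetical simplifications of the client's
// EventType / HistoryEvent, kept local so the example is self-contained.
final class ResetPoints {
    enum Kind {
        WORKFLOW_EXECUTION_STARTED,
        DECISION_TASK_SCHEDULED,
        DECISION_TASK_STARTED,
        DECISION_TASK_COMPLETED,
        DECISION_TASK_FAILED,
        DECISION_TASK_TIMED_OUT,
        ACTIVITY_TASK_SCHEDULED
    }

    static final class Event {
        final long id;
        final Kind kind;

        Event(long id, Kind kind) {
            this.id = id;
            this.kind = kind;
        }
    }

    private ResetPoints() {}

    // Collects the IDs of events that close a decision task (completed, failed or timed out),
    // i.e. the decision task boundaries where a reset is allowed under the older rule.
    static List<Long> decisionBoundaries(List<Event> history) {
        List<Long> ids = new ArrayList<>();
        for (Event e : history) {
            if (e.kind == Kind.DECISION_TASK_COMPLETED
                    || e.kind == Kind.DECISION_TASK_FAILED
                    || e.kind == Kind.DECISION_TASK_TIMED_OUT) {
                ids.add(e.id);
            }
        }
        return ids;
    }
}
```

With the real client, the input would presumably be built from the `HistoryEvent` objects returned by `GetWorkflowExecutionHistory`, mapping `getEventId()` and `getEventType()` onto `Event`; that mapping is left out here.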

longquanzheng, Nov 04 '21 17:11

@longquanzheng Hi! You mentioned that it's now possible to reset a workflow execution from a DecisionTaskScheduled event. I have a timed-out execution with this event history: [image]

I tried to reset the execution from event 2 (using java-client 3.6.1 against server v0.23.2 and v0.22.4).

    public String resetWorkflow() {
        var request = new ResetWorkflowExecutionRequest();
        var workflowExecution = new WorkflowExecution()
            .setRunId(runId)
            .setWorkflowId(workflowId);
        request.setWorkflowExecution(workflowExecution);
        request.setDecisionFinishEventId(2);
        request.setDomain(domain);

        try {
            return cadenceService.ResetWorkflowExecution(request).getRunId();
        } catch (TException ex) {
            throw new CadenceServiceException("Couldn't reset workflow execution", ex);
        }
    }

But it throws an exception when resetting the execution:

Caused by: com.uber.cadence.BadRequestError: nDCStateRebuilder unable to rebuild mutable state to event ID: 1, version: -24, baseLastEventID + baseLastEventVersion is not the same as the last event of the last batch, event ID: 2, version :-24 ,typically because of attemptting to rebuild to a middle of a batch
	at com.uber.cadence.WorkflowService$ResetWorkflowExecution_result$ResetWorkflowExecution_resultStandardScheme.read(WorkflowService.java:38530) ~[cadence-client-3.6.1.jar:na]
	at com.uber.cadence.WorkflowService$ResetWorkflowExecution_result$ResetWorkflowExecution_resultStandardScheme.read(WorkflowService.java:38507) ~[cadence-client-3.6.1.jar:na]
	at com.uber.cadence.WorkflowService$ResetWorkflowExecution_result.read(WorkflowService.java:38406) ~[cadence-client-3.6.1.jar:na]
	at org.apache.thrift.TDeserializer.deserialize(TDeserializer.java:81) ~[libthrift-0.9.3.jar:0.9.3]
	at org.apache.thrift.TDeserializer.deserialize(TDeserializer.java:67) ~[libthrift-0.9.3.jar:0.9.3]
	at com.uber.tchannel.messages.ThriftSerializer.decodeBody(ThriftSerializer.java:101) ~[tchannel-core-0.8.30.jar:na]
	at com.uber.tchannel.messages.Serializer.decodeBody(Serializer.java:49) ~[tchannel-core-0.8.30.jar:na]
	at com.uber.tchannel.messages.EncodedResponse.getBody(EncodedResponse.java:85) ~[tchannel-core-0.8.30.jar:na]
	at com.uber.cadence.serviceclient.WorkflowServiceTChannel.resetWorkflowExecution(WorkflowServiceTChannel.java:1490) ~[cadence-client-3.6.1.jar:na]
	at com.uber.cadence.serviceclient.WorkflowServiceTChannel.lambda$ResetWorkflowExecution$27(WorkflowServiceTChannel.java:1477) ~[cadence-client-3.6.1.jar:na]
	at com.uber.cadence.serviceclient.WorkflowServiceTChannel.measureRemoteCallWithTags(WorkflowServiceTChannel.java:374) ~[cadence-client-3.6.1.jar:na]
	at com.uber.cadence.serviceclient.WorkflowServiceTChannel.measureRemoteCall(WorkflowServiceTChannel.java:362) ~[cadence-client-3.6.1.jar:na]
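Comparing the two rebuild errors in this thread (event ID 5 failed while rebuilding to event 4, event ID 2 failed while rebuilding to event 1), the server appears to rebuild mutable state up to `decisionFinishEventId - 1` and to require that event to be the last one of a persisted history batch. That is an inference from the error text only, not documented behaviour. The sketch below checks a candidate ID against a hypothetical list of batches; `BatchBoundaryCheck` and the batch model are illustrative names, not client API.

```java
import java.util.List;

final class BatchBoundaryCheck {
    private BatchBoundaryCheck() {}

    // Hypothetical model: each inner list holds the event IDs persisted together in one
    // history batch, in ascending order. The "target = id - 1" rule is inferred from the
    // error messages, not taken from Cadence documentation.
    static boolean canRebuildFor(List<List<Long>> batches, long decisionFinishEventId) {
        long target = decisionFinishEventId - 1;
        for (List<Long> batch : batches) {
            if (!batch.isEmpty() && batch.get(batch.size() - 1).longValue() == target) {
                return true; // target ends a batch, so no "middle of a batch" rebuild
            }
        }
        return false;
    }
}
```

For batches `[1, 2]`, `[3]`, `[4]`, ID 2 is rejected (event 1 sits mid-batch) and ID 3 passes, which matches what this thread observed for the rebuild step; the later `invalid UUID` failure for ID 3 looks like a separate problem.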

If I try to reset from event 3 programmatically, it throws this exception:

Caused by: org.apache.thrift.TException: Rpc error:<ErrorResponse id=6 errorType=UnexpectedError message=cadence internal error, msg: CreateWorkflowExecution operation failed. Error: invalid UUID "">
	at com.uber.cadence.serviceclient.WorkflowServiceTChannel.throwOnRpcError(WorkflowServiceTChannel.java:345) ~[cadence-client-3.6.1.jar:na]
	at com.uber.cadence.serviceclient.WorkflowServiceTChannel.doRemoteCall(WorkflowServiceTChannel.java:316) ~[cadence-client-3.6.1.jar:na]
	at com.uber.cadence.serviceclient.WorkflowServiceTChannel.resetWorkflowExecution(WorkflowServiceTChannel.java:1488) ~[cadence-client-3.6.1.jar:na]
	at com.uber.cadence.serviceclient.WorkflowServiceTChannel.lambda$ResetWorkflowExecution$27(WorkflowServiceTChannel.java:1477) ~[cadence-client-3.6.1.jar:na]
	at com.uber.cadence.serviceclient.WorkflowServiceTChannel.measureRemoteCallWithTags(WorkflowServiceTChannel.java:374) ~[cadence-client-3.6.1.jar:na]
	at com.uber.cadence.serviceclient.WorkflowServiceTChannel.measureRemoteCall(WorkflowServiceTChannel.java:362) ~[cadence-client-3.6.1.jar:na]
	at com.uber.cadence.serviceclient.WorkflowServiceTChannel.ResetWorkflowExecution(WorkflowServiceTChannel.java:1476) ~[cadence-client-3.6.1.jar:na]

If I reset this execution via the CLI from event 3, the reset succeeds:

cadence --domain WORKFLOWS_PRIMARY --address host.docker.internal:7933 workflow reset -w timeout_test_with_childWF.2022-01-19T11:05:06Z -r 15b64382-28ef-4c03-8bfe-5be59ac4b390 --event_id 3 --reason "<Reset>"
{
  "runId": "2d6caf81-e780-4eed-a117-d167dd5d0c92"
}

But how can such an execution be reset programmatically? Can the whole workflow be reset from the beginning?
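One possible, unconfirmed explanation for `invalid UUID ""` is an empty request ID. The Thrift `ResetWorkflowExecutionRequest` appears to also carry `reason` and `requestId` fields (worth confirming against the generated class), and neither Java snippet in this thread sets them, while the CLI call passes `--reason` and is assumed here to generate a fresh request ID internally. The self-contained sketch below only models that pre-flight step with a hypothetical `ResetParams` holder; with the real client, the equivalent would presumably be `request.setReason(...)` plus `request.setRequestId(UUID.randomUUID().toString())`.

```java
import java.util.Objects;
import java.util.UUID;

// Hypothetical holder mirroring the fields of ResetWorkflowExecutionRequest; not the client's API.
final class ResetParams {
    final String domain;
    final String workflowId;
    final String runId; // may be null to mean "current run" (assumption)
    final long decisionFinishEventId;
    final String reason;
    final String requestId;

    private ResetParams(String domain, String workflowId, String runId,
                        long decisionFinishEventId, String reason, String requestId) {
        this.domain = domain;
        this.workflowId = workflowId;
        this.runId = runId;
        this.decisionFinishEventId = decisionFinishEventId;
        this.reason = reason;
        this.requestId = requestId;
    }

    // Rejects blank required fields and fills requestId with a random UUID,
    // i.e. the CLI-like behaviour assumed in the text above.
    static ResetParams of(String domain, String workflowId, String runId,
                          long decisionFinishEventId, String reason) {
        requireNonBlank(domain, "domain");
        requireNonBlank(workflowId, "workflowId");
        requireNonBlank(reason, "reason");
        if (decisionFinishEventId <= 0) {
            throw new IllegalArgumentException("decisionFinishEventId must be positive");
        }
        return new ResetParams(domain, workflowId, runId, decisionFinishEventId, reason,
                UUID.randomUUID().toString());
    }

    private static void requireNonBlank(String value, String name) {
        Objects.requireNonNull(value, name);
        if (value.trim().isEmpty()) {
            throw new IllegalArgumentException(name + " must not be blank");
        }
    }
}
```

Failing fast on a blank reason or a missing request ID keeps the error on the client side instead of surfacing later as an opaque server-side `CreateWorkflowExecution` failure.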

avitkovskaya-syberry, Feb 11 '22 14:02

Yeah, I think the new feature only allows resetting to the event right after DecisionTaskScheduled. You can look up the history to find the first DecisionTaskScheduled event and add 1 to its event ID.
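The suggestion above (find the first DecisionTaskScheduled event and add 1 to its ID) can be sketched as follows. `FirstDecisionResetPoint`, `Kind` and `Event` are hypothetical stand-ins for the client's `EventType`/`HistoryEvent`, used only to keep the example self-contained; the method returns an empty result when no decision was ever scheduled.

```java
import java.util.List;
import java.util.OptionalLong;

// Sketch only: Kind and Event are hypothetical simplifications, not the client's API.
final class FirstDecisionResetPoint {
    enum Kind {
        WORKFLOW_EXECUTION_STARTED,
        DECISION_TASK_SCHEDULED,
        DECISION_TASK_STARTED,
        DECISION_TASK_TIMED_OUT
    }

    static final class Event {
        final long id;
        final Kind kind;

        Event(long id, Kind kind) {
            this.id = id;
            this.kind = kind;
        }
    }

    private FirstDecisionResetPoint() {}

    // Returns the ID right after the first DecisionTaskScheduled event, or empty if there is none.
    static OptionalLong resetEventId(List<Event> history) {
        for (Event e : history) {
            if (e.kind == Kind.DECISION_TASK_SCHEDULED) {
                return OptionalLong.of(e.id + 1);
            }
        }
        return OptionalLong.empty();
    }
}
```

For a hypothetical history of Started(1), DecisionTaskScheduled(2), DecisionTaskStarted(3), DecisionTaskTimedOut(4), this yields 3, the event ID that the CLI reset accepted earlier in this thread.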

Thanks, Quanzheng

longquanzheng, Feb 11 '22 16:02

Hi @longquanzheng, I have this event history in a workflow run: [image]

When I reset the execution from event 3, the Java client returns this error:

Caused by: org.apache.thrift.TException: Rpc error:<ErrorResponse id=7 errorType=UnexpectedError message=cadence internal error, msg: CreateWorkflowExecution operation failed. Error: invalid UUID "">

Is this a server-side error? Via the CLI, this workflow resets successfully. How can such a workflow be reset using the Java client?

avitkovskaya-syberry, Mar 02 '22 13:03

What if you reset to event 2?

Thanks, Quanzheng

longquanzheng, Mar 02 '22 15:03

Hey @longquanzheng, resetting from event 2 fails from both the Java client and the CLI:

Error: reset failed Error Details: BadRequestError{Message: nDCStateRebuilder unable to rebuild mutable state to event ID: 1, version: -24, baseLastEventID + baseLastEventVersion is not the same as the last event of the last batch, event ID: 2, version :-24 ,typicaly because of attemptting to rebuild to a middle of a batch} ('export CADENCE_CLI_SHOW_STACKS=1' to see stack traces)

avitkovskaya-syberry, Mar 03 '22 15:03

@longquanzheng, hi! Could you please explain how to reset such executions?

avitkovskaya-syberry, Mar 10 '22 08:03