Workflow Reset: add logic of determining the reset point to the service

Open mfateev opened this issue 4 years ago • 3 comments

Is your feature request related to a problem? Please describe.

tctl workflow reset supports a reset_type argument, but the ResetWorkflowExecution gRPC API accepts only workflow_task_finish_event_id. As a result, all the logic for finding the reset point resides in tctl, so it cannot be reused when SDKs invoke the reset operation directly.

Describe the solution you'd like

Move the logic for finding the reset point to the service by adding a reset_type argument to ResetWorkflowExecutionRequest.
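
To make the proposal concrete, here is a minimal sketch of how a service-side helper might map a reset type to the event ID that workflow_task_finish_event_id expects. Everything in it is an assumption for illustration: `Event`, `ResetType`, `ResolveResetPoint`, and `ErrNoResetPoint` are hypothetical names, the history is a simplified stand-in for real history events, and only two reset types are modeled. The actual tctl logic may handle more cases.

```go
package main

import (
	"errors"
	"fmt"
)

// EventType and Event are simplified stand-ins for history events (hypothetical).
type EventType int

const (
	WorkflowExecutionStarted EventType = iota
	WorkflowTaskScheduled
	WorkflowTaskStarted
	WorkflowTaskCompleted
	ActivityTaskScheduled
)

type Event struct {
	ID   int64
	Type EventType
}

// ResetType is a hypothetical enum modeling two reset_type values.
type ResetType int

const (
	FirstWorkflowTask ResetType = iota
	LastWorkflowTask
)

// ErrNoResetPoint is returned when the history has no completed workflow task.
var ErrNoResetPoint = errors.New("no completed workflow task found")

// ResolveResetPoint returns the ID of the first or last WorkflowTaskCompleted
// event, depending on the requested reset type.
func ResolveResetPoint(history []Event, rt ResetType) (int64, error) {
	var found int64
	for _, ev := range history {
		if ev.Type != WorkflowTaskCompleted {
			continue
		}
		found = ev.ID
		if rt == FirstWorkflowTask {
			break // the first match is enough
		}
	}
	if found == 0 {
		return 0, ErrNoResetPoint
	}
	return found, nil
}

func main() {
	history := []Event{
		{1, WorkflowExecutionStarted},
		{2, WorkflowTaskScheduled},
		{3, WorkflowTaskStarted},
		{4, WorkflowTaskCompleted},
		{5, ActivityTaskScheduled},
		{6, WorkflowTaskScheduled},
		{7, WorkflowTaskStarted},
		{8, WorkflowTaskCompleted},
	}
	first, _ := ResolveResetPoint(history, FirstWorkflowTask)
	last, _ := ResolveResetPoint(history, LastWorkflowTask)
	fmt.Println(first, last) // prints: 4 8
}
```

With something like this on the service side, an SDK caller would only pass the reset type and never inspect history itself.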

mfateev, Mar 01 '21 16:03

Hey @mfateev, would you mind if I work on this issue? I have been using temporal.io for quite a long time and want to contribute something :) Thanks.

bisakhmondal, Jul 29 '21 07:07

Hello,

When a workflow has parallel branches, it can be difficult to find the "reset point" event corresponding to the branch one wishes to restart. In the Java SDK, we haven't found any correlation between the "workflow task started" event and the activity events that follow.

If the logic of determining the reset point were moved to the service, ideally one could just invoke an API to e.g. "reset a workflow to the point preceding the first failure".
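
As a rough sketch of what such a "reset before the first failure" lookup could do, the code below follows back-references from the failed activity to the workflow task that scheduled it. It assumes that an activity failure event records the ID of its ActivityTaskScheduled event and that the scheduled event records the ID of the WorkflowTaskCompleted event that issued it. Those links should be verified against the real history event attributes; the `Event` struct and `ResetPointBeforeFirstFailure` here are hypothetical stand-ins.

```go
package main

import "fmt"

type EventType int

const (
	WorkflowTaskCompleted EventType = iota
	ActivityTaskScheduled
	ActivityTaskCompleted
	ActivityTaskFailed
)

// Event is a simplified, hypothetical stand-in for a history event.
type Event struct {
	ID   int64
	Type EventType
	// For activity close events: ID of the matching ActivityTaskScheduled event.
	ScheduledEventID int64
	// For ActivityTaskScheduled: ID of the WorkflowTaskCompleted event that issued it.
	WorkflowTaskCompletedEventID int64
}

// ResetPointBeforeFirstFailure returns the ID of the WorkflowTaskCompleted
// event that scheduled the first activity to fail, and false if no activity
// failed or the link could not be resolved.
func ResetPointBeforeFirstFailure(history []Event) (int64, bool) {
	scheduledBy := make(map[int64]int64) // scheduled event ID -> workflow task completed event ID
	for _, ev := range history {
		switch ev.Type {
		case ActivityTaskScheduled:
			scheduledBy[ev.ID] = ev.WorkflowTaskCompletedEventID
		case ActivityTaskFailed:
			id, ok := scheduledBy[ev.ScheduledEventID]
			return id, ok
		}
	}
	return 0, false
}

func main() {
	history := []Event{
		{ID: 4, Type: WorkflowTaskCompleted},
		{ID: 5, Type: ActivityTaskScheduled, WorkflowTaskCompletedEventID: 4}, // branch A
		{ID: 6, Type: ActivityTaskScheduled, WorkflowTaskCompletedEventID: 4}, // branch B
		{ID: 7, Type: ActivityTaskCompleted, ScheduledEventID: 5},
		{ID: 10, Type: WorkflowTaskCompleted},
		{ID: 11, Type: ActivityTaskScheduled, WorkflowTaskCompletedEventID: 10},
		{ID: 12, Type: ActivityTaskFailed, ScheduledEventID: 6}, // branch B fails
	}
	id, ok := ResetPointBeforeFirstFailure(history)
	fmt.Println(id, ok) // prints: 4 true
}
```

Note that the result is event 4, not the most recent completed workflow task (event 10), which is exactly the case where picking "the last workflow task" would restart the wrong branch.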

Thanks

sstro, Mar 24 '23 13:03

Auto‐select reset point when not provided

Problem

By default the Reset API requires clients to supply a WorkflowTaskFinishEventId, forcing callers to inspect history and pick an internal event ID themselves. This makes resets brittle and user‐unfriendly.

Solution

If the client omits WorkflowTaskFinishEventId (i.e. it’s zero), the service will:

  1. Read the workflow’s history branch up to (NextEventID – 1).
  2. Scan the returned HistoryEvents for the last WORKFLOW_TASK_COMPLETED event.
  3. Set WorkflowTaskFinishEventId to that event’s ID, so the reset rolls back to just before that workflow task completed.
  4. Proceed with the existing validation and reset logic.

Key Code Snippet

// in service/history/api/resetworkflow/api.go → Invoke(...)
baseMutableState := baseLease.GetMutableState()

// 1) Auto-select finish ID if caller omitted it
if req.GetWorkflowTaskFinishEventId() == 0 {
  branchToken, err := baseMutableState.GetCurrentBranchToken()
  if err != nil {
    return nil, err
  }

  // 2) page through history, tracking the last WorkflowTaskCompleted
  var lastComplete int64
  var pageToken []byte
  for {
    resp, err := shardCtx.GetExecutionManager().ReadHistoryBranch(ctx, &persistence.ReadHistoryBranchRequest{
      ShardID:       shardCtx.GetShardID(),
      BranchToken:   branchToken,
      MinEventID:    common.FirstEventID,
      MaxEventID:    baseMutableState.GetNextEventID(), // exclusive, so the last event is included
      PageSize:      defaultPageSize,
      NextPageToken: pageToken,
    })
    if err != nil {
      return nil, serviceerror.NewInternal("fetching history for reset: " + err.Error())
    }
    for _, ev := range resp.HistoryEvents {
      if ev.GetEventType() == enumspb.EVENT_TYPE_WORKFLOW_TASK_COMPLETED {
        lastComplete = ev.GetEventId()
      }
    }
    // a single page may not cover the whole history
    if len(resp.NextPageToken) == 0 {
      break
    }
    pageToken = resp.NextPageToken
  }
  if lastComplete <= common.FirstEventID {
    return nil, serviceerror.NewInvalidArgument("no completed workflow task found to reset to")
  }

  // 3) use that as the finish-event ID
  req.WorkflowTaskFinishEventId = lastComplete
}

// 4) existing validation now passes, then core reset logic runs...
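
The selection step could also be factored into a helper that can be unit-tested without a shard context. The sketch below is one possible shape, not the actual server code: `Event`, `PageReader`, and `LastCompletedWorkflowTask` are hypothetical names, and the page reader stands in for a paginated history read that returns an empty token on the last page.

```go
package main

import (
	"errors"
	"fmt"
)

// Event is a simplified, hypothetical stand-in for a history event.
type Event struct {
	ID                    int64
	WorkflowTaskCompleted bool
}

// PageReader returns one page of events plus the token for the next page.
// An empty next token means there are no more pages.
type PageReader func(token []byte) (events []Event, next []byte, err error)

// LastCompletedWorkflowTask scans every page and returns the ID of the last
// completed workflow task, or an error if there is none.
func LastCompletedWorkflowTask(read PageReader) (int64, error) {
	var last int64
	var token []byte
	for {
		events, next, err := read(token)
		if err != nil {
			return 0, fmt.Errorf("fetching history for reset: %w", err)
		}
		for _, ev := range events {
			if ev.WorkflowTaskCompleted {
				last = ev.ID
			}
		}
		if len(next) == 0 {
			break
		}
		token = next
	}
	if last == 0 {
		return 0, errors.New("no completed workflow task found to reset to")
	}
	return last, nil
}

func main() {
	// Two in-memory pages; the token is just the index of the next page.
	pages := [][]Event{
		{{ID: 1}, {ID: 2}, {ID: 3}, {ID: 4, WorkflowTaskCompleted: true}},
		{{ID: 5}, {ID: 6}, {ID: 7}, {ID: 8, WorkflowTaskCompleted: true}},
	}
	read := func(token []byte) ([]Event, []byte, error) {
		i := 0
		if len(token) > 0 {
			i = int(token[0])
		}
		var next []byte
		if i+1 < len(pages) {
			next = []byte{byte(i + 1)}
		}
		return pages[i], next, nil
	}
	id, err := LastCompletedWorkflowTask(read)
	fmt.Println(id, err) // prints: 8 <nil>
}
```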

Shridhar2104, May 23 '25 07:05