serverless-step-functions

State Machine Already Exists error when redeploying

Open jgcoding opened this issue 4 years ago • 4 comments

This is a Bug Report

Re-deployment fails with "State Machine Already Exists" error. I have deployed and redeployed several state machines many times. This is the first time seeing this error.

For bug reports:

  • What went wrong? - I attempt to deploy an existing application containing an existing State Machine definition and the deployment fails with the error "State Machine Already Exists"
  • What did you expect should have happened? - I expected the deployment to succeed or at least just update the State Machine
  • What was the config you used? serverless.yml
  • What stacktrace or error message from your provider did you see? - State Machine Already Exists: 'arn:aws:states:us-east-1:xxxx:stateMachine:expressStateMachine-dev' (Service: AWSStepFunctions; Status Code: 400; Error Code: StateMachineAlreadyExists; Request ID: xxx-xxxx-xxxx-xxxx-xxxxxxx; Proxy: null)

Additional Data

  • Serverless Framework Core Version you're using: - serverless-step-functions
  • The Plugin Version you're using: - 2.27.1
  • Operating System: MacOS Big Sur 11.0.1
  • Stack Trace:
  • Provider Error messages: State Machine Already Exists: 'arn:aws:states:us-east-1:xxxx:stateMachine:expressStateMachine-dev' (Service: AWSStepFunctions; Status Code: 400; Error Code: StateMachineAlreadyExists; Request ID: xxx-xxxx-xxxx-xxxx-xxxxxxx; Proxy: null)

jgcoding, Jan 07 '21 14:01

Actually, that did not work. It was a head-fake. Just had to update the application again and I am getting this error again.

jgcoding, Jan 07 '21 14:01

@jgcoding what does your serverless.yml look like?

And am I understanding you correctly that you defined the state machine in the serverless.yml, deployed it, and then when you try to deploy the project again it throws this error?

theburningmonk, Jan 16 '21 09:01

@theburningmonk - that is correct. Deployed initially and then numerous times successfully. I manage 2 other state machines with your plugin with no issues.

Here is the SM template:


stepFunctions:
  stateMachines:
    expressSample:
      type: EXPRESS
      name: expressSample-${self:provider.stage}
      role: arn:aws:iam::${self:custom.account}:role/step_functions_role
      definition:
        Comment: Batch Processing Service
        StartAt: SetType
        States:
          SetType:
            Type: Task
            Resource: arn:aws:lambda:${self:provider.region}:${self:custom.account}:function:${self:service}-${self:provider.stage}-setType
            TimeoutSeconds: 60
            ResultPath: $.results
            OutputPath: $.results
            Next: CheckType
            Retry:
              - ErrorEquals:
                  - States.TaskFailed
                  - States.Timeout
                IntervalSeconds: 5
                MaxAttempts: 2
                BackoffRate: 2
              - ErrorEquals:
                  - States.ALL
                IntervalSeconds: 2
                MaxAttempts: 2
                BackoffRate: 2
            Catch:
              - ErrorEquals:
                  - States.ALL
                Next: BatchError
          CheckType:
            Type: Choice
            Choices:
              - Variable: $.sample_input.type
                StringEquals: name-check
                Next: NameChecker
              - Variable: $.sample_input.type
                StringEquals: address-check
                Next: AddressChecker
            Default: BatchError
          NameChecker:
            Type: Task
            Resource: arn:aws:lambda:${self:provider.region}:${self:custom.account}:function:${self:service}-${self:provider.stage}-nameChecker
            TimeoutSeconds: 60
            ResultPath: $.results
            OutputPath: $.results
            Next: Finish
            Retry:
              - ErrorEquals:
                  - States.TaskFailed
                  - States.Timeout
                IntervalSeconds: 5
                MaxAttempts: 2
                BackoffRate: 2
              - ErrorEquals:
                  - States.ALL
                IntervalSeconds: 2
                MaxAttempts: 2
                BackoffRate: 2
            Catch:
              - ErrorEquals:
                  - States.ALL
                Next: BatchError
          AddressChecker:
            Type: Task
            Resource: arn:aws:lambda:${self:provider.region}:${self:custom.account}:function:${self:service}-${self:provider.stage}-addressChecker
            TimeoutSeconds: 60
            ResultPath: $.results
            OutputPath: $.results
            Next: Finish
            Retry:
              - ErrorEquals:
                  - States.TaskFailed
                  - States.Timeout
                IntervalSeconds: 5
                MaxAttempts: 2
                BackoffRate: 2
              - ErrorEquals:
                  - States.ALL
                IntervalSeconds: 2
                MaxAttempts: 2
                BackoffRate: 2
            Catch:
              - ErrorEquals:
                  - States.ALL
                Next: BatchError
          Finish:
            Type: Succeed
          BatchError:
            Type: Fail
            Error: GenericError
            Cause: An error occurred while executing the state machine

jgcoding, Jan 21 '21 01:01

I am seeing the same error. Below is the Step Functions config from my serverless.yml. The change that triggered it was adding the id field. Earlier changes, such as adjusting the Wait time, deployed without problems.

stepFunctions:
  stateMachines:
    RoomDeletionStateMachine:
      id: RoomDeletionStateMachine
      name: ${self:service}-${self:provider.stage}-RoomDeletion
      definition:
        Comment: 'Room timeout before deletion'
        StartAt: Wait
        States:
          Wait:
            Type: Wait
            Seconds: 60
            Next: RoomDeletionExecute
          RoomDeletionExecute:
            Type: Task
            Resource:
              Fn::GetAtt: [RoomDeletionExecute, Arn]
            End: true

EDIT: I resolved this by changing the name field. The issue was that adding id changed the resource key (logical ID) in the CFN template, so instead of updating the existing state machine, CFN wanted to create a new one. Since the actual state machine name was unchanged, the creation failed. Changing the name fixed it. In theory, one could change it back to the original in a second deploy. Leaving this here in case it is relevant to someone else.
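For anyone hitting the same thing, the workaround described above might look roughly like the following. This is only a sketch based on the config from this comment: the `-v2` suffix is a hypothetical placeholder (any name that differs from the already-deployed one should do), and nothing else in the definition is changed.

```yaml
stepFunctions:
  stateMachines:
    RoomDeletionStateMachine:
      # Adding `id` is what changed the resource key in the generated template.
      id: RoomDeletionStateMachine
      # Hypothetical temporary name: it only needs to differ from the name of
      # the state machine that is already deployed, so the new resource can be
      # created without a name collision.
      name: ${self:service}-${self:provider.stage}-RoomDeletion-v2
      definition:
        Comment: 'Room timeout before deletion'
        StartAt: Wait
        States:
          Wait:
            Type: Wait
            Seconds: 60
            Next: RoomDeletionExecute
          RoomDeletionExecute:
            Type: Task
            Resource:
              Fn::GetAtt: [RoomDeletionExecute, Arn]
            End: true
```

Once that deploy succeeds and the old state machine is gone, a second deploy with the original name restored should, in theory, go through as well. Keep in mind that the state machine ARN changes along with the name, so anything that references the ARN directly would need to be updated.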

Also, just as an aside, it was surprising to me that the logical ID was not simply the key used in the definition. You can see the duplication in my example, where the key for the state machine and the id are the same. I think it would be a lot more intuitive to pull it from the key instead of having a separate field, since that's how it works for most things in CFN templates.

robpc, Feb 15 '21 04:02