serverless-step-functions
State Machine Already Exists error when redeploying
This is a Bug Report
Redeployment fails with a "State Machine Already Exists" error. I have deployed and redeployed several state machines many times, and this is the first time I have seen this error.
For bug reports:
- What went wrong? - I attempted to redeploy an existing application containing an existing State Machine definition, and the deployment failed with the error "State Machine Already Exists".
- What did you expect should have happened? - I expected the deployment to succeed and simply update the existing State Machine.
- What was the config you used? serverless.yml
- What stacktrace or error message from your provider did you see? - State Machine Already Exists: 'arn:aws:states:us-east-1:xxxx:stateMachine:expressStateMachine-dev' (Service: AWSStepFunctions; Status Code: 400; Error Code: StateMachineAlreadyExists; Request ID: xxx-xxxx-xxxx-xxxx-xxxxxxx; Proxy: null)
Additional Data
- Serverless Framework Core Version you're using: - serverless-step-functions
- The Plugin Version you're using: - 2.27.1
- Operating System: MacOS Big Sur 11.0.1
- Stack Trace:
- Provider Error messages: State Machine Already Exists: 'arn:aws:states:us-east-1:xxxx:stateMachine:expressStateMachine-dev' (Service: AWSStepFunctions; Status Code: 400; Error Code: StateMachineAlreadyExists; Request ID: xxx-xxxx-xxxx-xxxx-xxxxxxx; Proxy: null)
Actually, that did not work; it was a head-fake. I just had to update the application again, and I am getting this error again.
@jgcoding, what does your serverless.yml look like?
And am I understanding you correctly that you defined the state machine in serverless.yml, deployed it, and then got this error when you tried to deploy the project again?
@theburningmonk - that is correct. I deployed it initially and then redeployed it successfully numerous times. I manage two other state machines with your plugin with no issues.
Here is the state machine template:
stepFunctions:
  stateMachines:
    expressSample:
      type: EXPRESS
      name: expressSample-${self:provider.stage}
      role: arn:aws:iam::${self:custom.account}:role/step_functions_role
      definition:
        Comment: Batch Processing Service
        StartAt: SetType
        States:
          SetType:
            Type: Task
            Resource: arn:aws:lambda:${self:provider.region}:${self:custom.account}:function:${self:service}-${self:provider.stage}-setType
            TimeoutSeconds: 60
            ResultPath: $.results
            OutputPath: $.results
            Next: CheckType
            Retry:
              - ErrorEquals:
                  - States.TaskFailed
                  - States.Timeout
                IntervalSeconds: 5
                MaxAttempts: 2
                BackoffRate: 2
              - ErrorEquals:
                  - States.ALL
                IntervalSeconds: 2
                MaxAttempts: 2
                BackoffRate: 2
            Catch:
              - ErrorEquals:
                  - States.ALL
                Next: BatchError
          CheckType:
            Type: Choice
            Choices:
              - Variable: $.sample_input.type
                StringEquals: name-check
                Next: NameChecker
              - Variable: $.sample_input.type
                StringEquals: address-check
                Next: AddressChecker
            Default: BatchError
          NameChecker:
            Type: Task
            Resource: arn:aws:lambda:${self:provider.region}:${self:custom.account}:function:${self:service}-${self:provider.stage}-nameChecker
            TimeoutSeconds: 60
            ResultPath: $.results
            OutputPath: $.results
            Next: Finish
            Retry:
              - ErrorEquals:
                  - States.TaskFailed
                  - States.Timeout
                IntervalSeconds: 5
                MaxAttempts: 2
                BackoffRate: 2
              - ErrorEquals:
                  - States.ALL
                IntervalSeconds: 2
                MaxAttempts: 2
                BackoffRate: 2
            Catch:
              - ErrorEquals:
                  - States.ALL
                Next: BatchError
          AddressChecker:
            Type: Task
            Resource: arn:aws:lambda:${self:provider.region}:${self:custom.account}:function:${self:service}-${self:provider.stage}-addressChecker
            TimeoutSeconds: 60
            ResultPath: $.results
            OutputPath: $.results
            Next: Finish
            Retry:
              - ErrorEquals:
                  - States.TaskFailed
                  - States.Timeout
                IntervalSeconds: 5
                MaxAttempts: 2
                BackoffRate: 2
              - ErrorEquals:
                  - States.ALL
                IntervalSeconds: 2
                MaxAttempts: 2
                BackoffRate: 2
            Catch:
              - ErrorEquals:
                  - States.ALL
                Next: BatchError
          Finish:
            Type: Succeed
          BatchError:
            Type: Fail
            Error: GenericError
            Cause: An error occurred while executing the state machine
I am seeing the same error. Below is the Step Functions config from my serverless.yml. The change that triggered it was adding the id field. Earlier changes that deployed successfully were things like adjusting the Wait time.
stepFunctions:
  stateMachines:
    RoomDeletionStateMachine:
      id: RoomDeletionStateMachine
      name: ${self:service}-${self:provider.stage}-RoomDeletion
      definition:
        Comment: 'Room timeout before deletion'
        StartAt: Wait
        States:
          Wait:
            Type: Wait
            Seconds: 60
            Next: RoomDeletionExecute
          RoomDeletionExecute:
            Type: Task
            Resource:
              Fn::GetAtt: [RoomDeletionExecute, Arn]
            End: true
EDIT: I resolved this by changing the name field. The issue was that adding id changed the resource key (the logical ID) in the CFN template, so instead of updating the existing state machine, CFN tried to create a new one. Because the actual state machine name was unchanged, that creation failed. Changing the name fixed it. In theory, one could change the name back to the original in a second deploy. Leaving this here in case it is relevant to someone else.
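A sketch of the two-deploy workaround from the EDIT, applied to the RoomDeletionStateMachine config in this comment. The -tmp suffix is a hypothetical placeholder for any name that differs from the currently deployed one. The sketch assumes CloudFormation creates the state machine under the new logical ID first and deletes the one under the old logical ID during stack cleanup, which is what frees the original name for the second deploy.

```yaml
# Deploy 1 (hypothetical): keep the new id, but use a temporary name so the
# state machine created under the new logical ID does not collide with the
# existing one that still holds the original name.
stepFunctions:
  stateMachines:
    RoomDeletionStateMachine:
      id: RoomDeletionStateMachine
      name: ${self:service}-${self:provider.stage}-RoomDeletion-tmp  # temporary placeholder name
      definition:
        Comment: 'Room timeout before deletion'
        StartAt: Wait
        States:
          Wait:
            Type: Wait
            Seconds: 60
            Next: RoomDeletionExecute
          RoomDeletionExecute:
            Type: Task
            Resource:
              Fn::GetAtt: [RoomDeletionExecute, Arn]
            End: true

# Deploy 2 (optional): after deploy 1 completes and the old state machine
# has been removed, restore the original name and deploy again:
#   name: ${self:service}-${self:provider.stage}-RoomDeletion
```

Each name change replaces the state machine, so its ARN (which contains the name) changes between deploys. Anything that references the state machine by a hard-coded ARN would need to be updated.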
Also, as an aside, it surprised me that the logical ID is not simply the key used in the definition. You can see the duplication in my example, where the state machine's key and its id are the same. I think it would be much more intuitive to derive the logical ID from the key instead of requiring a separate field, since that is how it works for most resources in CFN templates.