
Changing a step function definition changes the IAM policies and makes running executions fail.

Open benja-M-1 opened this issue 4 years ago • 2 comments

This is a question

Some Context

On my project we had to change the workflow in a pretty huge step function. At first it looked something like this:

  • Task A
  • Task B
  • Parallel tasks
    • Task C
      • Task D
      • Task E
    • Task F
      • Task G
  • Task H
  • Task I

Then we moved to:

  • Task A
  • Task B
  • Task C' -> calls a new Step Function containing Task C, Task D and Task E
  • Task F' -> calls a new Step Function containing Task F and Task G
  • Task H
  • Task I

What went wrong?

In the first step function definition, the plugin generates the IAM policies that allow the step function to invoke a Lambda function, push a message to SQS, and so on.

The problem is that when we transformed the step function into the second version, those resources moved into other step functions. The plugin therefore generated a new set of policies for the new step functions and no longer generated, on the original step function's role, the policies for the invocations that had moved away.

Then, when we deployed to production, the executions that were already running (still based on the first definition) started to fail one by one with the following type of error:

User: arn:aws:sts::364593438022:assumed-role/service-MyStepFunctionRole/JikZyqUWAaDsnoaqSNVNFtLIImCpcPga is not authorized to perform: lambda:InvokeFunction on resource: arn:aws:lambda:eu-west-3:123456789123:function:service-myFunction

What to do?

We followed @theburningmonk's guide on Blue/Green deployment, which is great and allowed us to upgrade our functions easily without breaking things.

I would love for the same thing to exist for policies.

The fix in our situation was to stop the running executions and restart them all. Fortunately, only a few hundred executions were affected. Unfortunately, the step function sends emails to end users, so they will receive an email they already received a few days ago.

But I started looking for a better solution. I was thinking about creating a role for the step function myself and adding all the needed policies, such as lambda:InvokeFunction for the 37 functions and all of their versions. The downside is that we would have to manually add to the policy every new resource the step function uses.

I was wondering how you would manage this situation if you were me?

Additional Data

  • Serverless Framework Core Version you're using: 1.51.0
  • The Plugin Version you're using: ^2.17.4

benja-M-1 avatar May 05 '20 12:05 benja-M-1

If I were in your position, I would do exactly as you suggested and manually define the role. Right now, because this plugin generates policies for least privilege, it will always cause the problem you experienced. Unfortunately, it's a symptom of something done well. Maybe in the future there could be a flag for steps that call step function resources to inherit the policy items from the called state machines, but that would potentially require a subsequent deployment if you wanted to turn it off.

I'm interested to hear if anyone has another idea.

JeremyDOwens avatar May 05 '20 23:05 JeremyDOwens

@benja-M-1 oh, that's a good point. In this case, you're changing the state machine itself such that it needs different policies.

I think in this case you should define your own IAM role; that seems to be the most sensible solution. To save you some typing, you can take advantage of the naming convention SLS gives you and grant Invoke permission to everything that starts with the right prefix, e.g. ${self:service}-${self:provider.stage}-*, and to all versions, ${self:service}-${self:provider.stage}-*/*
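A minimal sketch of that approach in serverless.yml, under some assumptions: `myStateMachine` and `StateMachineRole` are hypothetical names, and the state machine's `role` property is pointed at a role declared under `resources`. Qualified Lambda ARNs (versions and aliases) append a `:qualifier` suffix to the function ARN, so the sketch writes the "all versions" pattern with `:*`; adjust it if your setup differs.

```yaml
# serverless.yml (sketch; names below are hypothetical)
stepFunctions:
  stateMachines:
    myStateMachine:
      # Use the hand-written role instead of the plugin-generated one,
      # so its permissions no longer change with the definition.
      role:
        Fn::GetAtt: [StateMachineRole, Arn]
      # definition: unchanged, omitted here

resources:
  Resources:
    StateMachineRole:
      Type: AWS::IAM::Role
      Properties:
        AssumeRolePolicyDocument:
          Version: "2012-10-17"
          Statement:
            - Effect: Allow
              Principal:
                Service: states.amazonaws.com
              Action: sts:AssumeRole
        Policies:
          - PolicyName: invoke-service-functions
            PolicyDocument:
              Version: "2012-10-17"
              Statement:
                - Effect: Allow
                  Action: lambda:InvokeFunction
                  Resource:
                    # Every function created by this service and stage
                    - Fn::Join:
                        - ":"
                        - - arn:aws:lambda
                          - Ref: AWS::Region
                          - Ref: AWS::AccountId
                          - function
                          - ${self:service}-${self:provider.stage}-*
                    # The same functions, qualified by version or alias
                    - Fn::Join:
                        - ":"
                        - - arn:aws:lambda
                          - Ref: AWS::Region
                          - Ref: AWS::AccountId
                          - function
                          - ${self:service}-${self:provider.stage}-*
                          - "*"
```

This trades least privilege for stability: the role covers every function in the service, so in-flight executions keep their permissions when tasks move between state machines. Any other integrations the state machine uses (for example SQS, or starting the nested state machines) would need their own statements in the same role.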

Unfortunately, there's no way to version the IAM role AFAIK. Each execution still uses the same IAM role that is configured on the state machine, and since we don't know what functions you had before, we can't simultaneously apply least privilege and ensure that running executions are not impacted.

theburningmonk avatar Jun 02 '20 20:06 theburningmonk