media-insights-on-aws icon indicating copy to clipboard operation
media-insights-on-aws copied to clipboard

Reduce privileges for InvokeFunction in the StepFunctionRole

Open jerryelk opened this issue 4 years ago • 6 comments

Develop an alternative to tag based auth for MIE.

jerryelk avatar Dec 04 '20 19:12 jerryelk

To summarize this issue, we need to change StepFunctionRole in media-insights-stack.yaml so it does not use Resource: * to scope lambda:InvokeFunction. i.e. This is bad:

  Statement:
    - Action: "lambda:InvokeFunction"
      Resource: "*"
      Effect: "Allow"

Ideally, we could put a tag-based condition, like this:

  Statement:
    - Action: "lambda:InvokeFunction"
      Resource: "*"
      Effect: "Allow"
      Condition:
        StringEquals:
          "aws:ResourceTag/environment": "mie"

but IAM does not support tag based resource policies for Lambda, yet.

So, here are out options:

Option 1: explicitly specify Lambda ARNs in order to scope the InvokeFunction, like this:

  Statement:
    - Action: "lambda:InvokeFunction"
      Resource:
        - !GetAtt startGenericDataLookup.Arn
        - !GetAtt startKeyPhrases.Arn
        - !GetAtt getKeyPhrases.Arn
        - !GetAtt startEntityDetection.Arn
        - !GetAtt getEntityDetection.Arn
        ...and so on for each MIE and user-defined operator

This option requires an additional update-stack step after end-users deploy their stack, so they can append all their lambda arns for all their new operators to MIE's StepFunctionRole. The MIE implementation guide will need to advise users that they need to run an update-stack command.

Option 2: use a prefix in lambda names so we can scope InvokeFunction like this:

  Statement:
    - Action: "lambda:InvokeFunction"
      Resource:
        - !GetAtt startGenericDataLookup.Arn
        - !GetAtt startKeyPhrases.Arn
        - !GetAtt getKeyPhrases.Arn
        - !GetAtt startEntityDetection.Arn
        - !GetAtt getEntityDetection.Arn
        ... and so on, for each operator in the operator library
        - "mie-*"

The MIE implementation guide will need to advise users that they need to prefix their operator lambda names with "mie-".

Option 1 is more secure. Option 2 is more user friendly.

ianwow avatar Dec 07 '20 20:12 ianwow

Option 3:

Operators can be created via a POST to WORKFLOW_API_ENDPOINT/workflow/operation. Those calls include parameters for StartLambdaArn and MonitorLambdaArn, and the StepFunctionRole is provided in os.environ, so we have everything we need to dynamically add execution permission for those Lambdas to the StepFunctionRole. We could use boto3 in workflowapi/app.py:create_operation() to create a new IAM policy like this:

# define a new policy permitting execution for the new operator's lambda functions
  policy = {
      "Version": "2012-10-17",
      "Statement": [
          {
              "Effect": "Allow",
              "Action": "lambda:InvokeFunction",
              "Resource": [
                  operation["StartLambdaArn"], 
                  operation["MonitorLambdaArn"]
              ]
          }
      ]
  }
# attach that policy to the StepFunctionRole
  response = iam.create_policy(
   PolicyName=operation["Name"],
   PolicyDocument=json.dumps(policy)
  )
  policy_arn = response['Policy'][0]['Arn']
  iam.attach_role_policy(PolicyArn=policy_arn,RoleName=STAGE_EXECUTION_ROLE)

If the operator is removed via DELETE /workflow/operation/{Name}, then we'll need to call delete_role_policy from workflowapi/app.py:delete_operation()

This option would probably provide a better security model than relying on resource tag or name patterns.

ianwow avatar Dec 15 '20 19:12 ianwow

The only problem with Option 3 is an IAM quota that limits us from adding more than 10 policies to a role. We have no limit on the number of operators people may create. So, in order to implement Option 3, we'll need to reuse a policy. To support concurrent operator create requests, we'll need a locking mechanism for said policy. That makes Option 3 much more challenging.

ianwow avatar Dec 16 '20 19:12 ianwow

Option 3 is not going to work because quotas. Option 2 is ineffective because it contains a wildcard, and therefore no better than what StepFunctionRole already uses. Option 1 is insufficient because update-stack is not transactional, so two different users running update-stack at the same time could interfere with each other. The expectation that MIE users would modify the MIE stack also introduces undesirable coupling between the MIE framework and MIE users/apps.

Until IAM supports Lambda tags, I think the best option is to stick with the status quo, using wildcards in the StepFunctionRole. I am therefore closing this issue.

ianwow avatar Dec 29 '20 18:12 ianwow

We should use inline policy, not managed policy for option #3. Thanks Brandon for bringing that to my attention. The inline policy approach will work!

ianwow avatar Jan 04 '21 23:01 ianwow

Turns out, inline policies won't work either because we will too easily exceed the maximum length for inline policies.

The aggregate inline policy size must not exceed 10,240 bytes. If we create an inline policy for every operator (as I have done in the workaround_tag_based_auth branch), then the StepFunctionRole's inline policy is 19,354 bytes out-of-the-box, leaving no room for users to add more operators.

You can measure the inline policy size like this: aws iam list-role-policies --role-name mie03b-StepFunctionRole-1YHSL5DOT8GF | jq '.PolicyNames | .[]' | tr -d '"' | while read line; do aws iam get-role-policy --role-name mie03b-StepFunctionRole-1YHSL5DOT8GF --policy-name $line | wc -c

To reproduce this error, deploy the workaround_tag_based_auth branch, then run the workflowapi integration test. It will fail with this error:

Maximum policy size of 10240 bytes exceeded for role mie03b-StepFunctionRole-1YHSL5DOT8GF

ianwow avatar Jan 05 '21 06:01 ianwow