media-insights-on-aws
media-insights-on-aws copied to clipboard
Reduce privileges for InvokeFunction in the StepFunctionRole
Develop an alternative to tag based auth for MIE.
To summarize this issue, we need to change StepFunctionRole in media-insights-stack.yaml so it does not use Resource: * to scope lambda:InvokeFunction. i.e. This is bad:
Statement:
- Action: "lambda:InvokeFunction"
Resource: "*"
Effect: "Allow"
Ideally, we could put a tag-based condition, like this:
Statement:
- Action: "lambda:InvokeFunction"
Resource: "*"
Effect: "Allow"
Condition:
StringEquals:
"aws:ResourceTag/environment": "mie"
but IAM does not support tag based resource policies for Lambda, yet.
So, here are out options:
Option 1: explicitly specify Lambda ARNs in order to scope the InvokeFunction, like this:
Statement:
- Action: "lambda:InvokeFunction"
Resource:
- !GetAtt startGenericDataLookup.Arn
- !GetAtt startKeyPhrases.Arn
- !GetAtt getKeyPhrases.Arn
- !GetAtt startEntityDetection.Arn
- !GetAtt getEntityDetection.Arn
...and so on for each MIE and user-defined operator
This option requires an additional update-stack step after end-users deploy their stack, so they can append all their lambda arns for all their new operators to MIE's StepFunctionRole. The MIE implementation guide will need to advise users that they need to run an update-stack
command.
Option 2: use a prefix in lambda names so we can scope InvokeFunction like this:
Statement:
- Action: "lambda:InvokeFunction"
Resource:
- !GetAtt startGenericDataLookup.Arn
- !GetAtt startKeyPhrases.Arn
- !GetAtt getKeyPhrases.Arn
- !GetAtt startEntityDetection.Arn
- !GetAtt getEntityDetection.Arn
... and so on, for each operator in the operator library
- "mie-*"
The MIE implementation guide will need to advise users that they need to prefix their operator lambda names with "mie-".
Option 1 is more secure. Option 2 is more user friendly.
Option 3:
Operators can be created via a POST to WORKFLOW_API_ENDPOINT/workflow/operation. Those calls include parameters for StartLambdaArn and MonitorLambdaArn, and the StepFunctionRole is provided in os.environ, so we have everything we need to dynamically add execution permission for those Lambdas to the StepFunctionRole. We could use boto3 in workflowapi/app.py:create_operation()
to create a new IAM policy like this:
# define a new policy permitting execution for the new operator's lambda functions
policy = {
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": "lambda:InvokeFunction",
"Resource": [
operation["StartLambdaArn"],
operation["MonitorLambdaArn"]
]
}
]
}
# attach that policy to the StepFunctionRole
response = iam.create_policy(
PolicyName=operation["Name"],
PolicyDocument=json.dumps(policy)
)
policy_arn = response['Policy'][0]['Arn']
iam.attach_role_policy(PolicyArn=policy_arn,RoleName=STAGE_EXECUTION_ROLE)
If the operator is removed via DELETE /workflow/operation/{Name}, then we'll need to call delete_role_policy from workflowapi/app.py:delete_operation()
This option would probably provide a better security model than relying on resource tag or name patterns.
The only problem with Option 3 is an IAM quota that limits us from adding more than 10 policies to a role. We have no limit on the number of operators people may create. So, in order to implement Option 3, we'll need to reuse a policy. To support concurrent operator create requests, we'll need a locking mechanism for said policy. That makes Option 3 much more challenging.
Option 3 is not going to work because quotas. Option 2 is ineffective because it contains a wildcard, and therefore no better than what StepFunctionRole already uses. Option 1 is insufficient because update-stack is not transactional, so two different users running update-stack at the same time could interfere with each other. The expectation that MIE users would modify the MIE stack also introduces undesirable coupling between the MIE framework and MIE users/apps.
Until IAM supports Lambda tags, I think the best option is to stick with the status quo, using wildcards in the StepFunctionRole. I am therefore closing this issue.
We should use inline policy, not managed policy for option #3. Thanks Brandon for bringing that to my attention. The inline policy approach will work!
Turns out, inline policies won't work either because we will too easily exceed the maximum length for inline policies.
The aggregate inline policy size must not exceed 10,240 bytes. If we create an inline policy for every operator (as I have done in the workaround_tag_based_auth branch), then the StepFunctionRole's inline policy is 19,354 bytes out-of-the-box, leaving no room for users to add more operators.
You can measure the inline policy size like this:
aws iam list-role-policies --role-name mie03b-StepFunctionRole-1YHSL5DOT8GF | jq '.PolicyNames | .[]' | tr -d '"' | while read line; do aws iam get-role-policy --role-name mie03b-StepFunctionRole-1YHSL5DOT8GF --policy-name $line | wc -c
To reproduce this error, deploy the workaround_tag_based_auth
branch, then run the workflowapi integration test. It will fail with this error:
Maximum policy size of 10240 bytes exceeded for role mie03b-StepFunctionRole-1YHSL5DOT8GF