Sagemaker ACK Fails to update endpoint
Describe the bug
Related to this issue in the CDK: https://github.com/aws/aws-cdk/issues/11594, it appears that updating an existing endpoint with a new EndpointConfig may require contradictory IAM permissions. Updating the endpointConfigName field in an existing endpoint yields this error for me:
- message: "AccessDeniedException: User: arn:aws:sts::<acct omitted>:assumed-role/sagemaker-provisioner/kiam-kiam
is not authorized to perform: sagemaker:UpdateEndpoint on resource: arn:aws:sagemaker:us-east-1:<account omitted>:endpoint-config/endpoint-config-name because no identity-based policy allows the sagemaker:UpdateEndpoint action\n\tstatus
code: 400, request id: <omitted> "
status: "True"
type: ACK.Recoverable
According to this doc, the only resource UpdateEndpoint needs permission on is the endpoint itself, and our internal corporate policies require us to scope the action that way. Those same policies prevent us from adding any EndpointConfig resources to the statement that allows it.
Steps to reproduce
IAM policy scoped as much as possible:
{
"Sid": "endpoint",
"Effect": "Allow",
"Action": [
"sagemaker:AddTags",
"sagemaker:DeleteTags",
"sagemaker:CreateEndpoint",
"sagemaker:DeleteEndpoint",
"sagemaker:DescribeEndpoint",
"sagemaker:UpdateEndpoint",
"sagemaker:UpdateEndpointWeightsAndCapacities"
],
"Resource": [
"arn:aws:sagemaker:us-east-1:ACCOUNT_NUM:endpoint/test-model"
]
},
{
"Sid": "endpointCfg",
"Effect": "Allow",
"Action": [
"sagemaker:AddTags",
"sagemaker:DeleteTags",
"sagemaker:CreateEndpointConfig",
"sagemaker:CreateEndpoint",
"sagemaker:DescribeEndpointConfig",
"sagemaker:DeleteEndpointConfig"
],
"Resource": [
"arn:aws:sagemaker:us-east-1:ACCOUNT_NUM:endpoint-config/cfg1",
"arn:aws:sagemaker:us-east-1:ACCOUNT_NUM:endpoint-config/cfg2"
]
},
Create the above resources, with the endpoint using cfg1, then try switching to cfg2 by updating the existing endpoint YAML.
Expected outcome A concise description of what you expected to happen.
Environment
- Kubernetes version:
1.22.10
- Using EKS (yes/no), if so version?
no
- AWS service targeted (S3, RDS, etc.)
sagemaker
/cc @aws-controllers-k8s/sagemaker-maintainer
Hi mwm5945, I will attempt to replicate, but I have a couple of questions:
- Which controller version are you using?
- Is arn:aws:sts::<acct omitted>:assumed-role/sagemaker-provisioner/kiam-kiam the ACK controller role or the execution role?
- Do you create/remove tags in the update?
- Does the error go away if you have sagemaker:UpdateEndpoint in the endpointCfg statement?
- 1.2.2
- It's the KIAM role that the ACK role has a trust relationship with (we're not on EKS, nor do we have the newer auth method set up yet)
- Nope!
- We're not able to do so: our internal corporate policies restrict adding this action to endpoint-config resources, as it's not listed as an option here. I know doing this would work, as it worked previously; however, there was a bug in the platform that handles policy validations, which is ultimately what caused this to be discovered.
Thanks!
Hi Micheal, we are checking with the service team on this issue.
Hi Micheal, I can confirm this is a documentation issue: the sagemaker:UpdateEndpoint permission needs to be on the endpoint config resource as well. We will work with the documentation team to update the docs.
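To make the mismatch concrete, here is a minimal Python sketch of the authorization check described in this thread. It is not how IAM is actually implemented: it only models Allow statements with wildcard matching and ignores explicit Deny, conditions, and every other policy type. The helper names (`statements`, `is_allowed`, `can_update_endpoint`) are hypothetical, and the policy is abridged to the actions that matter here. It assumes, per the comment above, that sagemaker:UpdateEndpoint is checked against both the endpoint and the endpoint config being switched to.

```python
from fnmatch import fnmatchcase

# Placeholder account, as in the policy from the issue.
PREFIX = "arn:aws:sagemaker:us-east-1:ACCOUNT_NUM"


def statements(cfg_can_update):
    """Rebuild the two statements from the issue, abridged to the relevant actions.

    cfg_can_update=True adds sagemaker:UpdateEndpoint to the endpointCfg statement.
    """
    cfg_actions = [
        "sagemaker:CreateEndpointConfig",
        "sagemaker:CreateEndpoint",
        "sagemaker:DescribeEndpointConfig",
        "sagemaker:DeleteEndpointConfig",
    ]
    if cfg_can_update:
        cfg_actions.append("sagemaker:UpdateEndpoint")
    return [
        {
            "Sid": "endpoint",
            "Effect": "Allow",
            "Action": ["sagemaker:CreateEndpoint", "sagemaker:UpdateEndpoint"],
            "Resource": [f"{PREFIX}:endpoint/test-model"],
        },
        {
            "Sid": "endpointCfg",
            "Effect": "Allow",
            "Action": cfg_actions,
            "Resource": [
                f"{PREFIX}:endpoint-config/cfg1",
                f"{PREFIX}:endpoint-config/cfg2",
            ],
        },
    ]


def is_allowed(stmts, action, resource):
    """Simplified check: True if one Allow statement matches action and resource."""
    for stmt in stmts:
        if stmt["Effect"] != "Allow":
            continue
        # Action names are compared case-insensitively; ARNs are compared as-is.
        action_ok = any(fnmatchcase(action.lower(), a.lower()) for a in stmt["Action"])
        resource_ok = any(fnmatchcase(resource, r) for r in stmt["Resource"])
        if action_ok and resource_ok:
            return True
    return False


def can_update_endpoint(stmts, endpoint, config):
    """Assumption from this thread: the action must be allowed on both resources."""
    action = "sagemaker:UpdateEndpoint"
    return is_allowed(stmts, action, f"{PREFIX}:endpoint/{endpoint}") and is_allowed(
        stmts, action, f"{PREFIX}:endpoint-config/{config}"
    )


if __name__ == "__main__":
    print("policy as posted:", can_update_endpoint(statements(False), "test-model", "cfg2"))
    print("with UpdateEndpoint on configs:", can_update_endpoint(statements(True), "test-model", "cfg2"))
```

Under these assumptions the policy as posted fails the check on the endpoint-config ARN, which matches the resource named in the AccessDeniedException, and adding sagemaker:UpdateEndpoint to the endpointCfg statement makes it pass.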
Issues go stale after 180d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 60d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
Provide feedback via https://github.com/aws-controllers-k8s/community.
/lifecycle stale
/remove-lifecycle stale