
KeyError: 'DefaultSpaceSettings' when calling sagemaker.get_execution_role()

Open mlnrt opened this issue 1 year ago • 2 comments

Describe the bug sagemaker.get_execution_role() throws a KeyError: 'DefaultSpaceSettings' error when the space_name in the NOTEBOOK_METADATA_FILE = "/opt/ml/metadata/resource-metadata.json" file is not null.

To reproduce In a SageMaker Studio Code Editor space run the following:

import boto3
import sagemaker
from sagemaker.session import Session

role = sagemaker.get_execution_role()

Expected behavior sagemaker.get_execution_role() returns the execution role ARN.

Screenshots or logs

KeyError                                  Traceback (most recent call last)
Cell In[1], line 5
      2 import sagemaker
      3 from sagemaker.session import Session
----> 5 role = sagemaker.get_execution_role()
      6 sagemaker_session = sagemaker.Session()

File /opt/conda/lib/python3.10/site-packages/sagemaker/session.py:6271, in get_execution_role(sagemaker_session)
   6269 if not sagemaker_session:
   6270     sagemaker_session = Session()
-> 6271 arn = sagemaker_session.get_caller_identity_arn()
   6273 if ":role/" in arn:
   6274     return arn

File /opt/conda/lib/python3.10/site-packages/sagemaker/session.py:4894, in Session.get_caller_identity_arn(self)
   4892 if space_name is not None:
   4893     domain_desc = self.sagemaker_client.describe_domain(DomainId=domain_id)
-> 4894     return domain_desc["DefaultSpaceSettings"]["ExecutionRole"]
   4896 user_profile_desc = self.sagemaker_client.describe_user_profile(
   4897     DomainId=domain_id, UserProfileName=user_profile_name
   4898 )
   4900 # First, try to find role in userSettings

KeyError: 'DefaultSpaceSettings'

System information SageMaker Studio Code Editor

  • SageMaker Studio Distribution: 1.3 (The issue is the same with SageMaker Distribution 1.2)
  • SageMaker Python SDK version: 2.197.0 (the error is the same with latest version 2.207.1)
  • Python version: 3.10.13
  • Custom Docker image (Y/N): N

Additional troubleshooting information I traced the problem to the get_caller_identity_arn(self) function in session.py. In my case, when the NOTEBOOK_METADATA_FILE = "/opt/ml/metadata/resource-metadata.json" file is read, space_name is set to the name of the Code Editor space. As a result, get_caller_identity_arn(self) tries to do the following:

# In Space app, find execution role from DefaultSpaceSettings on domain level
if space_name is not None:
    domain_desc = self.sagemaker_client.describe_domain(DomainId=domain_id)
    return domain_desc["DefaultSpaceSettings"]["ExecutionRole"]

The problem is that describe_domain(DomainId=domain_id) does not return a DefaultSpaceSettings key. What I get when running the function as follows:

sagemaker_session = sagemaker.Session()
sagemaker_session.sagemaker_client.describe_domain(DomainId=domain_id)

is:

{'DomainArn': 'arn:aws:sagemaker:eu-west-1:<my-account-id>:domain/d-<my-sagemaker-studio-domain-id>',
 'DomainId': 'd-<my-sagemaker-studio-domain-id>',
 'DomainName': 'my2-sagemaker-studio-domain',
 'HomeEfsFileSystemId': 'fs-<my-efs-id>',
 'Status': 'InService',
 'CreationTime': datetime.datetime(2023, 11, 12, 12, 30, 21, 436000, tzinfo=tzlocal()),
 'LastModifiedTime': datetime.datetime(2023, 12, 1, 2, 52, 35, 627000, tzinfo=tzlocal()),
 'AuthMode': 'IAM',
 'DefaultUserSettings': {'ExecutionRole': 'arn:aws:iam::<my-account-id>:role/<my-execution-role-name>',
  'SpaceStorageSettings': {'DefaultEbsStorageSettings': {'DefaultEbsVolumeSizeInGb': 5,
    'MaximumEbsVolumeSizeInGb': 100}},
  'DefaultLandingUri': 'studio::',
  'StudioWebPortal': 'ENABLED'},
 'AppNetworkAccessType': 'PublicInternetOnly',
 'SubnetIds': ['subnet-*************'],
 'Url': 'https://d-<my-sagemaker-studio-domain-id>.studio.eu-west-1.sagemaker.aws/',
 'VpcId': 'vpc-***********',
 'ResponseMetadata': {'RequestId': '7e9a5359-eb94-48b2-a522-0e485cd96949',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'x-amzn-requestid': '7e9a5359-eb94-48b2-a522-0e485cd96949',
   'content-type': 'application/x-amz-json-1.1',
   'content-length': '818',
   'date': 'Sun, 11 Feb 2024 10:27:36 GMT'},
  'RetryAttempts': 0}}

So even if space_name has a value, the function should return domain_desc["DefaultUserSettings"]["ExecutionRole"] instead of failing with KeyError: 'DefaultSpaceSettings' on a key that is not in the response.
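
The fallback described here could look roughly like the sketch below. It is not the SDK's actual code: pick_execution_role is a hypothetical helper name, and the ARN and domain ID in the sample dictionary are placeholders. It only assumes the describe_domain response has the shape shown in this issue.

```python
from typing import Any, Mapping


def pick_execution_role(domain_desc: Mapping[str, Any]) -> str:
    """Return the execution role ARN from a describe_domain response.

    Hypothetical helper: prefers DefaultSpaceSettings and falls back to
    DefaultUserSettings when the space-level key is missing or has no role.
    """
    for settings_key in ("DefaultSpaceSettings", "DefaultUserSettings"):
        role = (domain_desc.get(settings_key) or {}).get("ExecutionRole")
        if role:
            return role
    raise ValueError("No ExecutionRole found in the domain description")


# Placeholder response shaped like the one in this issue:
# DefaultUserSettings is present, DefaultSpaceSettings is not.
domain_desc = {
    "DomainId": "d-example",
    "DefaultUserSettings": {
        "ExecutionRole": "arn:aws:iam::123456789012:role/example-role"
    },
}
print(pick_execution_role(domain_desc))
```

Using .get() instead of direct indexing means a domain without DefaultSpaceSettings no longer raises KeyError, while domains that do define it keep the space-level role.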

mlnrt · Feb 11 '24 10:02

Did you find a workaround for this? Having the same issue with the latest release (2.214.3)

gruellan · Apr 05 '24 12:04

@gruellan sorry for the very late reply. This is my current workaround:

# temporary workaround:
import json
import sagemaker

NOTEBOOK_METADATA_FILE = "/opt/ml/metadata/resource-metadata.json"
sagemaker_session = sagemaker.Session()
# Read the Studio domain ID from the app's metadata file
with open(NOTEBOOK_METADATA_FILE, "rb") as f:
    metadata = json.load(f)
domain_id = metadata.get("DomainId")
domain_desc = sagemaker_session.sagemaker_client.describe_domain(DomainId=domain_id)
# Use the space-level role if the domain defines one, else the default user role
if "DefaultSpaceSettings" in domain_desc:
    role = domain_desc["DefaultSpaceSettings"]["ExecutionRole"]
else:
    role = domain_desc["DefaultUserSettings"]["ExecutionRole"]
# end of bug workaround
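
For reuse across notebooks, the same workaround could be wrapped in a function that takes the metadata path and the client as arguments, which also makes it possible to try it outside Studio with a stub client. This is only a sketch: role_from_metadata is a hypothetical name, not part of the SageMaker SDK, and the only client call it relies on is describe_domain(DomainId=...).

```python
import json


def role_from_metadata(metadata_path, sagemaker_client):
    """Look up the execution role for the domain named in the metadata file.

    Hypothetical helper. sagemaker_client is any object exposing
    describe_domain(DomainId=...), such as the boto3 SageMaker client.
    """
    with open(metadata_path, "r") as f:
        metadata = json.load(f)
    domain_desc = sagemaker_client.describe_domain(DomainId=metadata["DomainId"])
    # Prefer space-level settings; fall back to the domain's default user settings.
    settings = domain_desc.get("DefaultSpaceSettings") or domain_desc["DefaultUserSettings"]
    return settings["ExecutionRole"]
```

Inside a Studio space it would be called as role_from_metadata("/opt/ml/metadata/resource-metadata.json", sagemaker.Session().sagemaker_client).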

mlnrt · May 03 '24 15:05