sagemaker-python-sdk

Using AthenaDatasetDefinition as input to a SageMaker Processing job results in an error about the missing "sagemaker_processing" database.

Open · leo4ever opened this issue 7 months ago · 1 comment

Describe the bug

I am trying to set up a SageMaker Processing job whose input is defined with AthenaDatasetDefinition. When the job runs, it fails with the message below; it appears the job is trying to create a new database named sagemaker_processing. I have tried specifying an existing database to reuse through the dataset definition parameters, and I also set the output S3 URI parameter, but neither helps.

{"level":"ERROR","ts":"2025-05-13T16:18:55.242Z","msg":"[sagemaker logs] [Input: input-1] Error creating database 'sagemaker_processing' in catalog 'awsdatacatalog'."}
{"level":"ERROR","ts":"2025-05-13T16:18:55.242Z","msg":"[sagemaker logs] [Input: input-1] Error AccessDeniedException: User: arn:aws:sts::726167300549:assumed-role/99999-sagemaker-devmanaged-role/SageMaker is not authorized to perform: glue:CreateDatabase on resource: arn:aws:glue:us-west-2:726167300549:catalog because no identity-based policy allows the glue:CreateDatabase action"}

To reproduce

  1. Define a SageMaker Processing job that uses AthenaDatasetDefinition as a ProcessingInput.
  2. Run the job.
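For anyone reproducing step 1 without the SDK objects, the sketch below builds one entry of the `ProcessingInputs` list in the shape of the low-level CreateProcessingJob request, which is roughly what a `ProcessingInput` carrying a `DatasetDefinition` serializes to. It is a sketch under assumptions: the helper name `athena_processing_input`, the database, query, bucket and local path are hypothetical placeholders, and `AwsDataCatalog` is assumed as the catalog name. Note that in the reporter's case, setting `Database` and `OutputS3Uri` like this did not stop the job from trying to create `sagemaker_processing`.

```python
import json
from typing import Optional


def athena_processing_input(
    input_name: str,
    database: str,
    query: str,
    output_s3_uri: str,
    local_path: str = "/opt/ml/processing/input/athena",  # hypothetical path
    catalog: str = "AwsDataCatalog",  # assumed default Athena catalog name
    output_format: str = "PARQUET",
    work_group: Optional[str] = None,
) -> dict:
    """Build one ProcessingInputs entry (CreateProcessingJob request shape)."""
    athena = {
        "Catalog": catalog,
        "Database": database,          # the existing database to query
        "QueryString": query,
        "OutputS3Uri": output_s3_uri,  # where Athena writes query results
        "OutputFormat": output_format,
    }
    # Only send a work group when one is explicitly chosen.
    if work_group is not None:
        athena["WorkGroup"] = work_group
    return {
        "InputName": input_name,
        "AppManaged": False,
        "DatasetDefinition": {
            "AthenaDatasetDefinition": athena,
            "LocalPath": local_path,
            "DataDistributionType": "FullyReplicated",
            "InputMode": "File",
        },
    }


if __name__ == "__main__":
    entry = athena_processing_input(
        "input-1",
        "my_existing_db",                          # hypothetical database
        "SELECT * FROM my_table LIMIT 10",         # hypothetical query
        "s3://example-bucket/athena-output/",      # hypothetical bucket
    )
    print(json.dumps(entry, indent=2))
```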

Expected behavior

  1. Job executes without trying to create a new database.

Screenshots or logs

{"level":"INFO","ts":"2025-05-13T16:18:55.011Z","msg":"[sagemaker logs] [Input: input-1] Athena dataset definition specified. Starting athena query execution."}
{"level":"INFO","ts":"2025-05-13T16:18:55.011Z","msg":"[sagemaker logs] [Input: input-1] Creating database 'sagemaker_processing' in catalog 'awsdatacatalog' if doesn't exist already."}
{"level":"ERROR","ts":"2025-05-13T16:18:55.242Z","msg":"[sagemaker logs] [Input: input-1] Error creating database 'sagemaker_processing' in catalog 'awsdatacatalog'."}
{"level":"ERROR","ts":"2025-05-13T16:18:55.242Z","msg":"[sagemaker logs] [Input: input-1] Error AccessDeniedException: User: arn:aws:sts::726167300549:assumed-role/99999-sagemaker-devmanaged-role/SageMaker is not authorized to perform: glue:CreateDatabase on resource: arn:aws:glue:us-west-2:726167300549:catalog because no identity-based policy allows the glue:CreateDatabase action"}
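The denial in these logs names the missing IAM action, the resource, and the assumed-role session that was refused. A minimal sketch for pulling those three fields out of a dump of such JSON log records is below; it assumes the records are concatenated with only whitespace between them (as here) and that the message keeps the "User: ... is not authorized to perform: ... on resource: ..." wording. The helper names `find_denials` and `role_name_from_assumed_role` are hypothetical.

```python
import json
import re
from typing import Iterator, NamedTuple

# Assumes the AccessDeniedException wording seen in the log above.
DENIED = re.compile(
    r"User: (?P<principal>\S+) is not authorized to perform: "
    r"(?P<action>\S+) on resource: (?P<resource>\S+)"
)


class Denial(NamedTuple):
    principal: str
    action: str
    resource: str


def find_denials(log_text: str) -> Iterator[Denial]:
    """Yield every IAM denial found in whitespace-separated JSON records."""
    decoder = json.JSONDecoder()
    pos = 0
    while True:
        # raw_decode does not skip leading whitespace, so skip it here.
        while pos < len(log_text) and log_text[pos].isspace():
            pos += 1
        if pos >= len(log_text):
            return
        record, pos = decoder.raw_decode(log_text, pos)
        match = DENIED.search(record.get("msg", ""))
        if match:
            yield Denial(**match.groupdict())


def role_name_from_assumed_role(arn: str) -> str:
    """Extract the IAM role name from an STS assumed-role ARN."""
    # Shape: arn:aws:sts::<account>:assumed-role/<role-name>/<session-name>
    resource = arn.split(":", 5)[5]
    kind, role_name, _session = resource.split("/", 2)
    if kind != "assumed-role":
        raise ValueError(f"not an assumed-role ARN: {arn}")
    return role_name
```

Run against the log above, this would report `glue:CreateDatabase` on the Glue catalog ARN for the role `99999-sagemaker-devmanaged-role`. An assumed-role ARN does not carry the role's IAM path, so look the role up by name when editing its policies.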

System information

  • SageMaker Python SDK version: 2.227.0
  • Framework name (e.g. PyTorch) or algorithm (e.g. KMeans): ScriptProcessor
  • Framework version:
  • Python version: 3.11.11
  • CPU or GPU: CPU
  • Custom Docker image (Y/N): N

Additional context

leo4ever · May 13 '25 16:05

It seems your role "arn:aws:sts::726167300549:assumed-role/99999-sagemaker-devmanaged-role/SageMaker" does not have the glue:CreateDatabase permission. You could:

  1. Grant glue:CreateDatabase (or, temporarily, glue:*) to your role.
  2. Test the job with an admin role.
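As a narrower alternative to a temporary glue:* grant, the sketch below builds an identity-based policy that allows only glue:CreateDatabase for the `sagemaker_processing` database. It is a sketch under assumptions: the log only names the catalog ARN, and including the database ARN as a second resource is an assumption; the helper name and the `Sid` are hypothetical. Once past this step, the job may hit further Glue or Athena denials that the log above does not show, so rerun and check the logs again.

```python
import json


def glue_create_database_policy(
    region: str, account_id: str, database: str = "sagemaker_processing"
) -> dict:
    """Identity-based policy allowing only glue:CreateDatabase for one database."""
    prefix = f"arn:aws:glue:{region}:{account_id}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowSageMakerProcessingDatabase",  # hypothetical Sid
                "Effect": "Allow",
                "Action": ["glue:CreateDatabase"],
                # The catalog ARN is the resource named in the denial; the
                # database ARN is included on the assumption it is checked too.
                "Resource": [
                    f"{prefix}:catalog",
                    f"{prefix}:database/{database}",
                ],
            }
        ],
    }


if __name__ == "__main__":
    # 111122223333 is a placeholder account ID.
    print(json.dumps(glue_create_database_policy("us-west-2", "111122223333"), indent=2))
```

The printed JSON could then be attached as an inline policy on the execution role, for example with `aws iam put-role-policy --role-name <role> --policy-name <name> --policy-document file://policy.json`.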

shah-rukk · May 17 '25 04:05