[Feature] Add support for AWS SageMaker
Summary
Add support for ingesting AWS SageMaker resources into Cartography. Amazon SageMaker is AWS's fully managed machine learning platform for building, training, and deploying ML models. This feature would allow Cartography to track SageMaker notebooks, training jobs, models, endpoints, and associated infrastructure.
Motivation
AWS SageMaker is a core component of enterprise ML infrastructure and presents significant security considerations around data access, model governance, and compute resources. By ingesting SageMaker resources, Cartography can surface:
- Notebook instances and their IAM roles/network configurations
- Training jobs and their access to S3 training data
- Deployed model endpoints and their exposure
- SageMaker domains and user profiles
- Model registry entries and approval workflows
- Feature store configurations
This unlocks graph-based security analysis such as:
- Identifying notebook instances with overly permissive IAM roles
- Tracking data lineage from S3 buckets through training to deployed models
- Auditing which principals can invoke model endpoints
- Detecting notebooks in public subnets or with direct internet access
- Mapping ML pipelines and their dependencies
Proposed Solution
Extend the AWS intel module to call the SageMaker APIs and model the following resources:
New Nodes:
- AWSSageMakerNotebookInstance - Jupyter notebook instances
- AWSSageMakerDomain - SageMaker Studio domains
- AWSSageMakerUserProfile - Studio user profiles
- AWSSageMakerModel - Deployed models
- AWSSageMakerModelPackage - Model Registry entries
- AWSSageMakerModelPackageGroup - Groups of related model packages
- AWSSageMakerEndpoint - Model inference endpoints
- AWSSageMakerEndpointConfig - Endpoint configurations
- AWSSageMakerTrainingJob - Training jobs
- AWSSageMakerTransformJob - Batch transform jobs
New Relationships:
- (:AWSSageMakerTrainingJob)-[:READ_FROM]->(:S3Bucket)
- (:AWSSageMakerTrainingJob)-[:PRODUCED_MODEL_ARTIFACT]->(:S3Bucket)
- (:AWSSageMakerModelPackage)-[:REFERENCED_ARTIFACTS_IN]->(:S3Bucket)
- (:AWSSageMakerModel)-[:REFERENCED_ARTIFACTS_IN]->(:S3Bucket)
- (:AWSSageMakerModel)-[:DERIVED_FROM]->(:AWSSageMakerModelPackage)
- (:AWSSageMakerModelPackage)-[:MEMBER_OF]->(:AWSSageMakerModelPackageGroup)
- (:AWSSageMakerTransformJob)-[:USED]->(:AWSSageMakerModel)
- (:AWSSageMakerTransformJob)-[:WROTE_TO]->(:S3Bucket)
- (:AWSSageMakerEndpointConfig)-[:REFERENCED]->(:AWSSageMakerModel)
- (:AWSSageMakerEndpoint)-[:USED]->(:AWSSageMakerEndpointConfig)
- (:AWSSageMakerTrainingJob)-[:CALLED_BY]->(:AWSSageMakerNotebookInstance)
- (:AWSSageMakerDomain)-[:CONTAINS]->(:AWSSageMakerUserProfile)
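The S3 relationships above require mapping S3 URIs on SageMaker resources to bucket names. A minimal sketch of that step for training jobs, assuming the job details come from a DescribeTrainingJob response (not in the API list in this issue, so treat it as an assumption) and that the S3URI training-input form (prefix or manifest) is acceptable as the READ_FROM source. The function names and the edge dictionary shape are hypothetical, not existing Cartography code:

```python
from typing import Any, Dict, List, Optional
from urllib.parse import urlparse


def bucket_from_s3_uri(uri: Optional[str]) -> Optional[str]:
    """Return the bucket name from an s3://bucket/key URI, or None otherwise."""
    if not uri:
        return None
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        return None
    return parsed.netloc


def training_job_bucket_edges(job: Dict[str, Any]) -> List[Dict[str, str]]:
    """Derive S3Bucket edges from a DescribeTrainingJob-shaped dict (assumed input)."""
    edges: List[Dict[str, str]] = []
    job_arn = job["TrainingJobArn"]

    # Each input channel may point at an S3 data source.
    for channel in job.get("InputDataConfig", []):
        uri = channel.get("DataSource", {}).get("S3DataSource", {}).get("S3Uri")
        bucket = bucket_from_s3_uri(uri)
        if bucket:
            edges.append({"job_arn": job_arn, "rel": "READ_FROM", "bucket": bucket})

    # The trained model artifact location, when the job has produced one.
    artifact_uri = job.get("ModelArtifacts", {}).get("S3ModelArtifacts")
    bucket = bucket_from_s3_uri(artifact_uri)
    if bucket:
        edges.append(
            {"job_arn": job_arn, "rel": "PRODUCED_MODEL_ARTIFACT", "bucket": bucket},
        )
    return edges
```

The same URI parsing could back the REFERENCED_ARTIFACTS_IN and WROTE_TO relationships, given the corresponding S3 fields on models, model packages, and transform jobs.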
AWS APIs to integrate:
- sagemaker:ListNotebookInstances/DescribeNotebookInstance
- sagemaker:ListDomains/DescribeDomain
- sagemaker:ListUserProfiles
- sagemaker:ListModels/DescribeModel
- sagemaker:ListEndpoints/DescribeEndpoint
- sagemaker:ListTrainingJobs
- sagemaker:ListModelPackages
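The List/Describe pairs above suggest a two-step fetch: page through the list call, then describe each item for its full configuration. A rough sketch for notebook instances, assuming a boto3 SageMaker client is passed in. The function name is hypothetical and this is not meant as the final intel module layout:

```python
from typing import Any, Dict, List


def get_notebook_instances(client: Any) -> List[Dict[str, Any]]:
    """List notebook instances, then describe each one for its full configuration.

    `client` is assumed to be a boto3 SageMaker client, for example
    boto3.session.Session().client("sagemaker", region_name="us-east-1").
    """
    paginator = client.get_paginator("list_notebook_instances")
    instances: List[Dict[str, Any]] = []
    for page in paginator.paginate():
        for summary in page.get("NotebookInstances", []):
            # The list call returns summaries only; fields such as RoleArn,
            # SubnetId, and DirectInternetAccess come from the describe call.
            detail = client.describe_notebook_instance(
                NotebookInstanceName=summary["NotebookInstanceName"],
            )
            instances.append(detail)
    return instances
```

Taking the client as a parameter keeps the function easy to test with a stub and lets the caller control region and credentials.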
Alternatives Considered
- Focusing only on notebook instances - this misses the broader ML pipeline and deployment surface
- Using CloudTrail for SageMaker activity - this provides an audit trail of API calls but not the current resource configuration