
[Feature] Add support for AWS SageMaker

Open kunaals opened this issue 2 weeks ago • 0 comments

Summary

Add support for ingesting AWS SageMaker resources into Cartography. Amazon SageMaker is AWS's fully managed machine learning platform for building, training, and deploying ML models. This feature would allow Cartography to track SageMaker notebooks, training jobs, models, endpoints, and associated infrastructure.

Motivation

AWS SageMaker is a core component of enterprise ML infrastructure and presents significant security considerations around data access, model governance, and compute resources. By ingesting SageMaker resources, Cartography can surface:

  • Notebook instances and their IAM roles/network configurations
  • Training jobs and their access to S3 training data
  • Deployed model endpoints and their exposure
  • SageMaker domains and user profiles
  • Model registry entries and approval workflows
  • Feature store configurations

This unlocks graph-based security analysis such as:

  • Identifying notebook instances with overly permissive IAM roles
  • Tracking data lineage from S3 buckets through training to deployed models
  • Auditing which principals can invoke model endpoints
  • Detecting notebooks in public subnets or with direct internet access
  • Mapping ML pipelines and their dependencies

Proposed Solution

Extend the AWS intel module to call the SageMaker APIs and model the following resources:

New Nodes:

  • AWSSageMakerNotebookInstance - Jupyter notebook instances
  • AWSSageMakerDomain - SageMaker Studio domains
  • AWSSageMakerUserProfile - Studio user profiles
  • AWSSageMakerModel - Model definitions (inference container, model artifacts, execution role)
  • AWSSageMakerModelPackage - Model Registry entries (versioned model packages)
  • AWSSageMakerEndpoint - Model inference endpoints
  • AWSSageMakerEndpointConfig - Endpoint configurations
  • AWSSageMakerTrainingJob - Training jobs
  • AWSSageMakerTransformJob - Batch transform jobs
  • AWSSageMakerModelPackageGroup - Model Registry groups of related model packages
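
A minimal sketch of how one of the proposed nodes might be populated, assuming the input is a boto3 `describe_notebook_instance` response. The function names `transform_notebook_instance` and `internet_exposed`, and every property name on the returned dict, are hypothetical and not part of Cartography's existing schema.

```python
from typing import Any, Dict, List


def transform_notebook_instance(described: Dict[str, Any], region: str) -> Dict[str, Any]:
    """Flatten one DescribeNotebookInstance response into node properties.

    Property names are illustrative only (hypothetical), not an existing
    Cartography schema.
    """
    return {
        "id": described["NotebookInstanceArn"],
        "name": described["NotebookInstanceName"],
        "status": described.get("NotebookInstanceStatus"),
        "instance_type": described.get("InstanceType"),
        # IAM role the notebook runs as; a load step could link it to an AWS role node.
        "role_arn": described.get("RoleArn"),
        # Network placement; SubnetId is absent when no VPC subnet was configured.
        "subnet_id": described.get("SubnetId"),
        "security_group_ids": described.get("SecurityGroups", []),
        # AWS reports these as "Enabled"/"Disabled"; anything else maps to False.
        "direct_internet_access": described.get("DirectInternetAccess") == "Enabled",
        "root_access": described.get("RootAccess") == "Enabled",
        "kms_key_id": described.get("KmsKeyId"),
        "region": region,
    }


def internet_exposed(notebooks: List[Dict[str, Any]]) -> List[str]:
    """Return the names of transformed notebooks with direct internet access."""
    return [nb["name"] for nb in notebooks if nb["direct_internet_access"]]
```

Storing `DirectInternetAccess` as a boolean turns the "notebooks with direct internet access" check from the Motivation section into a simple property filter once the nodes are in the graph.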

New Relationships:

  • (:AWSSageMakerTrainingJob)-[:READ_FROM]->(:S3Bucket)
  • (:AWSSageMakerTrainingJob)-[:PRODUCED_MODEL_ARTIFACT]->(:S3Bucket)
  • (:AWSSageMakerModelPackage)-[:REFERENCED_ARTIFACTS_IN]->(:S3Bucket)
  • (:AWSSageMakerModel)-[:REFERENCED_ARTIFACTS_IN]->(:S3Bucket)
  • (:AWSSageMakerModel)-[:DERIVED_FROM]->(:AWSSageMakerModelPackage)
  • (:AWSSageMakerModelPackage)-[:MEMBER_OF]->(:AWSSageMakerModelPackageGroup)
  • (:AWSSageMakerTransformJob)-[:USED]->(:AWSSageMakerModel)
  • (:AWSSageMakerTransformJob)-[:WROTE_TO]->(:S3Bucket)
  • (:AWSSageMakerEndpointConfig)-[:REFERENCED]->(:AWSSageMakerModel)
  • (:AWSSageMakerEndpoint)-[:USED]->(:AWSSageMakerEndpointConfig)
  • (:AWSSageMakerTrainingJob)-[:CALLED_BY]->(:AWSSageMakerNotebookInstance)
  • (:AWSSageMakerDomain)-[:CONTAINS]->(:AWSSageMakerUserProfile)
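
Several of these relationships depend on turning S3 URIs found in Describe* responses into bucket names, and on reading model and package references out of container definitions. A minimal sketch, assuming boto3-shaped `describe_training_job`, `describe_model` and `describe_endpoint_config` responses; the helper names and the `(source, relationship, target)` tuple format are hypothetical. Targets are keyed by whatever identifier the response exposes (bucket name, model name, model package name or ARN), so a load step would still need to match them to existing nodes.

```python
from typing import Any, Dict, List, Tuple
from urllib.parse import urlparse

# Hypothetical intermediate format: (source_id, relationship_type, target_key).
Edge = Tuple[str, str, str]


def s3_bucket_from_uri(uri: str) -> str:
    """Return the bucket name from an 's3://bucket/key' URI."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {uri!r}")
    return parsed.netloc


def training_job_edges(job: Dict[str, Any]) -> List[Edge]:
    """READ_FROM / PRODUCED_MODEL_ARTIFACT edges from a DescribeTrainingJob response."""
    arn = job["TrainingJobArn"]
    edges: List[Edge] = []
    for channel in job.get("InputDataConfig", []):
        s3_source = channel.get("DataSource", {}).get("S3DataSource")
        if s3_source:  # channels can also use non-S3 sources; skip those here
            edges.append((arn, "READ_FROM", s3_bucket_from_uri(s3_source["S3Uri"])))
    artifact = job.get("ModelArtifacts", {}).get("S3ModelArtifacts")
    if artifact:
        edges.append((arn, "PRODUCED_MODEL_ARTIFACT", s3_bucket_from_uri(artifact)))
    return edges


def model_edges(model: Dict[str, Any]) -> List[Edge]:
    """REFERENCED_ARTIFACTS_IN / DERIVED_FROM edges from a DescribeModel response."""
    arn = model["ModelArn"]
    # A model has either a single PrimaryContainer or a Containers list.
    containers = list(model.get("Containers") or [])
    if "PrimaryContainer" in model:
        containers.insert(0, model["PrimaryContainer"])
    edges: List[Edge] = []
    for container in containers:
        if container.get("ModelDataUrl"):
            bucket = s3_bucket_from_uri(container["ModelDataUrl"])
            edges.append((arn, "REFERENCED_ARTIFACTS_IN", bucket))
        if container.get("ModelPackageName"):
            edges.append((arn, "DERIVED_FROM", container["ModelPackageName"]))
    return edges


def endpoint_config_edges(config: Dict[str, Any]) -> List[Edge]:
    """REFERENCED edges from a DescribeEndpointConfig response (one per variant)."""
    arn = config["EndpointConfigArn"]
    return [
        (arn, "REFERENCED", variant["ModelName"])
        for variant in config.get("ProductionVariants", [])
        if "ModelName" in variant
    ]
```

Keeping edge extraction separate from loading makes each relationship testable against captured API responses without a Neo4j instance.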

AWS APIs to integrate:

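  • sagemaker:ListNotebookInstances / DescribeNotebookInstance
  • sagemaker:ListDomains / DescribeDomain
  • sagemaker:ListUserProfiles / DescribeUserProfile
  • sagemaker:ListModels / DescribeModel
  • sagemaker:ListModelPackages / DescribeModelPackage
  • sagemaker:ListModelPackageGroups
  • sagemaker:ListEndpoints / DescribeEndpoint
  • sagemaker:ListEndpointConfigs / DescribeEndpointConfig
  • sagemaker:ListTrainingJobs / DescribeTrainingJob (the list call returns only summaries; the S3 input channels and model artifact location come from DescribeTrainingJob)
  • sagemaker:ListTransformJobs / DescribeTransformJob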
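
Most of these APIs follow the same pattern: page through a List* call, then call the matching Describe* for each summary. A minimal sketch of that loop, assuming `client` behaves like `boto3.client("sagemaker")` and that botocore provides a paginator for the chosen List* operation (`client.can_paginate(...)` can confirm this). `list_and_describe` and its parameters are hypothetical names.

```python
from typing import Any, Dict, Iterator


def list_and_describe(
    client: Any,
    list_operation: str,
    list_key: str,
    describe_operation: str,
    param_map: Dict[str, str],
    **list_kwargs: Any,
) -> Iterator[Dict[str, Any]]:
    """Page through a SageMaker List* call and Describe* every item.

    param_map maps each Describe* parameter name to the key in the list
    summary that supplies its value (hypothetical helper, not a boto3 API).
    """
    paginator = client.get_paginator(list_operation)
    describe = getattr(client, describe_operation)
    for page in paginator.paginate(**list_kwargs):
        for summary in page.get(list_key, []):
            # Build the Describe* keyword arguments from the summary fields.
            params = {param: summary[key] for param, key in param_map.items()}
            yield describe(**params)
```

For example, notebook instances would use `"list_notebook_instances"`, `"NotebookInstances"`, `"describe_notebook_instance"` and `{"NotebookInstanceName": "NotebookInstanceName"}`. User profiles are listed per domain, so that call would pass `DomainIdEquals=domain_id` through `list_kwargs` and map both `DomainId` and `UserProfileName` into `describe_user_profile`.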

Alternatives Considered

  • Focusing only on notebook instances - this misses the broader ML pipeline and deployment surface
  • Using CloudTrail for SageMaker activity - provides audit trail but not resource configuration

Relevant Links

kunaals, Dec 08 '25 19:12