
Sagemaker Domain: Unable to remove `DomainSettings.SecurityGroupIds`

Open jar-b opened this issue 7 months ago • 3 comments

Acknowledgements

  • [x] I have searched (https://github.com/aws/aws-sdk/issues?q=is%3Aissue) for past instances of this issue
  • [x] I have verified all of my SDK modules are up-to-date (you can perform a bulk update with go get -u github.com/aws/aws-sdk-go-v2/...)

Describe the bug

After creating a SageMaker domain with DomainSettings.SecurityGroupIds configured, the security group cannot be removed. The SecurityGroupIds argument is a []string and I've attempted setting the value to the following.

  1. nil - no effect
  2. []string{} - no effect
  3. []string{""} - validation error (mostly trying to rule out behavior like what was observed in #2788)

I'd expect either 1 or 2 above to remove the existing security group IDs from the domain settings.

Regression Issue

  • [ ] Select this option if this issue appears to be a regression.

Expected Behavior

Setting SecurityGroupIds to []string{} in the update request removes them.

Current Behavior

Setting SecurityGroupIds to []string{} in the update request does nothing.

Reproduction Steps

Reproduction is complicated a bit by the fact that a SageMaker domain depends on several upstream resources (VPC, subnet, IAM role, security group). I used the following Terraform configuration to construct the initial state.

terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }
}

# Configure the AWS Provider
provider "aws" {}

data "aws_partition" "current" {}

data "aws_availability_zones" "available" {
  exclude_zone_ids = ["usw2-az4", "usgw1-az2"]
  state            = "available"

  filter {
    name   = "opt-in-status"
    values = ["opt-in-not-required"]
  }
}

resource "aws_vpc" "test" {
  cidr_block = "10.0.0.0/16"

  tags = {
    Name = "jb-test"
  }
}

resource "aws_subnet" "test" {
  count = 1

  vpc_id            = aws_vpc.test.id
  availability_zone = data.aws_availability_zones.available.names[count.index]
  cidr_block        = cidrsubnet(aws_vpc.test.cidr_block, 8, count.index)

  tags = {
    Name = "jb-test"
  }
}

resource "aws_iam_role" "test" {
  name               = "jb-test"
  path               = "/"
  assume_role_policy = data.aws_iam_policy_document.test.json
}

data "aws_iam_policy_document" "test" {
  statement {
    actions = ["sts:AssumeRole"]

    principals {
      type        = "Service"
      identifiers = ["sagemaker.${data.aws_partition.current.dns_suffix}"]
    }
  }
}

resource "aws_iam_role_policy_attachment" "test" {
  role       = aws_iam_role.test.name
  policy_arn = "arn:${data.aws_partition.current.partition}:iam::aws:policy/AmazonSageMakerFullAccess"
}

resource "aws_security_group" "test" {
  vpc_id = aws_vpc.test.id
  name   = "jb-test"
}

resource "aws_sagemaker_domain" "test" {
  domain_name = "jb-test"
  auth_mode   = "IAM"
  vpc_id      = aws_vpc.test.id
  subnet_ids  = aws_subnet.test[*].id

  default_user_settings {
    execution_role = aws_iam_role.test.arn
  }

  domain_settings {
    execution_role_identity_config = "DISABLED"
    security_group_ids             = [aws_security_group.test.id]
  }

  retention_policy {
    home_efs_file_system = "Delete"
  }
}
  1. Apply the configuration above (or create an analogous setup via the console/CloudFormation)
  2. Run the script below.
package main

import (
	"context"
	"fmt"
	"log"
	"time"

	"github.com/aws/aws-sdk-go-v2/aws"
	"github.com/aws/aws-sdk-go-v2/config"
	"github.com/aws/aws-sdk-go-v2/service/sagemaker"
	"github.com/aws/aws-sdk-go-v2/service/sagemaker/types"
)

func main() {
	// CHANGE THIS AS APPROPRIATE
	domainID := aws.String("d-<redacted>")

	ctx := context.TODO()
	cfg, err := config.LoadDefaultConfig(ctx, config.WithRegion("us-west-2"))
	if err != nil {
		log.Fatalf("unable to load SDK config, %v", err)
	}
	client := sagemaker.NewFromConfig(cfg)

	outBefore, err := client.DescribeDomain(ctx, &sagemaker.DescribeDomainInput{
		DomainId: domainID,
	})
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("domain settings security group IDs before: %+v\n", outBefore.DomainSettings.SecurityGroupIds)

	fmt.Println("setting domain settings security group IDs to an empty array...")
	if _, err := client.UpdateDomain(ctx, &sagemaker.UpdateDomainInput{
		DomainId:             domainID,
		AppNetworkAccessType: "PublicInternetOnly",
		DomainSettingsForUpdate: &types.DomainSettingsForUpdate{
			SecurityGroupIds: []string{},

			// other attempted permutations
			// SecurityGroupIds: nil, // no change
			// SecurityGroupIds: []string{""},  // validation error
			// SecurityGroupIds: []string{" "}, // validation error
		},
	}); err != nil {
		log.Fatal(err)
	}

	fmt.Println("waiting for domain to come back into service...")
	for {
		out, err := client.DescribeDomain(ctx, &sagemaker.DescribeDomainInput{
			DomainId: domainID,
		})
		if err != nil {
			log.Fatal(err)
		}

		if out.Status == types.DomainStatusInService {
			break
		}
		fmt.Printf("status: %s. Sleeping 5 seconds.\n", out.Status)
		time.Sleep(5 * time.Second)
	}

	outAfter, err := client.DescribeDomain(ctx, &sagemaker.DescribeDomainInput{
		DomainId: domainID,
	})
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("domain settings security group IDs after: %+v\n", outAfter.DomainSettings.SecurityGroupIds)
}
  3. Observe that the update does not take. For example,
% go run main.go
domain settings security group IDs before: [sg-<redacted>]
setting domain settings security group IDs to an empty array...
waiting for domain to come back into service...
status: Updating. Sleeping 5 seconds.
domain settings security group IDs after: [sg-<redacted>]

Possible Solution

No response

Additional Information/Context

Relates https://github.com/hashicorp/terraform-provider-aws/pull/40726#issuecomment-2867690302
Relates https://github.com/hashicorp/terraform-provider-aws/issues/40600

AWS Go SDK V2 Module Versions Used

module main

go 1.23.6

require (
	github.com/aws/aws-sdk-go-v2 v1.36.3
	github.com/aws/aws-sdk-go-v2/config v1.29.14
	github.com/aws/aws-sdk-go-v2/service/sagemaker v1.191.0
)

require (
	github.com/aws/aws-sdk-go-v2/credentials v1.17.67 // indirect
	github.com/aws/aws-sdk-go-v2/feature/ec2/imds v1.16.30 // indirect
	github.com/aws/aws-sdk-go-v2/internal/configsources v1.3.34 // indirect
	github.com/aws/aws-sdk-go-v2/internal/endpoints/v2 v2.6.34 // indirect
	github.com/aws/aws-sdk-go-v2/internal/ini v1.8.3 // indirect
	github.com/aws/aws-sdk-go-v2/service/internal/accept-encoding v1.12.3 // indirect
	github.com/aws/aws-sdk-go-v2/service/internal/presigned-url v1.12.15 // indirect
	github.com/aws/aws-sdk-go-v2/service/sso v1.25.3 // indirect
	github.com/aws/aws-sdk-go-v2/service/ssooidc v1.30.1 // indirect
	github.com/aws/aws-sdk-go-v2/service/sts v1.33.19 // indirect
	github.com/aws/smithy-go v1.22.2 // indirect
)

Compiler and Version used

go version go1.23.6 darwin/arm64

Operating System and version

macOS Sequoia 15.4.1

jar-b avatar May 09 '25 20:05 jar-b

This seems to be more of an issue with SageMaker than with the SDK.

If you print the request the Go SDK makes by using

	cfg, err := config.LoadDefaultConfig(ctx,
		//...
		config.WithClientLogMode(aws.LogRequestWithBody),
	)

you can see that sending this input

client.UpdateDomain(ctx, &sagemaker.UpdateDomainInput{
	DomainId:             aws.String("d-sample"),
	AppNetworkAccessType: "PublicInternetOnly",
	DomainSettingsForUpdate: &types.DomainSettingsForUpdate{
		SecurityGroupIds: []string{},
	},
})

does serialize the security group IDs as an empty list in the request body:

{"AppNetworkAccessType":"PublicInternetOnly","DomainId":"d-<redacted>","DomainSettingsForUpdate":{"SecurityGroupIds":[]}}

This is treated differently from setting the field to nil, in which case nothing gets serialized:

client.UpdateDomain(ctx, &sagemaker.UpdateDomainInput{
	DomainId:             aws.String("d-sample"),
	AppNetworkAccessType: "PublicInternetOnly",
	DomainSettingsForUpdate: &types.DomainSettingsForUpdate{
		SecurityGroupIds: nil, // explicit nil
	},
})

In this case, DomainSettingsForUpdate is serialized as an empty object:

{"AppNetworkAccessType":"PublicInternetOnly","DomainId":"d-sample","DomainSettingsForUpdate":{}}
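
The nil-versus-empty distinction seen in the two request bodies can be reproduced with a small standalone sketch. This is not the SDK's serializer: `domainSettingsForUpdate` and `serializeSettings` are hypothetical stand-ins that only mimic the observed behavior, namely that the key is written whenever the slice is non-nil, even if it is empty.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// domainSettingsForUpdate is a hypothetical stand-in for the SDK's
// types.DomainSettingsForUpdate, reduced to the one field under discussion.
type domainSettingsForUpdate struct {
	SecurityGroupIds []string
}

// serializeSettings mimics the behavior seen in the logged requests:
// the key is emitted only when the slice is non-nil, so an empty but
// non-nil slice still produces "SecurityGroupIds":[].
func serializeSettings(s domainSettingsForUpdate) (string, error) {
	body := map[string]any{}
	if s.SecurityGroupIds != nil {
		body["SecurityGroupIds"] = s.SecurityGroupIds
	}
	b, err := json.Marshal(body)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	inputs := []domainSettingsForUpdate{
		{SecurityGroupIds: nil},        // key omitted
		{SecurityGroupIds: []string{}}, // key present, empty list
	}
	for _, in := range inputs {
		out, err := serializeSettings(in)
		if err != nil {
			panic(err)
		}
		fmt.Println(out)
	}
}
```

Under these assumptions the client does send the caller's intent on the wire in the empty-slice case, which is why the behavior points at the service rather than the SDK.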

The SageMaker docs don't make clear what request is needed to remove SecurityGroupIds. The API reference docs don't call out this field, and the Domain Settings guide only says:

SecurityGroupIds

The security groups for the Amazon Virtual Private Cloud that the Domain uses for communication between Domain-level apps and user apps.

Required: No

Type: Array of String

Minimum: 1

Maximum: 32 | 3

Madrigal avatar May 12 '25 17:05 Madrigal

Thanks for the additional context, @Madrigal. What is the best next step here - a support case directed to the SageMaker service team?

jar-b avatar May 12 '25 18:05 jar-b

@jar-b following up internally with the service team. Reference V1776730970. Will update here when I hear back

Madrigal avatar May 12 '25 19:05 Madrigal

Discussed internally. Unfortunately, this is currently not supported by SageMaker: there is no way to remove all security groups from a domain's settings, so you would have to delete and recreate the domain. The team is aware of this, but there's no ETA available.
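
For tooling that reconciles a desired state against an existing domain, the limitation described above can be detected before calling UpdateDomain. The sketch below is one possible check under that assumption; `requiresRecreate` is a hypothetical helper, not part of the SDK or of any provider.

```go
package main

import "fmt"

// requiresRecreate reports whether moving from the current security group
// IDs to the desired ones would remove every group. Per the comment above,
// that change cannot be made in place and needs a delete and recreate.
// Hypothetical helper for illustration only.
func requiresRecreate(current, desired []string) bool {
	return len(current) > 0 && len(desired) == 0
}

func main() {
	current := []string{"sg-example"} // placeholder ID, not a real group

	fmt.Println(requiresRecreate(current, []string{}))           // removing all groups
	fmt.Println(requiresRecreate(current, []string{"sg-other"})) // swapping groups
	fmt.Println(requiresRecreate(nil, nil))                      // nothing to remove
}
```

The check treats nil and empty slices alike on purpose, since neither request removes the groups.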

Madrigal avatar Aug 18 '25 18:08 Madrigal