
HyperparameterTuner does not preserve `disable_output_compression` setting of Estimator

Open thvasilo opened this issue 6 months ago • 2 comments

Describe the bug

When disable_output_compression=True is set on a PyTorch SageMaker Estimator that is passed to a HyperparameterTuner, the training jobs launched by the tuning job still compress the model output. The parameter is set correctly on the estimator but appears to be ignored when the estimator is used within a hyperparameter tuning job.

To reproduce

  1. Create a PyTorch estimator with disable_output_compression=True
  2. Use this estimator with a HyperparameterTuner
  3. Launch the hyperparameter tuning job
  4. Observe that the model outputs in the resulting training jobs are still compressed

Here's a minimal code example that reproduces the issue:

import sagemaker
from sagemaker.pytorch.estimator import PyTorch
from sagemaker.tuner import HyperparameterTuner, ContinuousParameter

# Set up the PyTorch estimator with disable_output_compression=True
role = "arn:aws:iam::123456789012:role/SageMakerRole"
estimator = PyTorch(
    disable_output_compression=True,  # This should disable compression
    entry_point="train.py",
    source_dir="./code",
    role=role,
    instance_count=1,
    instance_type="ml.m5.xlarge",
    framework_version="2.3.0",  # required alongside py_version
    py_version="py3",
)

# Set up the hyperparameter tuner
hyperparameter_ranges = {
    "learning-rate": ContinuousParameter(0.001, 0.1)
}

tuner = HyperparameterTuner(
    estimator=estimator,
    objective_metric_name="validation-accuracy",
    hyperparameter_ranges=hyperparameter_ranges,
    metric_definitions=[
        {
            "Name": "validation-accuracy",
            "Regex": "validation accuracy: ([0-9\\.]+)"
        }
    ],
    max_jobs=2,
    max_parallel_jobs=2
)

# Launch the tuning job
tuner.fit({"train": "s3://bucket/path/to/training/data"})

Expected behavior

When disable_output_compression=True is set in the PyTorch estimator, all training jobs created by the HyperparameterTuner should respect this setting and not compress the model output.

Screenshots or logs

When examining the training jobs created by the hyperparameter tuning job, the model artifacts are still compressed despite setting disable_output_compression=True in the estimator.
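One way to confirm what the service actually recorded for each child job is to inspect the DescribeTrainingJob response. This is a sketch, not part of the original report; the tuning-job name below is a placeholder, and the boto3 calls are shown in comments since they need AWS credentials:

```python
def effective_compression(describe_training_job_response: dict) -> str:
    """Return the CompressionType a training job was created with.

    SageMaker defaults to GZIP when OutputDataConfig omits the key.
    """
    output_config = describe_training_job_response.get("OutputDataConfig", {})
    return output_config.get("CompressionType", "GZIP")


# Checking every job the tuner launched (requires boto3 and AWS credentials):
#
#   import boto3
#   sm = boto3.client("sagemaker")
#   resp = sm.list_training_jobs_for_hyper_parameter_tuning_job(
#       HyperParameterTuningJobName="my-tuning-job")  # placeholder name
#   for s in resp["TrainingJobSummaries"]:
#       job = sm.describe_training_job(TrainingJobName=s["TrainingJobName"])
#       print(s["TrainingJobName"], effective_compression(job))
```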

System information

  • SageMaker Python SDK version: 2.246.0
  • Framework name: PyTorch
  • Framework version: 2.3.0
  • Python version: 3.10
  • CPU or GPU: CPU
  • Custom Docker image: Yes, derived from pytorch-training:2.3.0-cpu-py311-ubuntu20.04-sagemaker

Additional context

This issue appears to be specific to hyperparameter tuning jobs. When using the same PyTorch estimator directly (not through a HyperparameterTuner), the disable_output_compression=True parameter works as expected and the model output is not compressed.

The issue might be that the disable_output_compression parameter is not being properly propagated from the estimator to the individual training jobs created by the HyperparameterTuner.

thvasilo commented on Jun 12 '25 20:06

To check the inner job config I did

from sagemaker.tuner import _TuningJob

tuner = HyperparameterTuner(...)

args = _TuningJob._get_tuner_args(tuner, inputs)

print(args)

which gave me

{'job_name': None,
 'tuning_config': {...},
 'tags': None,
 'warm_start_config': None,
 'autotune': False,
 'training_config': {'role': 'arn:aws:iam::XXXXXX',
                     'output_config': {'S3OutputPath': 's3://XXXXX/XXXXX/',
                                       'CompressionType': 'NONE'},
                     ...}}

but the actual training jobs still produce model.tar.gz as their output
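A quick way to tell the two outcomes apart from the S3 side: with compression disabled, SageMaker uploads the model directory uncompressed under the output prefix rather than a single model.tar.gz. This is a sketch with placeholder bucket and prefix names; the boto3 listing is shown in comments:

```python
def is_compressed_artifact(s3_key: str) -> bool:
    """True if the key looks like a gzipped model archive rather than a
    file from an uncompressed model directory."""
    return s3_key.endswith(".tar.gz")


# Listing the output prefix (requires boto3; bucket/prefix are placeholders):
#
#   import boto3
#   s3 = boto3.client("s3")
#   resp = s3.list_objects_v2(Bucket="bucket", Prefix="path/to/output/")
#   for obj in resp.get("Contents", []):
#       kind = "compressed" if is_compressed_artifact(obj["Key"]) else "raw"
#       print(obj["Key"], kind)
```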

thvasilo commented on Jun 12 '25 21:06

FYI, from an offline convo this seems to be an issue on the service side. Enabling debug logs to check this line https://github.com/aws/sagemaker-python-sdk/blob/master/src/sagemaker/session.py#L3395

shows 'CompressionType': 'NONE' correctly in the API request sent through boto3.
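For reference, that debug logging can be turned on with the standard logging module before calling tuner.fit(); botocore then logs the serialized CreateHyperParameterTuningJob request parameters, where the CompressionType field should be visible. A minimal sketch:

```python
import logging

# Raise the botocore logger to DEBUG so the request parameters sent to the
# SageMaker API (including OutputDataConfig.CompressionType) are logged.
logging.basicConfig(level=logging.DEBUG)
logging.getLogger("botocore").setLevel(logging.DEBUG)
```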

Erickkbentz commented on Jun 13 '25 22:06