SageMaker SDK Bug Report: HyperparameterTuner Missing Container Mode Support
Describe the bug
HyperparameterTuner does not preserve container mode parameters (container_entry_point and container_arguments) when creating training jobs, causing tuning jobs to fail. Individual training jobs work correctly with container mode, but hyperparameter tuning jobs lose the container configuration and fall back to script mode logic, resulting in failures.
To reproduce
from sagemaker.estimator import Estimator
from sagemaker.tuner import HyperparameterTuner, ContinuousParameter
# Create estimator with container mode
estimator = Estimator(
    image_uri="123456789012.dkr.ecr.us-east-1.amazonaws.com/my-image:latest",
    role="arn:aws:iam::123456789012:role/SageMakerRole",
    instance_type="ml.m5.large",
    instance_count=1,
    container_entry_point=["python", "-m", "my_module"],
    container_arguments=["train", "model1"],
)
# Create hyperparameter tuner
tuner = HyperparameterTuner(
    estimator=estimator,
    objective_metric_name="accuracy",
    objective_type="Maximize",
    hyperparameter_ranges={
        "learning_rate": ContinuousParameter(0.001, 0.1)
    },
    max_jobs=2,
    max_parallel_jobs=1,
)
# This will fail: the individual training jobs are created without the container parameters
tuner.fit()
Expected behavior
The hyperparameter tuning job should preserve the container mode configuration and set ContainerEntrypoint and ContainerArguments in the AlgorithmSpecification of individual training jobs, just like when calling estimator.fit() directly.
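Concretely, using the values from the reproduction above, each child training job's AlgorithmSpecification would be expected to contain something like the following (sketch):
expected_algorithm_spec = {
    "TrainingImage": "123456789012.dkr.ecr.us-east-1.amazonaws.com/my-image:latest",
    "TrainingInputMode": "File",
    # Propagated from the estimator's container mode settings:
    "ContainerEntrypoint": ["python", "-m", "my_module"],
    "ContainerArguments": ["train", "model1"],
}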
Screenshots or logs
An individual training job within the tuning job shows the missing container parameters:
"AlgorithmSpecification": {
"TrainingImage": "123456789012.dkr.ecr.us-east-1.amazonaws.com/my-image:latest",
"TrainingInputMode": "File",
"MetricDefinitions": [...],
"EnableSageMakerMetricsTimeSeries": false
// Missing: ContainerEntrypoint and ContainerArguments
}
Training jobs fail with:
AlgorithmError: Framework Error:
AttributeError: 'NoneType' object has no attribute 'endswith'
System information
- SageMaker Python SDK version: 2.244.2
- Framework name: Custom container (Estimator class)
- Framework version: N/A
- Python version: 3.10
- CPU or GPU: CPU
- Custom Docker image (Y/N): Y
Additional context
Root cause analysis
The issue is in two locations in the SDK:
1. sagemaker/job.py - Missing container parameter extraction
The _Job._load_config() method (lines 117-124) only extracts the basic configuration and ignores the container mode parameters:
return {
    "input_config": input_config,
    "role": role,
    "output_config": output_config,
    "resource_config": resource_config,
    "stop_condition": stop_condition,
    "vpc_config": vpc_config,
    # Missing: container_entry_point, container_arguments
}
2. sagemaker/session.py - Missing container parameter handling
_map_training_config() method (line 3584+) doesn't accept container parameters in its signature and doesn't include them in the AlgorithmSpecification (lines 3685-3694).
The method signature is missing container_entry_point and container_arguments parameters, and the AlgorithmSpecification construction only includes:
algorithm_spec = {"TrainingInputMode": input_mode}
if metric_definitions is not None:
algorithm_spec["MetricDefinitions"] = metric_definitions
if algorithm_arn:
algorithm_spec["AlgorithmName"] = algorithm_arn
else:
algorithm_spec["TrainingImage"] = image_uri
# Missing: ContainerEntrypoint and ContainerArguments
Comparison with working code
Individual training jobs work because session.train() correctly handles container parameters (lines 1266-1270):
if container_entry_point is not None:
    train_request["AlgorithmSpecification"]["ContainerEntrypoint"] = container_entry_point
if container_arguments is not None:
    train_request["AlgorithmSpecification"]["ContainerArguments"] = container_arguments
Code path analysis
Working path (individual training jobs):
estimator.fit() → session.train() → ✅ Includes container parameters
Broken path (hyperparameter tuning):
tuner.fit() → _TuningJob._prepare_training_config() →
_Job._load_config() → ❌ Drops container parameters →
session._map_training_config() → ❌ Doesn't handle container parameters
Verification
- ✅ Container mode works with estimator.fit() (individual training jobs)
- ❌ Container mode fails with tuner.fit() (hyperparameter tuning)
- ✅ Script mode works with tuner.fit()
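A quick way to confirm this from the API side is to inspect the child training jobs that the tuning job launches (sketch; the tuning job name is a placeholder):
import boto3

sm = boto3.client("sagemaker")

# List the training jobs launched by the tuning job.
children = sm.list_training_jobs_for_hyper_parameter_tuning_job(
    HyperParameterTuningJobName="my-tuning-job"
)["TrainingJobSummaries"]

# For tuning-launched jobs, ContainerEntrypoint/ContainerArguments come back missing.
for child in children:
    spec = sm.describe_training_job(TrainingJobName=child["TrainingJobName"])[
        "AlgorithmSpecification"
    ]
    print(
        child["TrainingJobName"],
        spec.get("ContainerEntrypoint"),
        spec.get("ContainerArguments"),
    )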
Impact
This prevents users from using container mode with hyperparameter tuning, forcing them to use script mode for tuning jobs even when their training logic is containerized.
Suggested fix
- Update _Job._load_config() to extract the container parameters from the estimator:
# Add to the return dict:
config = {
    "input_config": input_config,
    "role": role,
    "output_config": output_config,
    "resource_config": resource_config,
    "stop_condition": stop_condition,
    "vpc_config": vpc_config,
}
# Add container mode parameters
if hasattr(estimator, 'container_entry_point') and estimator.container_entry_point:
    config['container_entry_point'] = estimator.container_entry_point
if hasattr(estimator, 'container_arguments') and estimator.container_arguments:
    config['container_arguments'] = estimator.container_arguments
return config
- Update the _map_training_config() signature to accept the container parameters and include them in AlgorithmSpecification:
def _map_training_config(
    cls,
    static_hyperparameters,
    input_mode,
    role,
    output_config,
    stop_condition,
    # ... existing params ...
    container_entry_point=None,  # Add this
    container_arguments=None,  # Add this
):
    # ... existing code ...
    # Add to AlgorithmSpecification:
    if container_entry_point is not None:
        algorithm_spec["ContainerEntrypoint"] = container_entry_point
    if container_arguments is not None:
        algorithm_spec["ContainerArguments"] = container_arguments
This would align the hyperparameter tuning code path with the working individual training job implementation.
Hi @josh-gree, have you explored the new ModelTrainer interface, which is an upgrade to the Estimator?
https://sagemaker.readthedocs.io/en/stable/api/training/model_trainer.html
https://aws.amazon.com/blogs/machine-learning/accelerate-your-ml-lifecycle-using-the-new-and-improved-amazon-sagemaker-python-sdk-part-1-modeltrainer/
The container entrypoint information is provided through the source_code parameter.
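A rough sketch of what that could look like for the container-mode setup above; the module paths and field names are taken from the linked ModelTrainer docs and should be verified there:
from sagemaker.modules.train import ModelTrainer
from sagemaker.modules.configs import Compute, SourceCode

# Rough equivalent of the container-mode Estimator from the reproduction;
# the command replaces container_entry_point/container_arguments.
source_code = SourceCode(
    source_dir="./src",  # hypothetical local source directory
    command="python -m my_module train model1",
)

trainer = ModelTrainer(
    training_image="123456789012.dkr.ecr.us-east-1.amazonaws.com/my-image:latest",
    source_code=source_code,
    compute=Compute(instance_type="ml.m5.large", instance_count=1),
    role="arn:aws:iam::123456789012:role/SageMakerRole",
)
trainer.train()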
Please take a look at these and let us know if this is still a gap.
@nargokul I was not aware this existed. Should I take this to mean that the Estimator approach is essentially no longer supported? And what does this mean for HyperparameterTuner, which takes an estimator as input?
We'd recommend using ModelTrainer. ModelTrainer is the next-generation interface for the SageMaker Python SDK, and it's designed to address many of the pain points you may have experienced with the traditional Estimator approach.
Key benefits you'll experience:
• Simplified API design: a more intuitive and consistent interface that reduces boilerplate code
• Enhanced flexibility: better support for modern ML frameworks and custom training scenarios
• Improved performance: optimized for faster training job setup and execution
• Future-proof: all new SageMaker features and optimizations will be built on ModelTrainer first
Migration path: while Estimator remains fully supported in SDK v2, ModelTrainer represents AWS's strategic direction. By adopting it now, you'll:
• Stay ahead of the curve with the latest capabilities
• Benefit from ongoing performance improvements and new features
• Ensure your codebase aligns with AWS best practices
• Reduce technical debt as the ecosystem evolves
Along with this, we also have the sagemaker-core library, which is a lower-level SDK.
https://sagemaker-core.readthedocs.io/en/stable/
https://aws.amazon.com/blogs/machine-learning/introducing-sagemaker-core-a-new-object-oriented-python-sdk-for-amazon-sagemaker/
For your case of hyperparameter tuning job creation, HyperParameterTuningJob.create() should replace the implementation currently in HyperparameterTuner:
https://github.com/aws/sagemaker-core/blob/main/src/sagemaker_core/main/resources.py#L13454
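A rough sketch of that call for this use case; the shape and field names below are assumed to mirror the CreateHyperParameterTuningJob API shapes in snake_case, as sagemaker-core generates them, and should be verified against the sagemaker-core docs:
from sagemaker_core.main.resources import HyperParameterTuningJob
from sagemaker_core.main.shapes import (
    ContinuousParameterRange,
    HyperParameterAlgorithmSpecification,
    HyperParameterTrainingJobDefinition,
    HyperParameterTuningJobConfig,
    HyperParameterTuningJobObjective,
    OutputDataConfig,
    ParameterRanges,
    ResourceConfig,
    ResourceLimits,
    StoppingCondition,
)

# Job name, bucket, and runtime limits are placeholders; other values mirror the reproduction case.
HyperParameterTuningJob.create(
    hyper_parameter_tuning_job_name="my-tuning-job",
    hyper_parameter_tuning_job_config=HyperParameterTuningJobConfig(
        strategy="Bayesian",
        hyper_parameter_tuning_job_objective=HyperParameterTuningJobObjective(
            type="Maximize", metric_name="accuracy"
        ),
        resource_limits=ResourceLimits(
            max_number_of_training_jobs=2, max_parallel_training_jobs=1
        ),
        parameter_ranges=ParameterRanges(
            continuous_parameter_ranges=[
                ContinuousParameterRange(
                    name="learning_rate", min_value="0.001", max_value="0.1"
                )
            ]
        ),
    ),
    training_job_definition=HyperParameterTrainingJobDefinition(
        algorithm_specification=HyperParameterAlgorithmSpecification(
            training_image="123456789012.dkr.ecr.us-east-1.amazonaws.com/my-image:latest",
            training_input_mode="File",
        ),
        role_arn="arn:aws:iam::123456789012:role/SageMakerRole",
        output_data_config=OutputDataConfig(s3_output_path="s3://my-bucket/output"),
        resource_config=ResourceConfig(
            instance_type="ml.m5.large", instance_count=1, volume_size_in_gb=30
        ),
        stopping_condition=StoppingCondition(max_runtime_in_seconds=3600),
    ),
)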
As for actual support for the parameters, it looks like container_entrypoint and container_arguments are not supported at the API level: https://boto3.amazonaws.com/v1/documentation/api/1.26.85/reference/services/sagemaker/client/create_hyper_parameter_tuning_job.html
I'll check with the API team and keep this thread posted.
Are there any updates on this? The parameters container_entrypoint and container_arguments are supported in AlgorithmSpecification for CreateTrainingJob, but are not supported in the equivalent HyperParameterAlgorithmSpecification for CreateHyperParameterTuningJob. This makes hyperparameter tuning impossible when using container mode.
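For reference, a minimal sketch of the asymmetry at the boto3 level (the job name, bucket, and non-container fields are placeholders):
import boto3

sm = boto3.client("sagemaker")

# CreateTrainingJob: AlgorithmSpecification accepts the container-mode fields.
sm.create_training_job(
    TrainingJobName="container-mode-job",
    AlgorithmSpecification={
        "TrainingImage": "123456789012.dkr.ecr.us-east-1.amazonaws.com/my-image:latest",
        "TrainingInputMode": "File",
        "ContainerEntrypoint": ["python", "-m", "my_module"],
        "ContainerArguments": ["train", "model1"],
    },
    RoleArn="arn:aws:iam::123456789012:role/SageMakerRole",
    OutputDataConfig={"S3OutputPath": "s3://my-bucket/output"},
    ResourceConfig={
        "InstanceType": "ml.m5.large",
        "InstanceCount": 1,
        "VolumeSizeInGB": 30,
    },
    StoppingCondition={"MaxRuntimeInSeconds": 3600},
)

# CreateHyperParameterTuningJob: HyperParameterAlgorithmSpecification has no
# ContainerEntrypoint/ContainerArguments keys, so the same setup cannot be expressed.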