HyperparameterTuner and Experiments UI not showing data
Describe the bug: The SageMaker Experiments console shows no data when I try to create charts for the metrics defined in the job.
I'm currently running an HPO job, created with the HyperparameterTuner object and a TensorFlow estimator. This is the code that creates the HPO job and the estimator it uses:
from datetime import datetime

from sagemaker.tensorflow import TensorFlow
from sagemaker.tuner import HyperparameterTuner

# role, output_path, hyperparameter_ranges and processed_data_path are
# assumed to be defined elsewhere; they are not shown in this report.

objective_metric_name = 'test loss'
objective_type = 'Minimize'

# Regexes used to extract metric values from the training logs.
metric_definitions = [
    {
        "Name": "test loss",
        "Regex": "Test loss: ([0-9\\.]+)",
    },
    {
        "Name": "train loss",
        "Regex": "categorical_crossentropy: ([0-9\\.]+)",
    },
    {
        "Name": "val loss",
        "Regex": "val_categorical_crossentropy: ([0-9\\.]+)",
    },
]

ts = datetime.now().strftime('%Y-%m-%d-%H-%M-%S')
experiment_name = 'DM-AMT-exp-' + ts
job_name = f'DM-exp-amt-{ts}'
trials_output_path = output_path + '/' + experiment_name
code_location_output_path = output_path + '/' + experiment_name

tf_estimator = TensorFlow(
    entry_point='entrypoint-amt.py',
    source_dir='src',
    output_path=trials_output_path,
    code_location=code_location_output_path,
    role=role,
    metric_definitions=metric_definitions,
    instance_count=1,
    enable_sagemaker_metrics=True,  # expected to enable the metrics time series
    instance_type='ml.m5.4xlarge',
    framework_version='2.2',
    py_version='py37',
)

tuner = HyperparameterTuner(
    estimator=tf_estimator,
    objective_metric_name=objective_metric_name,
    hyperparameter_ranges=hyperparameter_ranges,
    metric_definitions=metric_definitions,
    max_jobs=2,
    max_parallel_jobs=2,
    objective_type=objective_type,
    random_seed=14,
)

tuner.fit(processed_data_path, job_name=job_name)
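As a side check on the metric definitions above, the sketch below applies the same regexes to a log line with Python's `re` module. The sample log line and the `extract_metrics` helper are made up for illustration; they are not from the real job logs or the SageMaker SDK. How SageMaker itself applies the patterns may differ from `re.findall`.

```python
import re

# Regexes copied from the metric_definitions in the report.
metric_definitions = [
    {"Name": "test loss", "Regex": "Test loss: ([0-9\\.]+)"},
    {"Name": "train loss", "Regex": "categorical_crossentropy: ([0-9\\.]+)"},
    {"Name": "val loss", "Regex": "val_categorical_crossentropy: ([0-9\\.]+)"},
]

# Hypothetical log line in the style of Keras progress output
# (an assumption, not taken from the actual training job).
sample_log = (
    "100/100 - 3s - loss: 0.52 - categorical_crossentropy: 0.52 "
    "- val_loss: 0.61 - val_categorical_crossentropy: 0.61"
)


def extract_metrics(log_text, definitions):
    """Return {metric name: [captured values as floats]} for each definition."""
    return {
        d["Name"]: [float(v) for v in re.findall(d["Regex"], log_text)]
        for d in definitions
    }


captured = extract_metrics(sample_log, metric_definitions)
for name, values in captured.items():
    print(name, values)
```

On this sample, the unanchored "train loss" pattern also captures the validation value, because "categorical_crossentropy" is a substring of "val_categorical_crossentropy". This is separate from the missing time series, but it may be worth checking against the real logs.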
I waited for the training jobs to finish, and they appeared in the Experiments console (in SageMaker Studio). These are the metrics for one of the two jobs:
However, when I try to create a chart to see the train loss over the epochs, I get a message that there is no data.
When I look at the training job settings in the SageMaker console, I see that "SageMaker metrics time series" is disabled, even though I set enable_sagemaker_metrics = True in my TensorFlow estimator.
I'm not sure why the estimator configuration is not kept when using the HyperparameterTuner object. When I call the .fit() method on the estimator directly, enable_sagemaker_metrics = True is kept.
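To confirm the setting outside the console, something like the following sketch could be used. It assumes boto3 with valid credentials. The helper names `metrics_time_series_enabled` and `check_tuning_job` are made up for illustration, while `describe_training_job` and `list_training_jobs_for_hyper_parameter_tuning_job` are boto3 SageMaker client calls. Pagination is ignored, which should be fine for the two jobs here.

```python
def metrics_time_series_enabled(sm_client, training_job_name):
    """Return True if the training job reports the metrics time series as enabled."""
    response = sm_client.describe_training_job(TrainingJobName=training_job_name)
    spec = response.get("AlgorithmSpecification", {})
    # A missing field is treated as disabled in this sketch.
    return bool(spec.get("EnableSageMakerMetricsTimeSeries", False))


def check_tuning_job(sm_client, tuning_job_name):
    """Map each training job launched by the tuning job to its metrics setting."""
    summaries = sm_client.list_training_jobs_for_hyper_parameter_tuning_job(
        HyperParameterTuningJobName=tuning_job_name
    )["TrainingJobSummaries"]
    return {
        s["TrainingJobName"]: metrics_time_series_enabled(sm_client, s["TrainingJobName"])
        for s in summaries
    }


# Example usage (requires AWS credentials; job_name is the tuning job name
# passed to tuner.fit above):
#   import boto3
#   print(check_tuning_job(boto3.client("sagemaker"), job_name))
```

Running the same check on a job started with the estimator's own .fit() would show whether the two code paths really differ.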
System information:
- SageMaker Python SDK version (used in notebook): 2.160.0
@fjpa121197 thanks for reaching out to SageMaker! It seems that we do set enable_sagemaker_metrics = True when calling the create_training_job API from the SageMaker PySDK. Can you provide the SageMaker training job ARN so we can debug further?
I have the same problem. Is it safe to display the ARN here so that you can debug? Any other information that you need?