sagemaker-python-sdk
Experiments/Runs, allow user-defined Run Group names
When creating an experiment's Run instance to track a training or processing job, the user should be able to specify a custom run_group_name for that run.
We've been using SageMaker Experiments previously and are currently migrating from the standalone smexperiments library to the new, integrated SDK. Previously we were able to define a Trial and effectively group a number of experiment/job runs under this Trial's name. Our understanding is that the new run group concept serves the same purpose, yet for a standalone job run (pipelines might be different) it is not possible to specify a user-defined run_group_name while defining the experiment's Run context.
We discovered an additional complication: apparently there is a quota of at most 50 trial components (runs) per Trial, but since the Run Group name is effectively 'fixed' per experiment (as Default-Run-Group-<experiment-name>), the whole experiment ends up limited to 50 runs :( We are currently requesting an increase of this default limit.
any update on this? were you able to raise the limit or find a way to assign run group name?
We've got the limit for max trial components per experiment increased up to 200 (apparently this is the maximum possible). As to assigning the group name, the fix to this is still pending AFAIK.
In the source code of the SageMaker Experiments Run class, there is a _generate_trial_component() method that relies on sagemaker.experiments.run.TRIAL_NAME_TEMPLATE. That value defaults to Default-Run-Group-<experiment-name>, and you can overwrite it before you create the run. We use it like this:
import sagemaker.session
import sagemaker.experiments.run

experiment_name = 'backtesting'
# Override the module-level template before creating the Run; the run group takes this name.
sagemaker.experiments.run.TRIAL_NAME_TEMPLATE = "week-30"
session = sagemaker.session.Session()
with sagemaker.experiments.Run(experiment_name=experiment_name, run_display_name='champion', sagemaker_session=session) as run:
    pass
This will create the run group with your desired name.
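Since the override above changes a module-level value for the rest of the process, one option is to scope it with a small context manager that restores the original afterwards. The following is only a sketch: override_attr is a hypothetical helper (not part of the SageMaker SDK), and a stand-in namespace is used in place of sagemaker.experiments.run so the example runs without the SDK installed. The "Default-Run-Group-{}" string is an assumed form of the default template.

```python
import contextlib
import types


@contextlib.contextmanager
def override_attr(module, name, value):
    """Temporarily set module.<name> to value and restore the original on exit."""
    original = getattr(module, name)
    setattr(module, name, value)
    try:
        yield
    finally:
        # Restore even if the body raises.
        setattr(module, name, original)


# Stand-in for sagemaker.experiments.run (assumption: the real module exposes
# TRIAL_NAME_TEMPLATE as a plain module-level string, as described above).
fake_run_module = types.SimpleNamespace(TRIAL_NAME_TEMPLATE="Default-Run-Group-{}")

with override_attr(fake_run_module, "TRIAL_NAME_TEMPLATE", "week-30"):
    # With the real SDK, the Run would be created inside this block.
    inside = fake_run_module.TRIAL_NAME_TEMPLATE

after = fake_run_module.TRIAL_NAME_TEMPLATE
```

With the real module, the Run would presumably have to be created inside the with block, because the override is only in effect there.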
@lorenzwalthert I feel this is more of a workaround (or even a hack) than a proper solution. We had our fingers burned before when we relied on SageMaker private code/API: they can change it without any prior notice and would have every right to do so :( So, I'd rather wait for an official resolution.
We would also like the requested capability to be added to sagemaker. Without the ability to specify its name, run_group is not useful.
@AndreiVoinovTR I agree that official support with docs etc. would be better, but sagemaker.experiments.run.TRIAL_NAME_TEMPLATE is, strictly speaking, not private API. It's not uncommon to use module-level names as settings. Also, if they later change the API, my hope is that there might be another way to set the run group, so it would not be too detrimental. But I agree it's a narrow path 😄
@Selva163 @Drwhit upvoting the initial comment might be more helpful to give the issue traction instead of creating more comments without additional information (that triggers notifications).
I would still be grateful if one of the code owners from the sagemaker-python-sdk team could comment on this issue, and maybe share with us the current status of this request in the team's backlog.
FWIW @AndreiVoinovTR, it seems that you can specify the run group in the SDK if your job is part of a pipeline, here, under Specify a Custom Run Group Name.
Thank you for the info, @lorenzwalthert. Yes, you are correct, one can specify a run group (former trial) for pipelines and that works. The pipelines-related part of the experiments API has not been refactored (externally at least). Why that is, and whether it will also be refactored eventually, is another question.
The new API (with Run context) was introduced only for standalone jobs (training, processing), but it seems the ability to specify a custom run group (former trial) was 'dropped' (intentionally or not) from the new jobs API. That is exactly what this issue is about.