MachineLearningNotebooks
Use Hyperdrive to optimize pipeline hyperparameters
I would like to use Hyperdrive to optimize a full pipeline. That is, I would like to optimize the hyperparameters of different steps jointly. I raised the issue at https://github.com/MicrosoftDocs/azure-docs/issues/77227 but was advised to open it here too.
For example, I have a pipeline defined as:
[prepare_data]
|
v
[extract_features]
|
v
[train_model]
I can use Hyperdrive to tune the hyperparameters of my ML model in the train_model step based on some metric, say validation loss. What I would like to do is tune the hyperparameters of the train_model step together with the hyperparameters of the pre-processing steps (e.g., extract_features). For example, I would like to find the sequence length in extract_features that gives the lowest loss in model training.
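To make the goal concrete, here is a minimal sketch of joint tuning in plain Python. It does not use Hyperdrive or any Azure ML API: the step bodies, the toy loss, the values in SEARCH_SPACE, and the random_search helper are all hypothetical stand-ins. It only illustrates that one sampled configuration feeds parameters to two different steps, and that the metric is measured at the end of the whole pipeline.

```python
import random

# Hypothetical joint search space: each entry is consumed by a different step.
SEARCH_SPACE = {
    "sequence_length": [4, 8, 16, 32],  # used by extract_features
    "learning_rate": [0.01, 0.1, 1.0],  # used by train_model
}


def extract_features(data, sequence_length):
    """Split the data into non-overlapping windows of the given length."""
    last_start = len(data) - sequence_length
    return [data[i:i + sequence_length]
            for i in range(0, last_start + 1, sequence_length)]


def train_model(windows, learning_rate):
    """Toy stand-in for training: returns a made-up validation loss."""
    window_length = len(windows[0])
    return abs(window_length - 8) + abs(learning_rate - 0.1)


def run_pipeline(params, data):
    """Run both steps with one jointly sampled configuration."""
    windows = extract_features(data, params["sequence_length"])
    return train_model(windows, params["learning_rate"])


def random_search(data, max_total_runs, seed=0):
    """Sample configurations from the joint space and keep the best one."""
    rng = random.Random(seed)
    best_params, best_loss = None, float("inf")
    for _ in range(max_total_runs):
        params = {name: rng.choice(values)
                  for name, values in SEARCH_SPACE.items()}
        loss = run_pipeline(params, data)
        if loss < best_loss:
            best_params, best_loss = params, loss
    return best_params, best_loss


if __name__ == "__main__":
    print(random_search(list(range(100)), max_total_runs=20))
```

The point of the sketch is that the feature-extraction parameter cannot be scored on its own: its effect only shows up in the loss reported by the final step.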
HyperDriveConfig does accept an argument pipeline, which seems to be exactly what I am looking for. Unfortunately, I cannot find much information on how to use this parameter.
I tried to submit a Hyperdrive run as:
from azureml.core import Experiment
from azureml.train.hyperdrive import HyperDriveConfig

hd_config = HyperDriveConfig(
    hyperparameter_sampling=...,
    policy=...,
    primary_metric_name=...,
    primary_metric_goal=...,
    max_total_runs=...,
    max_duration_minutes=...,
    max_concurrent_runs=...,
    pipeline=pipeline,
)
exp = Experiment(workspace=ws, name="test")
hd_run = exp.submit(hd_config)
where pipeline is one of my published pipelines in the workspace that accepts PipelineParameters to tune. However, I get the error:
Exception has occurred: AttributeError
'PublishedPipeline' object has no attribute 'graph'
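The error message suggests that whatever consumes the pipeline argument reads a graph attribute from it. As a minimal sketch, a small guard can surface that mismatch before submission with a clearer message. The two classes below are hypothetical stand-ins (not the Azure ML classes), and check_has_graph is an illustrative helper, not part of any SDK. The only assumption taken from the traceback is that the object must expose graph.

```python
class PipelineLike:
    """Hypothetical stand-in for a pipeline built in code; exposes `graph`."""

    def __init__(self):
        self.graph = object()


class PublishedLike:
    """Hypothetical stand-in for a pipeline retrieved from the workspace."""


def check_has_graph(candidate):
    """Return the candidate unchanged, or fail early if `graph` is missing."""
    if not hasattr(candidate, "graph"):
        raise TypeError(
            f"{type(candidate).__name__} has no 'graph' attribute; "
            "pass the pipeline object that was built in code instead"
        )
    return candidate


if __name__ == "__main__":
    check_has_graph(PipelineLike())  # passes silently
    try:
        check_has_graph(PublishedLike())
    except TypeError as err:
        print(err)
```

This does not make a published pipeline usable with Hyperdrive; it only turns the AttributeError into an earlier and more descriptive failure.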
How could I proceed?
I also tried passing the Pipeline object directly, instead of retrieving a PublishedPipeline. The run is submitted, but nothing happens: the process seems to hang for some reason, and the logs are not very informative.
In the Azure ML portal, an experiment is created but no run appears. However, a notification tells me that Run 2 is running, even though it does not appear on the experiment page.
Anyway, nothing is submitted to the compute target. The log file hyperdrive.txt mentions a "transient issue":
Streaming azureml-logs/hyperdrive.txt
=====================================
"<START>[2021-06-24T13:28:04.313162][API][INFO]Experiment created<END>\n"
"<START>[2021-06-24T13:28:04.985004][GENERATOR][INFO]Trying to sample '1' jobs from the hyperparameter space<END>\n"
"<START>[2021-06-24T13:28:05.164841][GENERATOR][INFO]Successfully sampled '1' jobs, they will soon be submitted to the execution target.<END>\n"
at Microsoft.HyperDrive.Scheduler.AML.Client.HttpClientExtensions.PostRequestAsync[TRet](HttpClient client, HttpRequestMessage request, TimeSpan timeout) in /usr/src/app/SchedulerLib/AML/Client/HttpClientExtensions.cs:line 83
at Microsoft.HyperDrive.Scheduler.AML.Client.AMLRestClient.<>c__DisplayClass44_0.<<CreateUnsubmittedPipelineRunAsync>b__0>d.MoveNext() in /usr/src/app/SchedulerLib/AML/Client/AMLRestClient.cs:line 480
--- End of stack trace from previous location where exception was thrown ---
at Microsoft.HyperDrive.Scheduler.AML.Client.AMLRestClient.ReportMetricsForAMLOperationAsync[T](String durationMetric, String successMetric, String failureMetric, String retriableFailureMetric, String throttledMetric, String timeoutMetric, String usageErrorMetric, String operationName, Func`1 operation, String serializedExperimentId, Guid subscriptionId, Nullable`1 experimentUuid) in /usr/src/app/SchedulerLib/AML/Client/AMLRestClient.cs:line 636
at Microsoft.HyperDrive.Scheduler.AML.Client.AMLRestClient.ReportMetricsForPipelinesOperationAsync[T](String pipelinesOperationName, String serializedExperimentId, Guid subscriptionId, Nullable`1 experimentUuid, Func`1 pipelinesOperation) in /usr/src/app/SchedulerLib/AML/Client/AMLRestClient.cs:line 582
at Microsoft.HyperDrive.Scheduler.AML.Client.AMLRestClient.CreateUnsubmittedPipelineRunAsync(CreateUnSubmittedPipelineRunRequest request) in /usr/src/app/SchedulerLib/AML/Client/AMLRestClient.cs:line 461
at Microsoft.HyperDrive.Scheduler.AML.AMLPipelinesScheduler.ScheduleAsync(ExperimentContext`1 experimentContext, Job`1 job) in /usr/src/app/SchedulerLib/AML/AMLPipelinesScheduler.cs:line 61'<END>
at Microsoft.HyperDrive.Scheduler.AML.Client.HttpClientExtensions.PostRequestAsync[TRet](HttpClient client, HttpRequestMessage request, TimeSpan timeout) in /usr/src/app/SchedulerLib/AML/Client/HttpClientExtensions.cs:line 83
at Microsoft.HyperDrive.Scheduler.AML.Client.AMLRestClient.<>c__DisplayClass44_0.<<CreateUnsubmittedPipelineRunAsync>b__0>d.MoveNext() in /usr/src/app/SchedulerLib/AML/Client/AMLRestClient.cs:line 480
--- End of stack trace from previous location where exception was thrown ---
at Microsoft.HyperDrive.Scheduler.AML.Client.AMLRestClient.ReportMetricsForAMLOperationAsync[T](String durationMetric, String successMetric, String failureMetric, String retriableFailureMetric, String throttledMetric, String timeoutMetric, String usageErrorMetric, String operationName, Func`1 operation, String serializedExperimentId, Guid subscriptionId, Nullable`1 experimentUuid) in /usr/src/app/SchedulerLib/AML/Client/AMLRestClient.cs:line 636
at Microsoft.HyperDrive.Scheduler.AML.Client.AMLRestClient.ReportMetricsForPipelinesOperationAsync[T](String pipelinesOperationName, String serializedExperimentId, Guid subscriptionId, Nullable`1 experimentUuid, Func`1 pipelinesOperation) in /usr/src/app/SchedulerLib/AML/Client/AMLRestClient.cs:line 582
at Microsoft.HyperDrive.Scheduler.AML.Client.AMLRestClient.CreateUnsubmittedPipelineRunAsync(CreateUnSubmittedPipelineRunRequest request) in /usr/src/app/SchedulerLib/AML/Client/AMLRestClient.cs:line 461
at Microsoft.HyperDrive.Scheduler.AML.AMLPipelinesScheduler.ScheduleAsync(ExperimentContext`1 experimentContext, Job`1 job) in /usr/src/app/SchedulerLib/AML/AMLPipelinesScheduler.cs:line 61'<END>
<START>[2021-06-24T13:29:35.1500321Z][SCHEDULER][WARNING]Failed to schedule job due to transient issue, id='HD_9beabcff-5ac0-4b96-b8ba-49612d9c63f7_0', will retry later.<END>
<START>[2021-06-24T13:29:35.1123576Z][SCHEDULER][INFO]Scheduling job, id='HD_9beabcff-5ac0-4b96-b8ba-49612d9c63f7_0'<END>
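Since the raw log is hard to read, a small parser can help pull out the entries that matter. The sketch below assumes every entry follows the <START>[timestamp][COMPONENT][LEVEL]message<END> shape visible in the excerpt above. The ERROR level is an assumption (only INFO and WARNING appear in the excerpt), and parse_entries and warnings_and_errors are illustrative helper names. Entries that were truncated and lack a <START> marker are skipped.

```python
import re

# Matches one entry of the form <START>[timestamp][COMPONENT][LEVEL]message<END>.
ENTRY = re.compile(
    r"<START>\[(?P<timestamp>[^\]]+)\]"
    r"\[(?P<component>[^\]]+)\]"
    r"\[(?P<level>[^\]]+)\]"
    r"(?P<message>.*?)<END>",
    re.DOTALL,  # stack traces inside a message may span several lines
)


def parse_entries(text):
    """Return each complete log entry as a dict of its four fields."""
    return [match.groupdict() for match in ENTRY.finditer(text)]


def warnings_and_errors(text):
    """Keep only WARNING entries (and ERROR ones, if that level exists)."""
    return [entry for entry in parse_entries(text)
            if entry["level"] in {"WARNING", "ERROR"}]


if __name__ == "__main__":
    sample = (
        "<START>[2021-06-24T13:28:04.313162][API][INFO]Experiment created<END>\n"
        "<START>[2021-06-24T13:29:35.1500321Z][SCHEDULER][WARNING]"
        "Failed to schedule job due to transient issue, "
        "id='HD_9beabcff-5ac0-4b96-b8ba-49612d9c63f7_0', will retry later.<END>"
    )
    for entry in warnings_and_errors(sample):
        print(entry["timestamp"], entry["component"], entry["message"])
```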
I'm solving exactly the same issue. Are there any updates, please?