MachineLearningNotebooks

Use Hyperdrive to optimize pipeline hyperparameters

Open • JackCaster opened this issue 3 years ago • 2 comments

I would like to use HyperDrive to optimize a full pipeline, that is, to tune the hyperparameters of different steps jointly. I raised the issue at https://github.com/MicrosoftDocs/azure-docs/issues/77227, but was advised to open it here as well.

For example, I have a pipeline defined as:

[prepare_data]
   |
   v
[extract_features]
   |
   v
[train_model]

I can use HyperDrive to tune the hyperparameters of my ML model in the train_model step based on a metric such as validation loss. What I would like to do is tune the hyperparameters of the train_model step together with those of the preprocessing steps (e.g., extract_features). For example, I would like to find the sequence length in extract_features that gives the lowest loss in model training.
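To make "joint tuning" concrete, here is a local, plain-Python sketch: a single random search samples seq_len (consumed by extract_features) and learning_rate (consumed by train_model) together, and scores each combination only on the loss of the final step. It does not use HyperDrive or the Azure ML SDK; the step functions, the toy loss, and the parameter names are hypothetical stand-ins for the real pipeline steps.

```python
import math
import random


# Hypothetical stand-ins for the three pipeline steps. In Azure ML each would be
# a separate pipeline step; here they are local functions so the idea runs anywhere.
def prepare_data(n_samples=256, seed=0):
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n_samples)]


def extract_features(data, seq_len):
    # Cut the series into non-overlapping windows of length seq_len.
    return [data[i:i + seq_len] for i in range(0, len(data) - seq_len + 1, seq_len)]


def train_model(windows, learning_rate):
    # Toy "validation loss" that depends on BOTH steps' hyperparameters:
    # it is minimal for windows of length 32 and a learning rate of 1e-2.
    seq_len = len(windows[0])
    return abs(seq_len - 32) / 32 + abs(math.log10(learning_rate) + 2)


# One joint search space; the comments note which step consumes each parameter.
SEARCH_SPACE = {
    "seq_len": [8, 16, 32, 64],           # extract_features
    "learning_rate": [1e-3, 1e-2, 1e-1],  # train_model
}


def run_pipeline(params):
    # Chain the steps: each sampled parameter is routed to the step that uses it.
    data = prepare_data()
    windows = extract_features(data, params["seq_len"])
    return train_model(windows, params["learning_rate"])


def random_search(space, max_total_runs, seed=0):
    """Sample all parameters together and keep the lowest-loss combination."""
    rng = random.Random(seed)
    best_params, best_loss = None, math.inf
    for _ in range(max_total_runs):
        params = {name: rng.choice(values) for name, values in space.items()}
        loss = run_pipeline(params)
        if loss < best_loss:
            best_params, best_loss = params, loss
    return best_params, best_loss
```

For example, random_search(SEARCH_SPACE, max_total_runs=50) returns the best (params, loss) pair seen. The point of the design is that the preprocessing parameter is judged by the downstream training loss, so no separate proxy metric is needed for extract_features.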

HyperDriveConfig does accept a pipeline argument, which seems to be exactly what I am looking for. Unfortunately, I cannot find much information on how to use it.

I tried to submit a HyperDrive run as follows:

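from azureml.core import Experiment
from azureml.train.hyperdrive import HyperDriveConfig

# ws: the azureml.core.Workspace; pipeline: the pipeline whose parameters are tuned
hd_config = HyperDriveConfig(
    hyperparameter_sampling=...,
    policy=...,
    primary_metric_name=...,
    primary_metric_goal=...,
    max_total_runs=...,
    max_duration_minutes=...,
    max_concurrent_runs=...,
    pipeline=pipeline,
)

exp = Experiment(workspace=ws, name="test")
hd_run = exp.submit(hd_config)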

where pipeline is one of my published pipelines in the workspace that accepts PipelineParameters to tune. However, I get the error:

Exception has occurred: AttributeError
'PublishedPipeline' object has no attribute 'graph'

How should I proceed?

JackCaster, Jun 24 '21

I tried passing the Pipeline object directly (instead of retrieving a PublishedPipeline). This time the run is submitted, but nothing happens: the process seems to hang, and the logs are not very informative.
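Since the symptom is a run that never actually starts, a polling loop with a deadline turns an indefinite hang into an explicit timeout. The sketch below is generic Python; the condition you pass in is up to you, and wait_for is a hypothetical helper name, not part of the Azure ML SDK.

```python
import time


def wait_for(condition, timeout_s, poll_s=30.0, clock=time.monotonic, sleep=time.sleep):
    """Poll condition() every poll_s seconds.

    Returns True as soon as condition() is truthy, or False once timeout_s
    seconds have passed without it becoming truthy. clock and sleep are
    injectable so the loop can be exercised without real waiting.
    """
    deadline = clock() + timeout_s
    while True:
        if condition():
            return True
        if clock() >= deadline:
            return False
        sleep(poll_s)
```

In this situation the condition might, for example, inspect the child runs of hd_run (e.g. via its get_children() method) and check whether any has left a queued state; which status actually signals progress for a pipeline-based HyperDrive run is an assumption to verify.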

In the Azure ML portal, an experiment is created, but no run appears in it. [screenshot]

However, a portal notification reports that Run 2 is running, even though that run does not appear on the experiment page. [screenshot]

In any case, nothing is submitted to the compute target. The log file hyperdrive.txt mentions a "transient issue":

Streaming azureml-logs/hyperdrive.txt
=====================================

"<START>[2021-06-24T13:28:04.313162][API][INFO]Experiment created<END>\n""<START>[2021-06-24T13:28:04.985004][GENERATOR][INFO]Trying to sample '1' jobs from the hyperparameter space<END>\n""<START>[2021-06-24T13:28:05.164841][GENERATOR][INFO]Successfully sampled '1' jobs, they will soon be submitted to the execution target.<END>\n"
   at Microsoft.HyperDrive.Scheduler.AML.Client.HttpClientExtensions.PostRequestAsync[TRet](HttpClient client, HttpRequestMessage request, TimeSpan timeout) in /usr/src/app/SchedulerLib/AML/Client/HttpClientExtensions.cs:line 83
   at Microsoft.HyperDrive.Scheduler.AML.Client.AMLRestClient.<>c__DisplayClass44_0.<<CreateUnsubmittedPipelineRunAsync>b__0>d.MoveNext() in /usr/src/app/SchedulerLib/AML/Client/AMLRestClient.cs:line 480
--- End of stack trace from previous location where exception was thrown ---
   at Microsoft.HyperDrive.Scheduler.AML.Client.AMLRestClient.ReportMetricsForAMLOperationAsync[T](String durationMetric, String successMetric, String failureMetric, String retriableFailureMetric, String throttledMetric, String timeoutMetric, String usageErrorMetric, String operationName, Func`1 operation, String serializedExperimentId, Guid subscriptionId, Nullable`1 experimentUuid) in /usr/src/app/SchedulerLib/AML/Client/AMLRestClient.cs:line 636
   at Microsoft.HyperDrive.Scheduler.AML.Client.AMLRestClient.ReportMetricsForPipelinesOperationAsync[T](String pipelinesOperationName, String serializedExperimentId, Guid subscriptionId, Nullable`1 experimentUuid, Func`1 pipelinesOperation) in /usr/src/app/SchedulerLib/AML/Client/AMLRestClient.cs:line 582
   at Microsoft.HyperDrive.Scheduler.AML.Client.AMLRestClient.CreateUnsubmittedPipelineRunAsync(CreateUnSubmittedPipelineRunRequest request) in /usr/src/app/SchedulerLib/AML/Client/AMLRestClient.cs:line 461
   at Microsoft.HyperDrive.Scheduler.AML.AMLPipelinesScheduler.ScheduleAsync(ExperimentContext`1 experimentContext, Job`1 job) in /usr/src/app/SchedulerLib/AML/AMLPipelinesScheduler.cs:line 61'<END>
   at Microsoft.HyperDrive.Scheduler.AML.Client.HttpClientExtensions.PostRequestAsync[TRet](HttpClient client, HttpRequestMessage request, TimeSpan timeout) in /usr/src/app/SchedulerLib/AML/Client/HttpClientExtensions.cs:line 83
   at Microsoft.HyperDrive.Scheduler.AML.Client.AMLRestClient.<>c__DisplayClass44_0.<<CreateUnsubmittedPipelineRunAsync>b__0>d.MoveNext() in /usr/src/app/SchedulerLib/AML/Client/AMLRestClient.cs:line 480
--- End of stack trace from previous location where exception was thrown ---
   at Microsoft.HyperDrive.Scheduler.AML.Client.AMLRestClient.ReportMetricsForAMLOperationAsync[T](String durationMetric, String successMetric, String failureMetric, String retriableFailureMetric, String throttledMetric, String timeoutMetric, String usageErrorMetric, String operationName, Func`1 operation, String serializedExperimentId, Guid subscriptionId, Nullable`1 experimentUuid) in /usr/src/app/SchedulerLib/AML/Client/AMLRestClient.cs:line 636
   at Microsoft.HyperDrive.Scheduler.AML.Client.AMLRestClient.ReportMetricsForPipelinesOperationAsync[T](String pipelinesOperationName, String serializedExperimentId, Guid subscriptionId, Nullable`1 experimentUuid, Func`1 pipelinesOperation) in /usr/src/app/SchedulerLib/AML/Client/AMLRestClient.cs:line 582
   at Microsoft.HyperDrive.Scheduler.AML.Client.AMLRestClient.CreateUnsubmittedPipelineRunAsync(CreateUnSubmittedPipelineRunRequest request) in /usr/src/app/SchedulerLib/AML/Client/AMLRestClient.cs:line 461
   at Microsoft.HyperDrive.Scheduler.AML.AMLPipelinesScheduler.ScheduleAsync(ExperimentContext`1 experimentContext, Job`1 job) in /usr/src/app/SchedulerLib/AML/AMLPipelinesScheduler.cs:line 61'<END><START>[2021-06-24T13:29:35.1500321Z][SCHEDULER][WARNING]Failed to schedule job due to transient issue, id='HD_9beabcff-5ac0-4b96-b8ba-49612d9c63f7_0', will retry later.<END><START>[2021-06-24T13:29:35.1123576Z][SCHEDULER][INFO]Scheduling job, id='HD_9beabcff-5ac0-4b96-b8ba-49612d9c63f7_0'<END>
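The hyperdrive.txt excerpt interleaves tagged entries with stack-trace fragments, which makes the scheduler messages hard to spot. A minimal parser, assuming every entry follows the <START>[timestamp][COMPONENT][LEVEL]message<END> shape seen in the excerpt; trace fragments without their own <START> marker are skipped. Treating "ERROR" as a level is an assumption, since only INFO and WARNING appear above.

```python
import re

# Entry shape observed in hyperdrive.txt:
#   <START>[timestamp][COMPONENT][LEVEL]message<END>
# DOTALL lets a message span lines; the lazy .*? stops at the first <END>.
ENTRY = re.compile(r"<START>\[([^\]]+)\]\[([A-Z]+)\]\[([A-Z]+)\](.*?)<END>", re.DOTALL)


def parse_hyperdrive_log(text):
    """Split raw log text into dicts with time, component, level, message."""
    return [
        {
            "time": m.group(1),
            "component": m.group(2),
            "level": m.group(3),
            "message": m.group(4).strip(),
        }
        for m in ENTRY.finditer(text)
    ]


def warnings_and_errors(text):
    # "ERROR" is an assumed level name; only INFO and WARNING were observed.
    return [e for e in parse_hyperdrive_log(text) if e["level"] in {"WARNING", "ERROR"}]
```

For example, warnings_and_errors(open("hyperdrive.txt").read()) would surface the "Failed to schedule job due to transient issue" entry without the surrounding traces.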

JackCaster, Jun 24 '21

I'm running into exactly the same issue. Are there any updates?

mficek, Apr 07 '22