Avoid MLPublishedPipelineRestAPITask with a mounted FileDataset input?

Open metazool opened this issue 4 years ago • 0 comments

Thank you for providing these examples; they have been helpful in setting up a pipeline to train on an existing FileDataset. However, the train stage of the pipeline in this project passes Datasets into the train script differently from the usage examples in the notebooks repo, and those differences make it more complex to invoke a published pipeline.

This relates to the various issues raised (https://github.com/microsoft/MLOpsPython/issues/291, https://github.com/microsoft/MLOpsPython/issues/345) around the ms-air-aiagility.vss-services-azureml.azureml-restApi-task.MLPublishedPipelineRestAPITask@0 task: whether it is truly necessary, and still recommended, to create a second Service Connection with Machine Learning Workspace scope only to invoke a published pipeline, or whether it is better to replace it with a submit-pipeline CLI call.

In our setup we are passing a mounted FileDataset to the train script as a named parameter, rather than calling Dataset.get_by_name within the experiment run. It looks more like the notebook examples passing a Dataset into a ScriptRunConfig.

 PythonScriptStep(
        name="Train Model",
        script_name=env.train_script_path,
        source_directory=env.sources_directory,
        arguments=[
            "--data_dir",
            FileDataset.get_by_name(ws, env.dataset_name).as_mount(),
...

https://github.com/microsoft/MLOpsPython/blob/ae60e489f0c658ba313e6e0020c61b40ffe3bdc9/diabetes_regression/training/train_aml.py#L131
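For illustration, here is a minimal sketch of the train-script side of this pattern. It is not code from this repository, and it assumes the mounted dataset reaches the script as an ordinary directory path in --data_dir. The helper name list_input_files is hypothetical.

```python
import argparse
import os


def parse_args(argv=None):
    """Parse the named parameter the pipeline step passes to the script."""
    parser = argparse.ArgumentParser("train")
    parser.add_argument(
        "--data_dir",
        type=str,
        required=True,
        help="Directory where the FileDataset is mounted (assumed to be a plain path)",
    )
    return parser.parse_args(argv)


def list_input_files(data_dir):
    """Hypothetical helper: sorted relative paths of all files under data_dir."""
    found = []
    for root, _dirs, files in os.walk(data_dir):
        for name in files:
            found.append(os.path.relpath(os.path.join(root, name), data_dir))
    return sorted(found)
```

Inside the run, the script would call parse_args() and read its training files from args.data_dir, with no Dataset.get_by_name lookup needed in the script itself.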

However, the call to as_mount() means we currently can't replace MLPublishedPipelineRestAPITask with a call to az ml run submit-pipeline, as others suggest in the issues (and thus avoid having to create a second Service Connection with ML Workspace scope for every workspace where we want to invoke pipelines via the REST API).

I would appreciate any advice on good practice, on whether I'm missing something obvious, and on whether the pipeline patterns suggested in this repository are going to change significantly when the v2 preview CLI is released.

cc @lindacmsheard

metazool, Jun 09 '21 16:06