
az ml cli v2 pipeline yml does not support keyword 'is_deterministic'

Open MarkusDressel opened this issue 2 years ago • 4 comments

Related command

Describe the bug

I want to run an Azure ML pipeline using Azure CLI v2. The steps should be non-deterministic (the SDK equivalent is allow_reuse=False). Based on the pipeline schema, this should be set using

is_deterministic: false 

which is not accepted when submitting the job using

az ml job create -f pipeline.yml --web

it throws an unrelated error:

Met error <class 'TypeError'>:ParameterizedParallel.__init__() got an unexpected keyword argument 'environment'

When I submit the job without setting is_deterministic, the pipeline works fine (but the step is then not marked as non-deterministic).

Here is the pipeline yml definition I use:

$schema: https://azuremlschemas.azureedge.net/latest/pipelineJob.schema.json
type: pipeline
compute: azureml:cpu-cluster
jobs:
  scrape:
    code: ./src
    command: python run.py --dataset_path ${{inputs.datainput}}
    environment: azureml:my_environment@latest
    inputs:
      datainput:
        type: uri_folder
        path: azureml://datastores/workspaceblobstore/paths/path/to/my/folder/
    is_deterministic: false # without this line the pipeline works fine (but the step is not marked as non-deterministic)

To Reproduce

Create any pipeline with constant input parameters and no explicit output. Try to make it non-deterministic using the above yml file.

Expected behavior

Setting 'is_deterministic: false' should be a valid entry and should not raise an error.

Environment summary

az version

outputs:

{
  "azure-cli": "2.37.0",
  "azure-cli-core": "2.37.0",
  "azure-cli-telemetry": "1.0.6",
  "extensions": {
    "ml": "2.4.1"
  }
}

MarkusDressel avatar Aug 02 '22 05:08 MarkusDressel

route to CXP team

yonzhan avatar Aug 02 '22 05:08 yonzhan

@MarkusDressel We are looking into it and will get back to you if we need any additional information.

SaurabhSharma-MSFT avatar Aug 03 '22 18:08 SaurabhSharma-MSFT

@SaurabhSharma-MSFT are there any updates on this topic? It is really annoying that the az ml cli v2 does not allow setting this parameter while the Python SDK has this feature.

MarkusDressel avatar Aug 12 '22 07:08 MarkusDressel

Thanks for the feedback! We are routing this to the appropriate team for follow-up. cc @azureml-github.

Issue Details

Author: MarkusDressel
Assignees: -
Labels:

Service Attention, Machine Learning, customer-reported, CXP Attention, Auto-Assign

Milestone: -

ghost avatar Aug 12 '22 17:08 ghost

@MarkusDressel There is an implementation bug, and we have already created an internal work item to track it.

To work around the issue, please try making scrape a separate component file and referencing it from your pipeline job; it may look like this:

scrape.yml

$schema: https://azuremlschemas.azureedge.net/latest/commandComponent.schema.json
type: command
code: ./src
command: python run.py --dataset_path ${{inputs.datainput}}
environment: azureml:my_environment@latest
inputs:
  datainput:
    type: uri_folder
is_deterministic: false

pipeline.yml

$schema: https://azuremlschemas.azureedge.net/latest/pipelineJob.schema.json
type: pipeline
compute: azureml:cpu-cluster
jobs:
  scrape:
    component: file:./scrape.yml
    inputs:
      datainput:
        type: uri_folder
        path: azureml://datastores/workspaceblobstore/paths/path/to/my/folder/

zhengfeiwang avatar Aug 15 '22 08:08 zhengfeiwang

Hi @MarkusDressel, we do not currently expose allow_reuse at the step level in CLI v2. In CLI v2 there are registered components and anonymous components (inline jobs). The default reuse setting for anonymous components is is_deterministic=true.

We have two workarounds to disable reuse:

  1. Explicitly set is_deterministic to false in the anonymous component, as Zhengfei shared.
  2. We also expose force_rerun under the pipeline-level settings; if it is set to true, we will try to disable reuse for all steps.
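
For the second workaround, a minimal sketch of the original pipeline.yml with reuse disabled for the whole pipeline might look like the following. It assumes that force_rerun is placed under a top-level settings key, as described in the pipeline job YAML schema; this placement has not been verified against ml extension 2.4.1.

```yaml
$schema: https://azuremlschemas.azureedge.net/latest/pipelineJob.schema.json
type: pipeline
compute: azureml:cpu-cluster
settings:
  # Assumption: pipeline-level switch that asks the service to rerun every step
  # instead of reusing results from a previous run.
  force_rerun: true
jobs:
  scrape:
    code: ./src
    command: python run.py --dataset_path ${{inputs.datainput}}
    environment: azureml:my_environment@latest
    inputs:
      datainput:
        type: uri_folder
        path: azureml://datastores/workspaceblobstore/paths/path/to/my/folder/
```

Because the setting applies to all steps, it fits the single-step pipeline above; a pipeline that needs reuse for some steps and reruns for others would still need is_deterministic set per component.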

Do you have a case where some steps of the pipeline need to be reused while others need to be rerun?

cloga avatar Aug 16 '22 01:08 cloga

Hi @MarkusDressel ,

This is Blanca, a PM working on AzureML pipelines. First of all, thanks a lot for your feedback. We would appreciate it if we could set up a meeting to collect your feedback. Your input is invaluable to us and will help us improve the whole AzureML v2 experience. Could you please let me know what time works for you? My email is [email protected]. Thanks!

likebupt avatar Aug 22 '22 06:08 likebupt