azure-cli
az ml cli v2 pipeline yml does not support keyword 'is_deterministic'
Related command: az ml job create
Describe the bug: I want to run an Azure ML pipeline using Azure CLI v2. The steps should be non-deterministic (in the SDK: allow_reuse=False). Based on the pipeline schema, this should be set using
is_deterministic: false
which is not accepted when submitting the job using
az ml job create -f pipeline.yml --web
Instead, it throws an unrelated error:
Met error <class 'TypeError'>:ParameterizedParallel.__init__() got an unexpected keyword argument 'environment'
When submitting the job without setting is_deterministic, the pipeline runs fine (but the step is then treated as deterministic, so earlier results can be reused). Here is the pipeline YAML definition I use:
$schema: https://azuremlschemas.azureedge.net/latest/pipelineJob.schema.json
type: pipeline
compute: azureml:cpu-cluster
jobs:
  scrape:
    code: ./src
    command: python run.py --dataset_path ${{inputs.datainput}}
    environment: azureml:my_environment@latest
    inputs:
      datainput:
        type: uri_folder
        path: azureml://datastores/workspaceblobstore/paths/path/to/my/folder/
    is_deterministic: false # without this line, the pipeline runs fine (but with reuse enabled)
To Reproduce: Create any pipeline with constant input parameters and no explicit output. Try to make it non-deterministic using the YAML file above.
Expected behavior: Setting 'is_deterministic: false' should be accepted as a valid entry, and no error should be raised.
Environment summary
az version outputs:
{
  "azure-cli": "2.37.0",
  "azure-cli-core": "2.37.0",
  "azure-cli-telemetry": "1.0.6",
  "extensions": {
    "ml": "2.4.1"
  }
}
route to CXP team
@MarkusDressel We are looking into it and will get back to you with any additional information.
@SaurabhSharma-MSFT Are there any updates on this topic? It is really annoying that az ml CLI v2 does not allow setting this parameter while the Python SDK supports it.
Thanks for the feedback! We are routing this to the appropriate team for follow-up. cc @azureml-github.
Author: | MarkusDressel |
---|---|
Assignees: | - |
Labels: | |
Milestone: | - |
@MarkusDressel This is an implementation bug, and we have already created an internal work item to track it.
As a workaround, please try moving scrape into a separate component file and referencing it from your pipeline job; it may look like this:
scrape.yml
$schema: https://azuremlschemas.azureedge.net/latest/commandComponent.schema.json
type: command
code: ./src
command: python run.py --dataset_path ${{inputs.datainput}}
environment: azureml:my_environment@latest
inputs:
  datainput:
    type: uri_folder
is_deterministic: false
pipeline.yml
$schema: https://azuremlschemas.azureedge.net/latest/pipelineJob.schema.json
type: pipeline
compute: azureml:cpu-cluster
jobs:
  scrape:
    component: file:./scrape.yml
    inputs:
      datainput:
        type: uri_folder
        path: azureml://datastores/workspaceblobstore/paths/path/to/my/folder/
Hi @MarkusDressel,
We currently don't expose allow_reuse at the step level in CLI v2. In CLI v2 there are registered components and anonymous components (inline jobs). The default reuse setting for anonymous components is is_deterministic=true.
There are two workarounds to disable reuse:
- Explicitly set is_deterministic: false in the anonymous component, as Zhengfei shared.
- Set force_rerun in the pipeline-level settings; if it is set to true, we will try to disable reuse for all steps.
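A minimal sketch of the second workaround, reusing the scrape step from the original report. It assumes the pipeline-level settings block accepts force_rerun as described above; whether it is honored may depend on the installed ml extension version. The inline step deliberately omits is_deterministic, since that key is what triggered the error.

```yaml
$schema: https://azuremlschemas.azureedge.net/latest/pipelineJob.schema.json
type: pipeline
compute: azureml:cpu-cluster
settings:
  # Assumed pipeline-level setting: when true, the service tries to rerun
  # every step instead of reusing earlier results.
  force_rerun: true
jobs:
  scrape:
    code: ./src
    command: python run.py --dataset_path ${{inputs.datainput}}
    environment: azureml:my_environment@latest
    inputs:
      datainput:
        type: uri_folder
        path: azureml://datastores/workspaceblobstore/paths/path/to/my/folder/
```

Unlike the component-file workaround, this disables reuse for every step in the pipeline, so it does not cover a mix of reused and rerun steps.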
Do you have a case where some pipeline steps need reuse while others need to rerun?
Hi @MarkusDressel,
This is Blanca, a PM working on AzureML pipelines. First, thanks a lot for your feedback. We would appreciate it if we could set up a meeting to collect more of your feedback. Your input is invaluable to us and will help us improve the whole AzureML v2 experience. Could you please let me know what time works for you? My email is [email protected] Thanks!