
AzureML CLI v2 - R api - Mlflow causing pipeline to break

Open: jakeatmsft opened this issue 1 year ago (1 comment)

  • If the issue is to do with Azure CLI 2.0 in particular, create an issue here at Azure/azure-cli

Related command

az ml job create

Extension name (the extension in question)

ml

Description of issue (in as much detail as possible)

When running an ML component whose R code uses the MLflow R API inside a "with statement" block, following the documentation example (https://mlflow.org/docs/latest/R-api.html#mlflow-start-run), the component does not pass its output to the input of the following step correctly: the input is passed as a "DataReference" and does not point to the correct location. When the "with statement" block is removed, the pipeline runs correctly.
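For reference, the documented pattern scopes the run to the block; a minimal sketch contrasting the two styles (assuming, per the linked docs, that with() on an MLflow run ends the run when the block exits):

```r
library(mlflow)

# Documented pattern: the run is ended automatically when the block exits.
with(mlflow_start_run(), {
  mlflow_log_metric("test", 10)
})

# Equivalent explicit form; this is the variant that works in the pipeline below.
run <- mlflow_start_run()
mlflow_log_metric("test", 10)
mlflow_end_run()
```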

ComponentA working:

component.r:

library(optparse)
library("carrier")
library(mlflow)

options <- list(
  make_option(c("-d", "--data_folder"), default = "./data"),
  make_option(c("-o", "--out_folder"), default = "./out")
)

opt_parser <- OptionParser(option_list = options)
opt <- parse_args(opt_parser)

paste(opt$data_folder)
paste(opt$out_folder)

run <- mlflow_start_run()
accidents <- readRDS(file.path(opt$data_folder, "accidents.Rd"))
summary(accidents)

saveRDS(accidents, file.path(opt$out_folder, "predictions.Rd"))


Pipeline component json, subsequent step:

"runDefinition": {
  "script": null,
  "command": "Rscript eval_model.r --model $AZUREML_DATAREFERENCE_component_b_input",
  "useAbsolutePath": false,
  "arguments": [],
  "sourceDirectoryDataStore": null,
  "framework": "Python",
  "communicator": "None",
  "target": "cpu-cluster",
  "dataReferences": {},
  "data": {},
  "inputAssets": {
    "component_b_input": {
      "asset": {
        "assetId": "azureml://locations/westus2/workspaces/dc720ed6-90b3-49cb-8383-0897f5db4402/data/azureml_3450fa88-128c-4026-9065-a0e214261911_output_data_component_a_output/versions/1",
        "type": "UriFolder"
      },
      "mechanism": "Mount",
      "environmentVariableName": "AZURE_ML_INPUT_component_b_input",
      "pathOnCompute": null,
      "overwrite": true,
      "options": {
        "IsEvalMode": "False",
        "ReadWrite": "False",
        "ForceFolder": "False"
      }
    }
  },

ComponentA - results in error: component.r:

library(optparse)
library("carrier")
library(mlflow)

options <- list(
  make_option(c("-d", "--data_folder"), default = "./data"),
  make_option(c("-o", "--out_folder"), default = "./out")
)

opt_parser <- OptionParser(option_list = options)
opt <- parse_args(opt_parser)

paste(opt$data_folder)
paste(opt$out_folder)

with(run <- mlflow_start_run(), {
  accidents <- readRDS(file.path(opt$data_folder, "accidents.Rd"))
  summary(accidents)

  saveRDS(accidents, file.path(opt$out_folder, "predictions.Rd"))
})
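A hypothetical workaround sketch (an assumption, not a confirmed fix): keep with()'s guarantee that the run is ended, but avoid dispatching with() on the run object, in case that method interferes with how AzureML resolves the component's output path. The opt stand-in replaces the parse_args() result from the script above.

```r
library(mlflow)

# Stand-in for the optparse result in the original script (hypothetical values).
opt <- list(data_folder = "./data", out_folder = "./out")

run <- mlflow_start_run()
tryCatch({
  accidents <- readRDS(file.path(opt$data_folder, "accidents.Rd"))
  summary(accidents)
  saveRDS(accidents, file.path(opt$out_folder, "predictions.Rd"))
}, finally = mlflow_end_run())  # run is ended even if the body errors
```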


Pipeline component json, subsequent step:

"runDefinition": { "script": null, "command": "Rscript eval_model.r --model $AZUREML_DATAREFERENCE_model", "useAbsolutePath": false, "arguments": [], "sourceDirectoryDataStore": null, "framework": "Python", "communicator": "None", "target": "cpu-cluster", "dataReferences": { "model": { "dataStoreName": "workspaceblobstore", "mode": "Mount", "pathOnDataStore": "azureml/{name}/model_out/", "pathOnCompute": null, "overwrite": true } }, "inputs" : null

Steps to reproduce:

  • open the attached accidents-copy.zip
  • create the environment from the Dockerfile
  • uncomment /src/accident.R lines 27 and 46
  • run az ml job create -f pipeline.yml

accidents-copy.zip

jakeatmsft avatar Aug 12 '22 18:08 jakeatmsft

route to CXP team

yonzhan avatar Aug 12 '22 22:08 yonzhan

@jakeatmsft Apologies for the delay, we are looking into it.

RakeshMohanMSFT avatar Oct 19 '22 04:10 RakeshMohanMSFT

Thanks for the feedback! We are routing this to the appropriate team for follow-up. cc @azureml-github.

Issue Details

Author: jakeatmsft
Assignees: -
Labels:

extension/ml, customer-reported, Machine Learning, Service Attention, Auto-Assign

Milestone: -

ghost avatar Oct 31 '22 08:10 ghost