azure-sdk-for-python
[Azure ML SDK v2] File is not written to output azureml datastore
- Package Name: azure.ai.ml
- Package Version: latest in Azure ML Notebooks (Standard)
- Operating System: Azure ML Notebooks (Standard)
- Python Version: Azure ML Notebooks (Standard)
Describe the bug
The Azure ML datastore tfconfigs has multiple files in its base path. For a pipeline job, the Azure ML datastore tfconfigs is defined as an output to write data to:
from azure.ai.ml import command, Input, Output
from azure.ai.ml.constants import AssetTypes

update_config_component = command(
    name="tf_config_update",
    display_name="Tensorflow configuration file update",
    description="Reads the pipeline configuration file from a specific model (directory), updates it with the params, and saves the new pipeline config file to the output directory",
    inputs=dict(
        config_dir=Input(type="uri_folder"),
        config_directory_name=Input(type="string"),
        images_dir=Input(type="uri_folder"),
        labelmap_path=Input(type="string"),
        fine_tune_checkpoint_type=Input(type="string"),
        fine_tune_checkpoint=Input(type="string"),
        train_record_path=Input(type="string"),
        test_record_path=Input(type="string"),
        num_classes=Input(type="integer"),
        batch_size=Input(type="integer"),
        num_steps=Input(type="integer"),
    ),
    outputs={
        "config_directory_output": Output(
            type=AssetTypes.URI_FOLDER,
            path=f"azureml://subscriptions/{ml_client.subscription_id}/resourcegroups/{ml_client.resource_group_name}/workspaces/{ml_client.workspace_name}/datastores/tfconfigs/paths/",
        )
    },
    # The source folder of the component
    code=update_config_src_dir,
    command="""pwd && ls -la ${{outputs.config_directory_output}} && python update.py \
        --config_dir ${{inputs.config_dir}} \
        --config_directory_name ${{inputs.config_directory_name}} \
        --config_directory_output ${{outputs.config_directory_output}} \
        --images_dir ${{inputs.images_dir}} \
        --labelmap_path ${{inputs.labelmap_path}} \
        --fine_tune_checkpoint_type ${{inputs.fine_tune_checkpoint_type}} \
        --fine_tune_checkpoint ${{inputs.fine_tune_checkpoint}} \
        --train_record_path ${{inputs.train_record_path}} \
        --test_record_path ${{inputs.test_record_path}} \
        --num_classes ${{inputs.num_classes}} \
        --batch_size ${{inputs.batch_size}} \
        --num_steps ${{inputs.num_steps}} \
        """,
    environment="azureml://registries/azureml/environments/AzureML-minimal-ubuntu18.04-py37-cpu-inference/versions/43",
)
The output config_directory_output is mounted into the compute execution environment as follows:
/mnt/azureml/cr/j/6a153baacc664cada4060f0b95adbf0e/cap/data-capability/wd/config_directory_output
At the beginning of the Python script, the output directory is listed as follows:
print("Listing path / dir: ", args.config_directory_output)
arr = os.listdir(args.config_directory_output)
print(arr)
The directory does not include any files:
Listing path / dir: /mnt/azureml/cr/j/6a153baacc664cada4060f0b95adbf0e/cap/data-capability/wd/config_directory_output
[]
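(For context, args here comes from the script's argument parsing; a minimal sketch of the presumed setup in update.py, showing only the output argument:)
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--config_directory_output", type=str)
# the remaining flags (--config_dir, --images_dir, ...) follow the same pattern
args = parser.parse_args()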
BUG: The Azure ML Datastore tfconfigs mounted as an output already contains multiple manually uploaded files, yet the mounted directory is empty.
At the end of the Python script, a config file is written to the mounted output and the directory is listed again as follows:
with open(pipeline_config_path, "r") as f:
    config = f.read()

with open(new_pipeline_config_path, "w") as f:
    # Set labelmap path
    config = re.sub('label_map_path: ".*?"',
                    'label_map_path: "{}"'.format(images_dir_labelmap_path), config)
    # Set fine_tune_checkpoint_type
    config = re.sub('fine_tune_checkpoint_type: ".*?"',
                    'fine_tune_checkpoint_type: "{}"'.format(args.fine_tune_checkpoint_type), config)
    # Set fine_tune_checkpoint path
    config = re.sub('fine_tune_checkpoint: ".*?"',
                    'fine_tune_checkpoint: "{}"'.format(args.fine_tune_checkpoint), config)
    # Set train tf-record file path
    config = re.sub('(input_path: ".*?)(PATH_TO_BE_CONFIGURED/train)(.*?")',
                    'input_path: "{}"'.format(images_dir_train_record_path), config)
    # Set test tf-record file path
    config = re.sub('(input_path: ".*?)(PATH_TO_BE_CONFIGURED/val)(.*?")',
                    'input_path: "{}"'.format(images_dir_test_record_path), config)
    # Set number of classes
    config = re.sub('num_classes: [0-9]+',
                    'num_classes: {}'.format(args.num_classes), config)
    # Set batch size
    config = re.sub('batch_size: [0-9]+',
                    'batch_size: {}'.format(args.batch_size), config)
    # Set number of training steps
    config = re.sub('num_steps: [0-9]+',
                    'num_steps: {}'.format(int(args.num_steps)), config)
    f.write(config)

# List directory
print("Listing path / dir: ", args.config_directory_output)
arr = os.listdir(args.config_directory_output)
print(arr)
Listing the mounted output directory now gives the following:
Listing path / dir: /mnt/azureml/cr/j/6a153baacc664cada4060f0b95adbf0e/cap/data-capability/wd/config_directory_output
['ssd_mobilenet_v2_fpnlite_640x640_coco17_tpu-8_steps125000_batch16.config']
BUG: The mounted output directory now includes the file, but the newly written file does not appear in the Azure ML Datastore when viewed in Azure Storage Explorer / the Azure Portal GUI.
To Reproduce
Steps to reproduce the behavior:
- Create a new Azure ML Datastore backed by a new container in the storage account
- Create a pipeline with a job whose output is the newly created Azure ML Datastore
- Write a file to the output in a pipeline job
- Run the pipeline
- Confirm that the file is not created in the Azure ML Datastore / Azure Storage blob container
Expected behavior
Any file written to an output Azure ML Datastore in a Python job should be written to the underlying Azure Storage blob container and be available for later use.
Additional context
Using the following tutorial as a reference:
- https://github.com/Azure/azureml-examples/blob/main/sdk/python/assets/data/data.ipynb -> Reading and writing data in a job
I have the same issue; no matter what "path=" I provide in the outputs, it will always mount the output to azureml://datastores/${{default_datastore}}/paths/azureml/${{name}}/${{output_name}}/
This is not a supported scenario yet. We don't allow a customized output path. azure-ai-ml should probably raise the right error message in this scenario.
@wangchao1230 What do you think of adding validation in the Output class's constructor?
Specifying an output path when defining a component will not work; the default path azureml://datastores/${{default_datastore}}/paths/azureml/${{name}}/${{output_name}}/ is still used.
However, specifying the output path when consuming the component in a pipeline is supported, with code like below:
# in a pipeline
node = component(<component-args>)
node.outputs.output = Output(
    type="uri_folder", mode="rw_mount", path=custom_path
)
Please refer to our sample on this.
Hmmm this didn't work for me, I followed the example:
# example of how to change the path of an output at the step level;
# note that if the output is promoted to pipeline level, you need to change the path at the pipeline job level
score_with_sample_data.outputs.score_output = Output(
    type="uri_folder", mode="rw_mount", path=custom_path
)
But when my job is submitted, it shows that the datastore for the output is still set to the workspaceblobstore:
In my case the output is a file and so I'm trying to do this:
datastore_path = f"azureml://subscriptions/{subscription}/resourcegroups/{rg}/workspaces/{ws_name}/datastores/nasfacemodels/paths/Deci1"
model_path = f"{datastore_path}/deci_optimized_1.onnx"
dlc_path = f"{datastore_path}/model.dlc"
quant_dlc_path = f"{datastore_path}/model.quant.dlc"
from azure.ai.ml import dsl, Input, Output

@dsl.pipeline(
    compute=snpe_cluster,
    description="Quantization pipeline",
)
def quantization_pipeline(
    pipeline_job_data_input,
    model_input
):
    # using data_prep_component like a python call with its own inputs
    data_prep_job = data_prep_component(
        data=pipeline_job_data_input
    )
    # convert onnx to dlc
    convert_job = convert_component(
        model=model_input
    )
    # for the custom path to work we have to specify it again here,
    # see https://github.com/Azure/azure-sdk-for-python/issues/27454
    convert_job.outputs.dlc = Output(type="uri_file", path=dlc_path, mode="rw_mount")
    # using quant_component like a python call with its own inputs
    quant_job = quant_component(
        data=data_prep_job.outputs.quant_data,
        list_file='input_list.txt',
        model=convert_job.outputs.dlc
    )
    quant_job.outputs.quant_model = Output(type="uri_file", path=quant_dlc_path, mode="rw_mount")
    # a pipeline returns a dictionary of outputs
    # keys will code for the pipeline output identifier
    return {
        "pipeline_job_model": convert_job.outputs.dlc,
        "pipeline_job_quant_model": quant_job.outputs.quant_model
    }

pipeline = quantization_pipeline(
    pipeline_job_data_input=Input(type="uri_file", path=face_data.path),
    model_input=Input(type="uri_file", path=model_path)
)
The only thing that does work and comes from my custom blobstore is the model input Input(type="uri_file", path=face_data.path)
I wish this would work; if not, it looks like I'll have to create my own "save to blobstore" components and inject them into my pipeline, which I'd rather not have to do...
Hi @lovettchris, the root cause of this issue is that quant_model gets promoted to a pipeline-level output when it is returned from the @dsl.pipeline function. When a node-level output is promoted to a pipeline-level output, its node-level settings (path, mode) are overwritten by the pipeline-level output settings. And when the pipeline outputs are not configured, our system fills in default settings for them, so the node output settings get overwritten by the defaults.
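Concretely, that means the override has to be applied to the promoted pipeline-level output rather than to the node; a minimal sketch, reusing the pipeline object and quant_dlc_path from the example above:
# configure the promoted pipeline-level output; node-level settings alone get overwritten
pipeline.outputs.pipeline_job_quant_model.path = quant_dlc_path
pipeline.outputs.pipeline_job_quant_model.mode = "rw_mount"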
We just implemented a fix on the SDK side to copy the node output settings to the pipeline level, which should fix the issue. You can try installing the following private build and check whether it works:
pip install azure-ai-ml==1.5.0a20230215003 --extra-index-url https://pkgs.dev.azure.com/azure-sdk/public/_packaging/azure-sdk-for-python/pypi/simple/
Also, I noticed you used the ARM ID format of the datastore path:
datastore_path = f"azureml://subscriptions/{subscription}/resourcegroups/{rg}/workspaces/{ws_name}/datastores/nasfacemodels/paths/Deci1"
You may change this to
datastore_path = f"azureml://datastores/nasfacemodels/paths/Deci1"
as @zhengfeiwang suggested.
Very cool, thanks Han, I'm testing out your fix.
It works, thanks! I no longer need to edit the outputs in the pipeline definition; the path in the original component output definition is enough:
convert_component = command(
    name="convert",
    display_name="Convert .onnx to .dlc",
    description="Converts the onnx model to dlc format",
    inputs={
        "model": Input(type="uri_file")
    },
    outputs={
        "dlc": Output(type="uri_file", path=dlc_path, mode="rw_mount")
    },
    # The source folder of the component
    code=scripts_dir,
    command="""python3 convert.py \
        --model ${{inputs.model}} \
        --output ${{outputs.dlc}} \
        """,
    environment=f"{pipeline_job_env.name}:{pipeline_job_env.version}",
)
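(For completeness, the pipeline is then built and submitted as usual; a sketch assuming ml_client is an authenticated MLClient and the experiment name is arbitrary:)
# build the pipeline and submit it to the workspace
pipeline = quantization_pipeline(
    pipeline_job_data_input=Input(type="uri_file", path=face_data.path),
    model_input=Input(type="uri_file", path=model_path),
)
returned_job = ml_client.jobs.create_or_update(pipeline, experiment_name="quantization")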
And this created the output, which links back to my custom blob store (notice the last modified date on this file is today, which came from this pipeline execution).
Very cool. Now I can run jobs all day that "accumulate" the results I need in a bigger combined blobstore. I did notice one weird thing, however: it also created this file in my blobstore, which is incorrect:
It is a zero-byte file, so I'm not sure why it is there. Could this be some weird side effect or bug?
Thanks, by the way, regarding the path simplification you showed me:
datastore_path = f"azureml://datastores/nasfacemodels/paths/Deci1"
Originally I got stuck with the v2 API while trying to write something like this:
blobstore = ml_client.datastores.get(name='nasfacemodels')
pipeline = quantization_pipeline(
    pipeline_job_data_input=Input(type="uri_file", path=face_data.path),
    model_input=Input(type="uri_file", path=blobstore.path / "models" / "Deci2")
)
It would be cool if this "just worked", I think I can almost do it with this:
from pathlib import PurePosixPath
model_path = "azureml://" + str(PurePosixPath(blobstore.id) / "paths" / "models" / "Deci1")
but that's too complicated. It would be nice if the "Input" class (and Output class) had a more directly discoverable connection to the datastore object.
Hi @lovettchris,
Currently, this type of concatenation for paths is not supported; we only support plain text. Support for such advanced expressions is still in our backlog.
Thanks, hopefully the API can be improved soon; I found this particularly hard to discover.
You have this API already: "blobstore = ml_client.datastores.get(name='nasfacemodels')"
Just making it usable in the Input and Output path would be great. Or better yet, you could add a "store" parameter so I can do this:
Input(type="blob_store", store=ml_client.datastores.get(name='nasfacemodels'), path='models/Deci2')
Then it would be even more clear that you CAN create a connection between pipeline inputs and outputs and azure data stores...
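In the meantime, a small user-side helper can approximate this; a sketch (datastore_uri is hypothetical, not part of azure-ai-ml):
from azure.ai.ml import Input
from azure.ai.ml.entities import Datastore

def datastore_uri(store: Datastore, *parts: str) -> str:
    # compose a short-form azureml:// URI for a path inside the given datastore
    return f"azureml://datastores/{store.name}/paths/" + "/".join(parts)

# usage, assuming ml_client is an authenticated MLClient:
blobstore = ml_client.datastores.get(name="nasfacemodels")
model_input = Input(type="uri_file", path=datastore_uri(blobstore, "models", "Deci2"))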
I am also getting this 0KB ghost file created when following the same custom uri_folder output path provided in the pipeline output construction. Could this be fixed @D-W- / @cloga ?
Below is a screenshot of what I am referring to:
Hi @D-W-,
I am still facing this issue in azure-ai-ml 1.7.2. Any update on the permanent fix?
Creating a pipeline with CLI v2 and specifying a custom output path still doesn't work; the default output path is always used: azureml://datastores/${{default_datastore}}/paths/azureml/${{name}}/${{output_name}}/
Any plans for fixing this?
Never mind, there was a typo in the component output parameter. It would be great if an error were thrown in this case.
My team is also still experiencing this. Was the temporary fix on a private build ever merged in as a public fix?
Hi @TheMellyBee, the issue has been fixed and is available in azure-ai-ml>1.5.0.
Could you post your issue here so we can check whether it's the same issue?
Here's a doc on how to set datastore for outputs for your reference: https://learn.microsoft.com/en-us/azure/machine-learning/how-to-manage-inputs-outputs-pipeline?view=azureml-api-2&tabs=cli
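For example, following that doc, the datastore for a pipeline-level output can be set via its path (a sketch; the output name and datastore are placeholders):
# assumes pipeline_job is a built pipeline job with an output named "trained_model"
pipeline_job.outputs.trained_model.path = (
    "azureml://datastores/my_datastore/paths/my/custom/path"
)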
Hi @apthagowda97, @TomSB1423 and @lovettchris, sorry for the late response. Could you create a new issue to track the 0KB ghost file so we can discuss it there? Since it's runtime behavior and seems unrelated to the control-plane SDK, we'll involve runtime devs to help investigate. The original output setting issue should be fixed in azure-ai-ml>1.5.0.
I'm having a very similar issue, except with batch endpoints. A week ago custom predictions outputs worked for me through the Azure Machine Learning Studio GUI. Now it's giving me an irrelevant error message and won't even let me run the job.
I have verified that the datastore is properly connected as I am able to browse it within the Data assets. It seems like custom outputs do not work unless the output is going to my default workspaceblobstore.
So I tried using my default datastore 'workspaceblobstore', and sure enough it ran, but it did not accept the custom output path... Here is the run overview; notice there is only an "Inputs" table and no "Outputs" table. It simply defaulted to the "azureml/<run_id>/score" path:
Here is what is odd: if I look at the raw run JSON, this is what it looks like... notice there is no "outputDatasets":
However, going back a week, when custom outputs magically worked, this is what the run raw JSON looked like:
Additionally, last week, in the same run, you can see that the run overview has an "Outputs" table:
I even went as far as trying Python with the latest azure.ai.ml SDK to invoke the batch endpoint, but to no avail. The MLClient.batch_endpoints.invoke() method will run given the input and output, but it always writes predictions.csv to the workspaceblobstore "azureml/<run_id>/score" default path.
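For reference, the invocation looked roughly like this (a sketch; the endpoint and datastore names are placeholders, and the inputs/outputs dictionaries follow the pattern shown in the batch endpoint docs):
from azure.ai.ml import Input, Output

# invoke the batch endpoint with a custom output location (names are placeholders)
job = ml_client.batch_endpoints.invoke(
    endpoint_name="my-batch-endpoint",
    inputs={"input_data": Input(type="uri_folder", path="azureml://datastores/mydata/paths/inputs")},
    outputs={"score": Output(type="uri_file", path="azureml://datastores/mydata/paths/outputs/predictions.csv")},
)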
I believe this fixed itself or someone found the issue.