azure-sdk-for-python
azure.ai.ml: cannot use a code directory containing symlinks, even if the symlinks are listed in an ignore file
- Package Name: azure.ai.ml
- Package Version: 1.2.0
- Operating System: WSL, Ubuntu 20.04 LTS
- Python Version: Python 3.9.13
Describe the bug
The problem is that BlobStorageClient.upload (in _blob_storage_helper.py) does not pass ignore_file to get_directory_size, so the size is calculated over the entire directory.
This has two consequences:
- The size calculated by get_directory_size(source) is incorrect, because it includes files that will not be uploaded.
- If the files excluded by .amlignore and/or .gitignore include a symlink, the script crashes.
Background: I'm packaging with tox, and part of its process is creating a Python virtualenv during the build, which inevitably includes symlinks.
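What the report asks for amounts to computing the size from the same filtered file list that is uploaded. The sketch below is a minimal illustration under stated assumptions, not the SDK's code: `directory_size` and `_is_ignored` are hypothetical helpers, ignore rules are approximated with `fnmatch` glob patterns (full .gitignore semantics such as trailing slashes and negation are not handled), and symlinks are measured with `os.lstat` so that a dangling or relative link cannot raise.

```python
# Hypothetical illustration; not taken from azure-ai-ml.
import fnmatch
import os
from typing import Iterable, List, Tuple


def _is_ignored(rel_path: str, patterns: List[str]) -> bool:
    # Simplified matching: a path is ignored if it, or any parent directory,
    # matches one of the glob patterns (by relative path or by bare name).
    parts = rel_path.split(os.sep)
    for i in range(1, len(parts) + 1):
        candidate = os.sep.join(parts[:i])
        name = parts[i - 1]
        if any(fnmatch.fnmatch(candidate, p) or fnmatch.fnmatch(name, p) for p in patterns):
            return True
    return False


def directory_size(root: str, ignore_patterns: Iterable[str] = ()) -> Tuple[int, int]:
    """Return (total_bytes, file_count) for files under root that are not ignored."""
    patterns = list(ignore_patterns)
    total, count = 0, 0
    for dirpath, dirnames, filenames in os.walk(root):  # does not follow symlinked dirs
        rel_dir = os.path.relpath(dirpath, root)
        # Prune ignored directories in place so os.walk never descends into them.
        dirnames[:] = [
            d for d in dirnames
            if not _is_ignored(os.path.normpath(os.path.join(rel_dir, d)), patterns)
        ]
        for name in filenames:
            rel_path = os.path.normpath(os.path.join(rel_dir, name))
            if _is_ignored(rel_path, patterns):
                continue
            # lstat measures the link itself, so broken or relative symlinks do not raise.
            total += os.lstat(os.path.join(dirpath, name)).st_size
            count += 1
    return total, count
```

Pruning `dirnames` means an ignored virtualenv is never walked at all, which keeps both the size estimate and the file count in line with what would actually be uploaded.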
To Reproduce
Steps to reproduce the behavior:
- Create a code folder with a script and a symlink
- Add that symlink to a .gitignore file so it shouldn't be uploaded
- Create a command pointing to that code folder.
- Crash
from azure.identity import DefaultAzureCredential
from azure.ai.ml import MLClient, command
from azure.ai.ml.entities import Environment
ROOT_DIRECTORY = "<path_to_code_dir>"  # code folder containing the script, the symlink and .gitignore
credential = DefaultAzureCredential()
environment = Environment()  # environment details are not given in the report
# Get a handle to the workspace
ml_client = MLClient.from_config(
    credential=credential,
    path="config.json",
)
command_job = command(
    code=ROOT_DIRECTORY,
    command="python train.py",
    environment=environment,
    compute="local",
)
returned_job = ml_client.jobs.create_or_update(command_job)
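A code folder matching the reproduction steps can be built with the standard library alone. This is a sketch under assumptions: `make_repro_code_dir` is a hypothetical helper, and the `venv/bin/python3 -> python` relative link is chosen to mimic the virtualenv that tox creates. It has not been verified that every affected SDK version crashes on exactly this layout.

```python
# Hypothetical helper for the reproduction steps; not part of azure-ai-ml.
import os
import tempfile


def make_repro_code_dir(base: str) -> str:
    """Create a code folder with a script, a symlink, and a .gitignore that ignores the symlink."""
    code_dir = os.path.join(base, "code")
    venv_bin = os.path.join(code_dir, "venv", "bin")
    os.makedirs(venv_bin)
    # The script the command job runs.
    with open(os.path.join(code_dir, "train.py"), "w") as f:
        f.write("print('training')\n")
    # A placeholder interpreter plus a relative symlink to it, as in a virtualenv's bin/.
    with open(os.path.join(venv_bin, "python"), "w") as f:
        f.write("placeholder\n")
    os.symlink("python", os.path.join(venv_bin, "python3"))
    # Ignore the whole virtualenv, so the symlink should not be uploaded.
    with open(os.path.join(code_dir, ".gitignore"), "w") as f:
        f.write("venv/\n")
    return code_dir


if __name__ == "__main__":
    # Print a path that can be used as ROOT_DIRECTORY.
    print(make_repro_code_dir(tempfile.mkdtemp()))
```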
Expected behavior
Ignored files should not cause the script to fail, and size warnings should be accurate.
Additional context
Full stack trace for an example:
Traceback (most recent call last):
File "/home/matt/miniconda3/envs/faraday/lib/python3.9/site-packages/azure/ai/ml/operations/_operation_orchestrator.py", line 256, in _get_code_asset_arm_id
code_asset = self._code_assets.create_or_update(code_asset)
File "/home/matt/miniconda3/envs/faraday/lib/python3.9/site-packages/azure/ai/ml/operations/_code_operations.py", line 141, in create_or_update
raise ex
File "/home/matt/miniconda3/envs/faraday/lib/python3.9/site-packages/azure/ai/ml/operations/_code_operations.py", line 94, in create_or_update
code, _ = _check_and_upload_path(
File "/home/matt/miniconda3/envs/faraday/lib/python3.9/site-packages/azure/ai/ml/_artifacts/_artifact_utilities.py", line 399, in _check_and_upload_path
uploaded_artifact = _upload_to_datastore(
File "/home/matt/miniconda3/envs/faraday/lib/python3.9/site-packages/azure/ai/ml/_artifacts/_artifact_utilities.py", line 297, in _upload_to_datastore
artifact = upload_artifact(
File "/home/matt/miniconda3/envs/faraday/lib/python3.9/site-packages/azure/ai/ml/_artifacts/_artifact_utilities.py", line 178, in upload_artifact
artifact_info = storage_client.upload(
File "/home/matt/miniconda3/envs/faraday/lib/python3.9/site-packages/azure/ai/ml/_artifacts/_blob_storage_helper.py", line 91, in upload
file_size, _ = get_directory_size(source)
File "/home/matt/miniconda3/envs/faraday/lib/python3.9/site-packages/azure/ai/ml/_utils/_asset_utils.py", line 414, in get_directory_size
path_size = os.path.getsize(
File "/home/matt/miniconda3/envs/faraday/lib/python3.9/genericpath.py", line 50, in getsize
return os.stat(filename).st_size
FileNotFoundError: [Errno 2] No such file or directory: 'python'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/matt/miniconda3/envs/faraday/lib/python3.9/runpy.py", line 197, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/home/matt/miniconda3/envs/faraday/lib/python3.9/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/home/matt/miniconda3/envs/faraday/lib/python3.9/site-packages/debugpy/__main__.py", line 45, in <module>
cli.main()
File "/home/matt/miniconda3/envs/faraday/lib/python3.9/site-packages/debugpy/server/cli.py", line 444, in main
run()
File "/home/matt/miniconda3/envs/faraday/lib/python3.9/site-packages/debugpy/server/cli.py", line 285, in run_file
runpy.run_path(target_as_str, run_name=compat.force_str("__main__"))
File "/home/matt/miniconda3/envs/faraday/lib/python3.9/runpy.py", line 288, in run_path
return _run_module_code(code, init_globals, run_name,
File "/home/matt/miniconda3/envs/faraday/lib/python3.9/runpy.py", line 97, in _run_module_code
_run_code(code, mod_globals, init_globals,
File "/home/matt/miniconda3/envs/faraday/lib/python3.9/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "./scripts/azure_train.py", line 52, in <module>
returned_job = ml_client.jobs.create_or_update(command_job)
File "/home/matt/miniconda3/envs/faraday/lib/python3.9/site-packages/azure/core/tracing/decorator.py", line 78, in wrapper_use_tracer
return func(*args, **kwargs)
File "/home/matt/miniconda3/envs/faraday/lib/python3.9/site-packages/azure/ai/ml/operations/_job_operations.py", line 565, in create_or_update
raise ex
File "/home/matt/miniconda3/envs/faraday/lib/python3.9/site-packages/azure/ai/ml/operations/_job_operations.py", line 500, in create_or_update
self._resolve_arm_id_or_upload_dependencies(job)
File "/home/matt/miniconda3/envs/faraday/lib/python3.9/site-packages/azure/ai/ml/operations/_job_operations.py", line 843, in _resolve_arm_id_or_upload_dependencies
self._resolve_arm_id_or_azureml_id(job, self._orchestrators.get_asset_arm_id)
File "/home/matt/miniconda3/envs/faraday/lib/python3.9/site-packages/azure/ai/ml/operations/_job_operations.py", line 1047, in _resolve_arm_id_or_azureml_id
job = self._resolve_arm_id_for_command_job(job, resolver)
File "/home/matt/miniconda3/envs/faraday/lib/python3.9/site-packages/azure/ai/ml/operations/_job_operations.py", line 1084, in _resolve_arm_id_for_command_job
job.code = resolver(
File "/home/matt/miniconda3/envs/faraday/lib/python3.9/site-packages/azure/ai/ml/operations/_operation_orchestrator.py", line 215, in get_asset_arm_id
result = self._get_code_asset_arm_id(asset, register_asset=register_asset)
File "/home/matt/miniconda3/envs/faraday/lib/python3.9/site-packages/azure/ai/ml/operations/_operation_orchestrator.py", line 274, in _get_code_asset_arm_id
raise AssetException(
azure.ai.ml.exceptions.AssetException: Error with code: [Errno 2] No such file or directory: 'python'
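The innermost `FileNotFoundError` names a bare `'python'` rather than a full path. One plausible explanation, inferred from the message and not confirmed against the SDK source, is that a relative symlink target (in a virtualenv, `bin/python3` typically points to `python`) reached `os.path.getsize` without being joined to the link's own directory, so it was resolved against the current working directory. The sketch below reproduces that failure mode with the standard library; `naive_target_size`, `safe_target_size` and `demo` are hypothetical names.

```python
# Hypothetical illustration of the suspected failure mode; not taken from azure-ai-ml.
import os
import tempfile
from typing import Tuple


def naive_target_size(link_path: str) -> int:
    # Passes the raw readlink() result straight to getsize, so a relative
    # target is resolved against the current working directory.
    return os.path.getsize(os.readlink(link_path))


def safe_target_size(link_path: str) -> int:
    # Resolves a relative target against the directory that contains the link.
    target = os.readlink(link_path)
    if not os.path.isabs(target):
        target = os.path.join(os.path.dirname(link_path), target)
    return os.path.getsize(target)


def demo() -> Tuple[int, str]:
    """Return (size from the safe lookup, error message from the naive lookup)."""
    with tempfile.TemporaryDirectory() as venv_bin, tempfile.TemporaryDirectory() as elsewhere:
        with open(os.path.join(venv_bin, "python"), "wb") as f:
            f.write(b"\0" * 64)
        link = os.path.join(venv_bin, "python3")
        os.symlink("python", link)  # relative target, like bin/python3 -> python
        old_cwd = os.getcwd()
        os.chdir(elsewhere)  # an empty directory with no file named "python"
        try:
            size = safe_target_size(link)
            try:
                naive_target_size(link)
                message = ""
            except FileNotFoundError as exc:
                message = str(exc)
        finally:
            os.chdir(old_cwd)
    return size, message


if __name__ == "__main__":
    # On Linux: (64, "[Errno 2] No such file or directory: 'python'")
    print(demo())
```

The naive lookup produces the same error text as the traceback, which is consistent with (but does not prove) this explanation.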
Looks like this is a duplicate of https://github.com/Azure/azure-sdk-for-python/issues/27980
I'm not sure it is a duplicate. #27981 seems to refer to the handling of nested symlinks, which the author would like to be uploaded, whereas this is a more general problem: get_directory_size is called without the ignore file argument.
Hi @ssabdb, thank you for opening an issue! I'll tag some folks who should be able to help, and we'll get back to you as soon as possible. @luigiw @azureml-github
@diondrapeck for awareness.
As another illustration of this issue, I receive a warning that my upload size is more than 100 MB, when it's actually only 810 KB.
I had the same problem in azure-ai-ml 1.0.0 when executing my pipeline deployment through nox. It worked after upgrading to 1.1.2, but in 1.2.0+ I get this stack trace as well:
nox > python ./mlops/pipelines/training.py
Class JobDefinition: This is an experimental class, and may change at any time. Please see https://aka.ms/azuremlexperimental for more information.
Traceback (most recent call last):
File "/actions-runner/gherunner/_work/pilot-simple-ml-classification-fec-ml-azure/pilot-simple-ml-classification-fec-ml-azure/.nox/cd/lib/python3.9/site-packages/azure/ai/ml/operations/_operation_orchestrator.py", line 256, in _get_code_asset_arm_id
code_asset = self._code_assets.create_or_update(code_asset)
File "/actions-runner/gherunner/_work/pilot-simple-ml-classification-fec-ml-azure/pilot-simple-ml-classification-fec-ml-azure/.nox/cd/lib/python3.9/site-packages/azure/ai/ml/operations/_code_operations.py", line 141, in create_or_update
raise ex
File "/actions-runner/gherunner/_work/pilot-simple-ml-classification-fec-ml-azure/pilot-simple-ml-classification-fec-ml-azure/.nox/cd/lib/python3.9/site-packages/azure/ai/ml/operations/_code_operations.py", line 94, in create_or_update
code, _ = _check_and_upload_path(
File "/actions-runner/gherunner/_work/pilot-simple-ml-classification-fec-ml-azure/pilot-simple-ml-classification-fec-ml-azure/.nox/cd/lib/python3.9/site-packages/azure/ai/ml/_artifacts/_artifact_utilities.py", line 399, in _check_and_upload_path
uploaded_artifact = _upload_to_datastore(
File "/actions-runner/gherunner/_work/pilot-simple-ml-classification-fec-ml-azure/pilot-simple-ml-classification-fec-ml-azure/.nox/cd/lib/python3.9/site-packages/azure/ai/ml/_artifacts/_artifact_utilities.py", line 297, in _upload_to_datastore
artifact = upload_artifact(
File "/actions-runner/gherunner/_work/pilot-simple-ml-classification-fec-ml-azure/pilot-simple-ml-classification-fec-ml-azure/.nox/cd/lib/python3.9/site-packages/azure/ai/ml/_artifacts/_artifact_utilities.py", line 178, in upload_artifact
artifact_info = storage_client.upload(
File "/actions-runner/gherunner/_work/pilot-simple-ml-classification-fec-ml-azure/pilot-simple-ml-classification-fec-ml-azure/.nox/cd/lib/python3.9/site-packages/azure/ai/ml/_artifacts/_blob_storage_helper.py", line 91, in upload
file_size, _ = get_directory_size(source)
File "/actions-runner/gherunner/_work/pilot-simple-ml-classification-fec-ml-azure/pilot-simple-ml-classification-fec-ml-azure/.nox/cd/lib/python3.9/site-packages/azure/ai/ml/_utils/_asset_utils.py", line 414, in get_directory_size
path_size = os.path.getsize(
File "/opt/hostedtoolcache/Python/3.9.13/x64/lib/python3.9/genericpath.py", line 50, in getsize
return os.stat(filename).st_size
FileNotFoundError: [Errno 2] No such file or directory: 'python'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/actions-runner/gherunner/_work/pilot-simple-ml-classification-fec-ml-azure/pilot-simple-ml-classification-fec-ml-azure/./mlops/pipelines/training.py", line 173, in <module>
run_training_pipeline()
File "/actions-runner/gherunner/_work/pilot-simple-ml-classification-fec-ml-azure/pilot-simple-ml-classification-fec-ml-azure/./mlops/pipelines/training.py", line 156, in run_training_pipeline
ml_client.batch_deployments.begin_create_or_update(deployment).result()
File "/actions-runner/gherunner/_work/pilot-simple-ml-classification-fec-ml-azure/pilot-simple-ml-classification-fec-ml-azure/.nox/cd/lib/python3.9/site-packages/azure/core/tracing/decorator.py", line 78, in wrapper_use_tracer
return func(*args, **kwargs)
File "/actions-runner/gherunner/_work/pilot-simple-ml-classification-fec-ml-azure/pilot-simple-ml-classification-fec-ml-azure/.nox/cd/lib/python3.9/site-packages/azure/ai/ml/operations/_batch_deployment_operations.py", line 115, in begin_create_or_update
self._validate_component(deployment, orchestrators)
File "/actions-runner/gherunner/_work/pilot-simple-ml-classification-fec-ml-azure/pilot-simple-ml-classification-fec-ml-azure/.nox/cd/lib/python3.9/site-packages/azure/ai/ml/operations/_batch_deployment_operations.py", line 258, in _validate_component
component = self._all_operations.all_operations[AzureMLResourceType.COMPONENT].create_or_update(
File "/actions-runner/gherunner/_work/pilot-simple-ml-classification-fec-ml-azure/pilot-simple-ml-classification-fec-ml-azure/.nox/cd/lib/python3.9/site-packages/azure/ai/ml/operations/_component_operations.py", line 301, in create_or_update
self._resolve_arm_id_or_upload_dependencies(component)
File "/actions-runner/gherunner/_work/pilot-simple-ml-classification-fec-ml-azure/pilot-simple-ml-classification-fec-ml-azure/.nox/cd/lib/python3.9/site-packages/azure/ai/ml/operations/_component_operations.py", line 451, in _resolve_arm_id_or_upload_dependencies
self._resolve_arm_id_and_inputs(component)
File "/actions-runner/gherunner/_work/pilot-simple-ml-classification-fec-ml-azure/pilot-simple-ml-classification-fec-ml-azure/.nox/cd/lib/python3.9/site-packages/azure/ai/ml/operations/_component_operations.py", line 542, in _resolve_arm_id_and_inputs
self._resolve_arm_id_for_pipeline_component_jobs(component.jobs, self._orchestrators.get_asset_arm_id)
File "/actions-runner/gherunner/_work/pilot-simple-ml-classification-fec-ml-azure/pilot-simple-ml-classification-fec-ml-azure/.nox/cd/lib/python3.9/site-packages/azure/ai/ml/operations/_component_operations.py", line 526, in _resolve_arm_id_for_pipeline_component_jobs
resolve_base_node(key, job_instance)
File "/actions-runner/gherunner/_work/pilot-simple-ml-classification-fec-ml-azure/pilot-simple-ml-classification-fec-ml-azure/.nox/cd/lib/python3.9/site-packages/azure/ai/ml/operations/_component_operations.py", line 514, in resolve_base_node
node._component = resolver(
File "/actions-runner/gherunner/_work/pilot-simple-ml-classification-fec-ml-azure/pilot-simple-ml-classification-fec-ml-azure/.nox/cd/lib/python3.9/site-packages/azure/ai/ml/operations/_operation_orchestrator.py", line 223, in get_asset_arm_id
result = self._get_component_arm_id(asset)
File "/actions-runner/gherunner/_work/pilot-simple-ml-classification-fec-ml-azure/pilot-simple-ml-classification-fec-ml-azure/.nox/cd/lib/python3.9/site-packages/azure/ai/ml/operations/_operation_orchestrator.py", line 349, in _get_component_arm_id
component._id = self._component.create_or_update(
File "/actions-runner/gherunner/_work/pilot-simple-ml-classification-fec-ml-azure/pilot-simple-ml-classification-fec-ml-azure/.nox/cd/lib/python3.9/site-packages/azure/ai/ml/operations/_component_operations.py", line 301, in create_or_update
self._resolve_arm_id_or_upload_dependencies(component)
File "/actions-runner/gherunner/_work/pilot-simple-ml-classification-fec-ml-azure/pilot-simple-ml-classification-fec-ml-azure/.nox/cd/lib/python3.9/site-packages/azure/ai/ml/operations/_component_operations.py", line 435, in _resolve_arm_id_or_upload_dependencies
_try_resolve_code_for_component(component=component, get_arm_id_and_fill_back=get_arm_id_and_fill_back)
File "/actions-runner/gherunner/_work/pilot-simple-ml-classification-fec-ml-azure/pilot-simple-ml-classification-fec-ml-azure/.nox/cd/lib/python3.9/site-packages/azure/ai/ml/operations/_component_operations.py", line 636, in _try_resolve_code_for_component
component.code = get_arm_id_and_fill_back(code, azureml_type=AzureMLResourceType.CODE)
File "/actions-runner/gherunner/_work/pilot-simple-ml-classification-fec-ml-azure/pilot-simple-ml-classification-fec-ml-azure/.nox/cd/lib/python3.9/site-packages/azure/ai/ml/operations/_operation_orchestrator.py", line 215, in get_asset_arm_id
result = self._get_code_asset_arm_id(asset, register_asset=register_asset)
File "/actions-runner/gherunner/_work/pilot-simple-ml-classification-fec-ml-azure/pilot-simple-ml-classification-fec-ml-azure/.nox/cd/lib/python3.9/site-packages/azure/ai/ml/operations/_operation_orchestrator.py", line 274, in _get_code_asset_arm_id
raise AssetException(
azure.ai.ml.exceptions.AssetException: Error with code: [Errno 2] No such file or directory: 'python'
nox > Command python ./mlops/pipelines/training.py failed with exit code 1
nox > Session cd failed.
Error: Process completed with exit code 1.
This seems related to an ignore file bug that was found recently. We have a fix merged into main that should resolve it in the next release.
As for symlinks not working at all, even when no ignore file is used, this is the first I've heard of it. I'll try to reproduce it.
Looking into this - I verified that symlinks break simple command jobs, regardless of any ignore files targeting them. I'll start poking around the relevant code tomorrow.
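Until a fixed release is available, one workaround is to check the code directory for symlinks before submitting a job, then move them out or point `code=` at a directory that has none. The sketch below assumes that any symlink is a risk, per the comment above; `find_symlinks` is a hypothetical helper, not part of azure-ai-ml.

```python
# Hypothetical pre-flight check; not part of azure-ai-ml.
import os
from typing import List


def find_symlinks(root: str) -> List[str]:
    """Return paths (relative to root) of every symlink under root, without following links."""
    found = []
    for dirpath, dirnames, filenames in os.walk(root):  # followlinks=False by default
        # Symlinks to directories show up in dirnames, other links in filenames.
        for name in dirnames + filenames:
            full_path = os.path.join(dirpath, name)
            if os.path.islink(full_path):
                found.append(os.path.relpath(full_path, root))
    return sorted(found)
```

For example, calling `find_symlinks(ROOT_DIRECTORY)` before `ml_client.jobs.create_or_update(command_job)` and aborting when the list is non-empty turns the opaque `AssetException` into an explicit list of offending paths.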
Looks like this is now resolved! Thanks all.