
azure.ai.ml: cannot use a code directory containing symlinks, even if symlinks are in an ignorefile

Open ssabdb opened this issue 2 years ago • 5 comments

  • Package Name: azure.ai.ml
  • Package Version: 1.2.0
  • Operating System: WSL, Ubuntu 20.04 LTS
  • Python Version: Python 3.9.13

Describe the bug
BlobStorageClient.upload (in _blob_storage_helper.py) calls get_directory_size without passing the ignore file, so the size is calculated over the entire directory, including files that are ignored.

This has two consequences:

  1. The size calculated by get_directory_size(source) is incorrect, as it includes files which will not be uploaded.
  2. If the files ignored via .amlignore and/or .gitignore include a symlink, the script will crash.
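
To make the two consequences concrete, here is a minimal sketch of a size calculation that honors ignore rules and never follows symlinks. It is not the SDK's implementation: `directory_size` and its fnmatch-style `ignore_patterns` argument are hypothetical stand-ins for get_directory_size and the parsed .amlignore/.gitignore rules, and real gitignore matching is richer than fnmatch.

```python
import fnmatch
import os


def directory_size(root, ignore_patterns=()):
    """Sum the sizes of files under root that would actually be uploaded.

    Hypothetical helper: ignore_patterns are fnmatch-style globs matched
    against each path relative to root (a simplification of gitignore rules).
    """
    def ignored(rel_path):
        parts = rel_path.split(os.sep)
        return any(
            fnmatch.fnmatch(rel_path, pattern) or pattern in parts
            for pattern in ignore_patterns
        )

    total = 0
    for dirpath, dirnames, filenames in os.walk(root, followlinks=False):
        rel_dir = os.path.relpath(dirpath, root)
        # Prune ignored directories in place so os.walk never descends into them.
        dirnames[:] = [
            d for d in dirnames
            if not ignored(os.path.normpath(os.path.join(rel_dir, d)))
        ]
        for name in filenames:
            rel_path = os.path.normpath(os.path.join(rel_dir, name))
            if ignored(rel_path):
                continue
            # lstat does not follow symlinks, so a dangling link cannot
            # raise FileNotFoundError here.
            total += os.lstat(os.path.join(dirpath, name)).st_size
    return total
```

With this shape, an ignored virtualenv folder contributes nothing to the reported size, and a symlink inside it is never stat-ed through.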

Background: I'm packaging with tox, and part of its process is creating a Python virtualenv during the build, which inevitably includes symlinks.

To Reproduce
Steps to reproduce the behavior:

  1. Create a code folder with a script and a symlink
  2. Add that symlink to a .gitignore file so it shouldn't be uploaded
  3. Create a command pointing to that code folder.
  4. Submit the job; it crashes.

ROOT_DIRECTORY = "<path_to_code_dir>"  # placeholder: path to the code folder

from azure.identity import DefaultAzureCredential
from azure.ai.ml import MLClient, command
from azure.ai.ml.entities import Environment

credential = DefaultAzureCredential()
environment = Environment()

# Get a handle to the workspace
ml_client = MLClient.from_config(
    credential=credential,
    path="config.json"
)

command_job = command(
    code=ROOT_DIRECTORY,
    command="train.py",
    environment=environment,
    compute='local'
)

returned_job = ml_client.jobs.create_or_update(command_job)
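
Steps 1 and 2 can be set up with a short script. This is a sketch under assumptions: the helper name `make_repro_dir`, the file names, and the dangling link target are illustrative, chosen to resemble the `python` symlinks a virtualenv creates. Whether this exact layout triggers the crash in a given SDK version is not verified here. The returned path can be used as ROOT_DIRECTORY.

```python
import os
import tempfile


def make_repro_dir():
    """Create a code folder with a script, a symlink, and a .gitignore entry."""
    root = tempfile.mkdtemp(prefix="aml-symlink-repro-")

    # Step 1: a script plus a symlink. The link target is relative and does
    # not exist, so following the link fails.
    with open(os.path.join(root, "train.py"), "w") as f:
        f.write('print("hello")\n')
    os.symlink("python", os.path.join(root, "python3"))

    # Step 2: list the symlink in .gitignore so it should not be uploaded.
    with open(os.path.join(root, ".gitignore"), "w") as f:
        f.write("python3\n")
    return root


if __name__ == "__main__":
    print(make_repro_dir())
```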

Expected behavior
Ignored files should not cause the script to fail. Size warnings should be accurate.

Additional context
Full stack trace for an example:

Traceback (most recent call last):
  File "/home/matt/miniconda3/envs/faraday/lib/python3.9/site-packages/azure/ai/ml/operations/_operation_orchestrator.py", line 256, in _get_code_asset_arm_id
    code_asset = self._code_assets.create_or_update(code_asset)
  File "/home/matt/miniconda3/envs/faraday/lib/python3.9/site-packages/azure/ai/ml/operations/_code_operations.py", line 141, in create_or_update
    raise ex
  File "/home/matt/miniconda3/envs/faraday/lib/python3.9/site-packages/azure/ai/ml/operations/_code_operations.py", line 94, in create_or_update
    code, _ = _check_and_upload_path(
  File "/home/matt/miniconda3/envs/faraday/lib/python3.9/site-packages/azure/ai/ml/_artifacts/_artifact_utilities.py", line 399, in _check_and_upload_path
    uploaded_artifact = _upload_to_datastore(
  File "/home/matt/miniconda3/envs/faraday/lib/python3.9/site-packages/azure/ai/ml/_artifacts/_artifact_utilities.py", line 297, in _upload_to_datastore
    artifact = upload_artifact(
  File "/home/matt/miniconda3/envs/faraday/lib/python3.9/site-packages/azure/ai/ml/_artifacts/_artifact_utilities.py", line 178, in upload_artifact
    artifact_info = storage_client.upload(
  File "/home/matt/miniconda3/envs/faraday/lib/python3.9/site-packages/azure/ai/ml/_artifacts/_blob_storage_helper.py", line 91, in upload
    file_size, _ = get_directory_size(source)
  File "/home/matt/miniconda3/envs/faraday/lib/python3.9/site-packages/azure/ai/ml/_utils/_asset_utils.py", line 414, in get_directory_size
    path_size = os.path.getsize(
  File "/home/matt/miniconda3/envs/faraday/lib/python3.9/genericpath.py", line 50, in getsize
    return os.stat(filename).st_size
FileNotFoundError: [Errno 2] No such file or directory: 'python'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/matt/miniconda3/envs/faraday/lib/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/matt/miniconda3/envs/faraday/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/matt/miniconda3/envs/faraday/lib/python3.9/site-packages/debugpy/__main__.py", line 45, in <module>
    cli.main()
  File "/home/matt/miniconda3/envs/faraday/lib/python3.9/site-packages/debugpy/server/cli.py", line 444, in main
    run()
  File "/home/matt/miniconda3/envs/faraday/lib/python3.9/site-packages/debugpy/server/cli.py", line 285, in run_file
    runpy.run_path(target_as_str, run_name=compat.force_str("__main__"))
  File "/home/matt/miniconda3/envs/faraday/lib/python3.9/runpy.py", line 288, in run_path
    return _run_module_code(code, init_globals, run_name,
  File "/home/matt/miniconda3/envs/faraday/lib/python3.9/runpy.py", line 97, in _run_module_code
    _run_code(code, mod_globals, init_globals,
  File "/home/matt/miniconda3/envs/faraday/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "./scripts/azure_train.py", line 52, in <module>
    returned_job = ml_client.jobs.create_or_update(command_job)
  File "/home/matt/miniconda3/envs/faraday/lib/python3.9/site-packages/azure/core/tracing/decorator.py", line 78, in wrapper_use_tracer
    return func(*args, **kwargs)
  File "/home/matt/miniconda3/envs/faraday/lib/python3.9/site-packages/azure/ai/ml/operations/_job_operations.py", line 565, in create_or_update
    raise ex
  File "/home/matt/miniconda3/envs/faraday/lib/python3.9/site-packages/azure/ai/ml/operations/_job_operations.py", line 500, in create_or_update
    self._resolve_arm_id_or_upload_dependencies(job)
  File "/home/matt/miniconda3/envs/faraday/lib/python3.9/site-packages/azure/ai/ml/operations/_job_operations.py", line 843, in _resolve_arm_id_or_upload_dependencies
    self._resolve_arm_id_or_azureml_id(job, self._orchestrators.get_asset_arm_id)
  File "/home/matt/miniconda3/envs/faraday/lib/python3.9/site-packages/azure/ai/ml/operations/_job_operations.py", line 1047, in _resolve_arm_id_or_azureml_id
    job = self._resolve_arm_id_for_command_job(job, resolver)
  File "/home/matt/miniconda3/envs/faraday/lib/python3.9/site-packages/azure/ai/ml/operations/_job_operations.py", line 1084, in _resolve_arm_id_for_command_job
    job.code = resolver(
  File "/home/matt/miniconda3/envs/faraday/lib/python3.9/site-packages/azure/ai/ml/operations/_operation_orchestrator.py", line 215, in get_asset_arm_id
    result = self._get_code_asset_arm_id(asset, register_asset=register_asset)
  File "/home/matt/miniconda3/envs/faraday/lib/python3.9/site-packages/azure/ai/ml/operations/_operation_orchestrator.py", line 274, in _get_code_asset_arm_id
    raise AssetException(
azure.ai.ml.exceptions.AssetException: Error with code: [Errno 2] No such file or directory: 'python'
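
The FileNotFoundError names only 'python', a bare relative name. One way such a message can arise (an inference from the trace, not confirmed in this thread) is that a symlink's target is read with os.readlink and then passed to os.path.getsize, so the relative target is resolved against the current working directory rather than the link's own directory. The sketch below reproduces that failure mode in isolation; the file names are made up, and it assumes a platform where creating symlinks is permitted.

```python
import os
import tempfile

old_cwd = os.getcwd()
with tempfile.TemporaryDirectory() as bin_dir, tempfile.TemporaryDirectory() as work_dir:
    os.chdir(work_dir)  # an unrelated working directory
    try:
        target = os.path.join(bin_dir, "python")
        link = os.path.join(bin_dir, "python3")
        with open(target, "w") as f:
            f.write("#!/bin/sh\n")
        # A relative link, like the ones found in a virtualenv's bin directory.
        os.symlink("python", link)

        raw_target = os.readlink(link)  # "python", meaningful only inside bin_dir
        try:
            os.path.getsize(raw_target)  # looked up in the cwd instead
            outcome = "found"
        except FileNotFoundError as exc:
            outcome = str(exc)

        # Resolving against the link's directory works, as does lstat on the link.
        resolved_size = os.path.getsize(
            os.path.join(os.path.dirname(link), raw_target)
        )
        link_size = os.lstat(link).st_size
    finally:
        os.chdir(old_cwd)

print(outcome)
```

On Linux the printed message has the same form as the one in the trace: a missing file called 'python'.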

ssabdb avatar Dec 22 '22 14:12 ssabdb

Looks like this is a duplicate of https://github.com/Azure/azure-sdk-for-python/issues/27980

ssabdb avatar Dec 22 '22 14:12 ssabdb

I'm not sure it is a duplicate. #27981 seems to refer to the handling of nested symlinks which the author would like to be uploaded, whereas this is a problem with the implementation of get_directory_size more generally not including the ignorefile argument.

ssabdb avatar Dec 22 '22 14:12 ssabdb

Hi @ssabdb, thank you for opening an issue! I'll tag some folks who should be able to help, and we'll get back to you as soon as possible. @luigiw @azureml-github

mccoyp avatar Dec 27 '22 19:12 mccoyp

@diondrapeck for awareness.

luigiw avatar Dec 27 '22 19:12 luigiw

As another illustration of this issue, I receive a warning that my upload size is more than 100 MB when it's actually only 810 KB.

ssabdb avatar Dec 29 '22 12:12 ssabdb

I had the same problem in azure-ai-ml 1.0.0 when executing my pipeline deployment through nox. It worked after I upgraded to 1.1.2, but now in 1.2.0+ I get this stack trace as well:

nox > python ./mlops/pipelines/training.py
Class JobDefinition: This is an experimental class, and may change at any time. Please see https://aka.ms/azuremlexperimental for more information.
Traceback (most recent call last):
  File "/actions-runner/gherunner/_work/pilot-simple-ml-classification-fec-ml-azure/pilot-simple-ml-classification-fec-ml-azure/.nox/cd/lib/python3.9/site-packages/azure/ai/ml/operations/_operation_orchestrator.py", line 256, in _get_code_asset_arm_id
    code_asset = self._code_assets.create_or_update(code_asset)
  File "/actions-runner/gherunner/_work/pilot-simple-ml-classification-fec-ml-azure/pilot-simple-ml-classification-fec-ml-azure/.nox/cd/lib/python3.9/site-packages/azure/ai/ml/operations/_code_operations.py", line 141, in create_or_update
    raise ex
  File "/actions-runner/gherunner/_work/pilot-simple-ml-classification-fec-ml-azure/pilot-simple-ml-classification-fec-ml-azure/.nox/cd/lib/python3.9/site-packages/azure/ai/ml/operations/_code_operations.py", line 94, in create_or_update
    code, _ = _check_and_upload_path(
  File "/actions-runner/gherunner/_work/pilot-simple-ml-classification-fec-ml-azure/pilot-simple-ml-classification-fec-ml-azure/.nox/cd/lib/python3.9/site-packages/azure/ai/ml/_artifacts/_artifact_utilities.py", line 399, in _check_and_upload_path
    uploaded_artifact = _upload_to_datastore(
  File "/actions-runner/gherunner/_work/pilot-simple-ml-classification-fec-ml-azure/pilot-simple-ml-classification-fec-ml-azure/.nox/cd/lib/python3.9/site-packages/azure/ai/ml/_artifacts/_artifact_utilities.py", line 297, in _upload_to_datastore
    artifact = upload_artifact(
  File "/actions-runner/gherunner/_work/pilot-simple-ml-classification-fec-ml-azure/pilot-simple-ml-classification-fec-ml-azure/.nox/cd/lib/python3.9/site-packages/azure/ai/ml/_artifacts/_artifact_utilities.py", line 178, in upload_artifact
    artifact_info = storage_client.upload(
  File "/actions-runner/gherunner/_work/pilot-simple-ml-classification-fec-ml-azure/pilot-simple-ml-classification-fec-ml-azure/.nox/cd/lib/python3.9/site-packages/azure/ai/ml/_artifacts/_blob_storage_helper.py", line 91, in upload
    file_size, _ = get_directory_size(source)
  File "/actions-runner/gherunner/_work/pilot-simple-ml-classification-fec-ml-azure/pilot-simple-ml-classification-fec-ml-azure/.nox/cd/lib/python3.9/site-packages/azure/ai/ml/_utils/_asset_utils.py", line 414, in get_directory_size
    path_size = os.path.getsize(
  File "/opt/hostedtoolcache/Python/3.9.13/x64/lib/python3.9/genericpath.py", line 50, in getsize
    return os.stat(filename).st_size
FileNotFoundError: [Errno 2] No such file or directory: 'python'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/actions-runner/gherunner/_work/pilot-simple-ml-classification-fec-ml-azure/pilot-simple-ml-classification-fec-ml-azure/./mlops/pipelines/training.py", line 173, in <module>
    run_training_pipeline()
  File "/actions-runner/gherunner/_work/pilot-simple-ml-classification-fec-ml-azure/pilot-simple-ml-classification-fec-ml-azure/./mlops/pipelines/training.py", line 156, in run_training_pipeline
    ml_client.batch_deployments.begin_create_or_update(deployment).result()
  File "/actions-runner/gherunner/_work/pilot-simple-ml-classification-fec-ml-azure/pilot-simple-ml-classification-fec-ml-azure/.nox/cd/lib/python3.9/site-packages/azure/core/tracing/decorator.py", line 78, in wrapper_use_tracer
    return func(*args, **kwargs)
  File "/actions-runner/gherunner/_work/pilot-simple-ml-classification-fec-ml-azure/pilot-simple-ml-classification-fec-ml-azure/.nox/cd/lib/python3.9/site-packages/azure/ai/ml/operations/_batch_deployment_operations.py", line 115, in begin_create_or_update
    self._validate_component(deployment, orchestrators)
  File "/actions-runner/gherunner/_work/pilot-simple-ml-classification-fec-ml-azure/pilot-simple-ml-classification-fec-ml-azure/.nox/cd/lib/python3.9/site-packages/azure/ai/ml/operations/_batch_deployment_operations.py", line 258, in _validate_component
    component = self._all_operations.all_operations[AzureMLResourceType.COMPONENT].create_or_update(
  File "/actions-runner/gherunner/_work/pilot-simple-ml-classification-fec-ml-azure/pilot-simple-ml-classification-fec-ml-azure/.nox/cd/lib/python3.9/site-packages/azure/ai/ml/operations/_component_operations.py", line 301, in create_or_update
    self._resolve_arm_id_or_upload_dependencies(component)
  File "/actions-runner/gherunner/_work/pilot-simple-ml-classification-fec-ml-azure/pilot-simple-ml-classification-fec-ml-azure/.nox/cd/lib/python3.9/site-packages/azure/ai/ml/operations/_component_operations.py", line 451, in _resolve_arm_id_or_upload_dependencies
    self._resolve_arm_id_and_inputs(component)
  File "/actions-runner/gherunner/_work/pilot-simple-ml-classification-fec-ml-azure/pilot-simple-ml-classification-fec-ml-azure/.nox/cd/lib/python3.9/site-packages/azure/ai/ml/operations/_component_operations.py", line 542, in _resolve_arm_id_and_inputs
    self._resolve_arm_id_for_pipeline_component_jobs(component.jobs, self._orchestrators.get_asset_arm_id)
  File "/actions-runner/gherunner/_work/pilot-simple-ml-classification-fec-ml-azure/pilot-simple-ml-classification-fec-ml-azure/.nox/cd/lib/python3.9/site-packages/azure/ai/ml/operations/_component_operations.py", line 526, in _resolve_arm_id_for_pipeline_component_jobs
    resolve_base_node(key, job_instance)
  File "/actions-runner/gherunner/_work/pilot-simple-ml-classification-fec-ml-azure/pilot-simple-ml-classification-fec-ml-azure/.nox/cd/lib/python3.9/site-packages/azure/ai/ml/operations/_component_operations.py", line 514, in resolve_base_node
    node._component = resolver(
  File "/actions-runner/gherunner/_work/pilot-simple-ml-classification-fec-ml-azure/pilot-simple-ml-classification-fec-ml-azure/.nox/cd/lib/python3.9/site-packages/azure/ai/ml/operations/_operation_orchestrator.py", line 223, in get_asset_arm_id
    result = self._get_component_arm_id(asset)
  File "/actions-runner/gherunner/_work/pilot-simple-ml-classification-fec-ml-azure/pilot-simple-ml-classification-fec-ml-azure/.nox/cd/lib/python3.9/site-packages/azure/ai/ml/operations/_operation_orchestrator.py", line 349, in _get_component_arm_id
    component._id = self._component.create_or_update(
  File "/actions-runner/gherunner/_work/pilot-simple-ml-classification-fec-ml-azure/pilot-simple-ml-classification-fec-ml-azure/.nox/cd/lib/python3.9/site-packages/azure/ai/ml/operations/_component_operations.py", line 301, in create_or_update
    self._resolve_arm_id_or_upload_dependencies(component)
  File "/actions-runner/gherunner/_work/pilot-simple-ml-classification-fec-ml-azure/pilot-simple-ml-classification-fec-ml-azure/.nox/cd/lib/python3.9/site-packages/azure/ai/ml/operations/_component_operations.py", line 435, in _resolve_arm_id_or_upload_dependencies
    _try_resolve_code_for_component(component=component, get_arm_id_and_fill_back=get_arm_id_and_fill_back)
  File "/actions-runner/gherunner/_work/pilot-simple-ml-classification-fec-ml-azure/pilot-simple-ml-classification-fec-ml-azure/.nox/cd/lib/python3.9/site-packages/azure/ai/ml/operations/_component_operations.py", line 636, in _try_resolve_code_for_component
    component.code = get_arm_id_and_fill_back(code, azureml_type=AzureMLResourceType.CODE)
  File "/actions-runner/gherunner/_work/pilot-simple-ml-classification-fec-ml-azure/pilot-simple-ml-classification-fec-ml-azure/.nox/cd/lib/python3.9/site-packages/azure/ai/ml/operations/_operation_orchestrator.py", line 215, in get_asset_arm_id
    result = self._get_code_asset_arm_id(asset, register_asset=register_asset)
  File "/actions-runner/gherunner/_work/pilot-simple-ml-classification-fec-ml-azure/pilot-simple-ml-classification-fec-ml-azure/.nox/cd/lib/python3.9/site-packages/azure/ai/ml/operations/_operation_orchestrator.py", line 274, in _get_code_asset_arm_id
    raise AssetException(
azure.ai.ml.exceptions.AssetException: Error with code: [Errno 2] No such file or directory: 'python'
nox > Command python ./mlops/pipelines/training.py failed with exit code 1
nox > Session cd failed.
Error: Process completed with exit code 1.

PaulMaksud avatar Jan 20 '23 10:01 PaulMaksud

> As another illustration of this issue, I receive a warning that my upload size is more than 100 MB when it's actually only 810 KB.

This seems related to an ignore file bug that was found recently. We have a fix merged into main that should resolve it in the next release.

As for symlinks not working at all, even when an ignorefile is not used, this is my first time hearing that. I'll look to reproduce.

diondrapeck avatar Jan 23 '23 21:01 diondrapeck

Looking into this - I verified that symlinks break simple command jobs, regardless of any ignore files targeting them. I'll start poking around the relevant code tomorrow.
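
For anyone blocked in the meantime, one possible workaround (a sketch, not something proposed in this thread) is to stage a symlink-free copy of the code folder and point `code=` at the copy. `stage_code_dir` and its pattern list are hypothetical names; shutil.copytree and its `ignore` callback are standard library.

```python
import fnmatch
import os
import shutil
import tempfile


def stage_code_dir(source, ignore_patterns=()):
    """Copy source into a fresh temp dir, skipping symlinks and ignored names."""
    def skip(directory, names):
        # copytree calls this once per directory; returned names are not copied.
        return [
            name for name in names
            if os.path.islink(os.path.join(directory, name))
            or any(fnmatch.fnmatch(name, pattern) for pattern in ignore_patterns)
        ]

    staged = os.path.join(tempfile.mkdtemp(prefix="aml-code-"), "code")
    shutil.copytree(source, staged, ignore=skip)
    return staged
```

The staged folder contains only regular files and directories, so no size calculation or upload step has to deal with links. The caller is responsible for deleting the staged copy afterwards.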

MilesHolland avatar Jan 25 '23 02:01 MilesHolland

Looks like this is now resolved! Thanks all.

ssabdb avatar Mar 29 '23 12:03 ssabdb