
[Azure ML SDK v2] Issue while reading data from `uri_folder` Input type via https://<account_name> scheme

Open glebrh opened this issue 2 years ago • 9 comments

  • Package Name: azure-ai-ml
  • Package Version: 1.0.0
  • Operating System: Windows Server 2022 Standard
  • Python Version: 3.9.13

Describe the bug
According to the documentation, it should be possible to access public blob storage containers via an Input(type='uri_folder') instance. For the actual data path, the Azure docs say that either the https://<account_name>.blob.core.windows.net/<container_name>/<path> or the abfss://<file_system>@<account_name>.dfs.core.windows.net/<path> format can be used.

I tried the first option (https://) with the diabetes dataset, which is available at the following link: https://azureopendatastorage.blob.core.windows.net/mlsamples/diabetes. However, this access method fails with an error like the one below:

{"NonCompliant":"DataAccessError(NotFound)"}
{
  "code": "data-capability.UriMountSession.PyFuseError",
  "target": "",
  "category": "UserError",
  "error_details": [
    {
      "key": "NonCompliantReason",
      "value": "DataAccessError(NotFound)"
    },
    {
      "key": "StackTrace",
      "value": "  File \"/opt/miniconda/envs/data-capability/lib/python3.7/site-packages/data_capability/capability_session.py\", line 70, in start\n    (data_path, sub_data_path) = session.start()\n\n  File \"/opt/miniconda/envs/data-capability/lib/python3.7/site-packages/data_capability/data_sessions.py\", line 364, in start\n    options=mnt_options\n\n  File \"/opt/miniconda/envs/data-capability/lib/python3.7/site-packages/azureml/dataprep/fuse/dprepfuse.py\", line 696, in rslex_uri_volume_mount\n    raise e\n\n  File \"/opt/miniconda/envs/data-capability/lib/python3.7/site-packages/azureml/dataprep/fuse/dprepfuse.py\", line 690, in rslex_uri_volume_mount\n    mount_context = RslexDirectURIMountContext(mount_point, uri, options)\n"
    }
  ]
}


AzureMLCompute job failed.
data-capability.UriMountSession.PyFuseError: [REDACTED]
  Reason: [REDACTED]
  StackTrace:   File "/opt/miniconda/envs/data-capability/lib/python3.7/site-packages/data_capability/capability_session.py", line 70, in start
    (data

With the second option, i.e. wasbs://mlsamples@azureopendatastorage.blob.core.windows.net/diabetes, the job finishes successfully.

To Reproduce
Steps to reproduce the behavior: execute the following code:

from azure.ai.ml import MLClient, Input, command

ml_client = MLClient(...)

job = command(
    command="ls ${{inputs.diabetes}}",
    inputs={
        "diabetes": Input(
            type="uri_folder",
            # https:// path to a public blob container -- this triggers the error
            path="https://azureopendatastorage.blob.core.windows.net/mlsamples/diabetes",
        )
    },
    environment="AzureML-sklearn-0.24-ubuntu18.04-py37-cpu@latest",
    compute="cpu-cluster",
    display_name="data_access_test",
    experiment_name="data_access_test",
)

ml_client.create_or_update(job)

Expected behavior
The job completes successfully, and the user logs show the list of files inside the passed blob storage folder.

glebrh avatar Nov 05 '22 14:11 glebrh

Label prediction was below confidence level 0.6 for Model:ServiceLabels: 'Storage:0.21718025,Azure.Core:0.16009027,Data Lake Storage Gen2:0.13490766'

azure-sdk avatar Nov 05 '22 14:11 azure-sdk

BTW, the uri_file type works normally with paths like this: https://azuremlexamples.blob.core.windows.net/datasets/iris.csv

glebrh avatar Nov 05 '22 14:11 glebrh
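(Editor's note: based on the behavior reported above — https:// paths mount for uri_file but not for uri_folder — a small pre-submission guard can catch the unsupported combination before a job is queued. This is a hypothetical helper, not part of the azure-ai-ml SDK, and the scheme lists are a sketch reflecting only what this thread reports:)

```python
from urllib.parse import urlparse

# Schemes observed in this thread to mount for each input type.
# https works only for uri_file, per the reports above.
FOLDER_SCHEMES = {"wasbs", "abfss", "azureml"}
FILE_SCHEMES = FOLDER_SCHEMES | {"https"}

def is_supported_path(path: str, input_type: str) -> bool:
    """Return True if `path` uses a scheme known to mount for `input_type`."""
    scheme = urlparse(path).scheme
    if input_type == "uri_file":
        return scheme in FILE_SCHEMES
    if input_type == "uri_folder":
        return scheme in FOLDER_SCHEMES
    raise ValueError(f"unknown input type: {input_type}")
```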

@azureml-github

xiangyan99 avatar Nov 07 '22 17:11 xiangyan99

+1 — I am facing this issue as well (can't use https:// as the path for a URI_FOLDER). As a workaround, I had to switch the initial input in my pipeline code from URI_FOLDER to URI_FILE.

AndrewRTsao avatar Nov 27 '22 05:11 AndrewRTsao

Thx for reporting this. We'll investigate and get back to you.

luigiw avatar Nov 29 '22 18:11 luigiw

Hi, for uri_folder, please use a wasbs-schemed URI if it's blob storage (wasbs://<container>@<account_name>.blob.core.windows.net/<path_to_data>/),

or an abfss-schemed URI (abfss://<file_system>@<account_name>.dfs.core.windows.net/<path_to_data>/) if it's ADLS Gen2 storage.

QianqianNie avatar Nov 29 '22 21:11 QianqianNie

Then the documentation should be updated, I guess? Wherever it mentions that uri_folder access is possible via the https protocol, that should be removed?

For instance here or here

Or eventually, support for https + uri_folder will be added?

glebrh avatar Nov 29 '22 21:11 glebrh

I think this will be a document improvement.

luigiw avatar Dec 16 '22 00:12 luigiw

Hi, we're sending this friendly reminder because we haven't heard back from you in a while. We need more information about this issue to help address it. Please be sure to give us your input within the next 7 days. If we don't hear back from you within 14 days of this comment the issue will be automatically closed. Thank you!

ghost avatar Dec 23 '22 02:12 ghost

I am still facing this issue when submitting an Azure ML job that uses a URI folder on Azure blob container storage as input -> https://<account_name>.blob.core.windows.net/<container_name>/ (screenshot attached)

If I create an Azure ML data asset or Azure ML datastore for the same Azure blob container storage path, the job starts without any problems.

ylnhari avatar Mar 29 '23 03:03 ylnhari

Hi all, I am facing an issue when trying to read a CSV file stored in my GitHub repository into Azure ML. It throws the following error: (screenshot attached)

madhuyadu avatar Apr 20 '23 12:04 madhuyadu

I found this problem interesting. It seems that you have to register the datastore in the subscription and resource group where the data is located. There should be a streamlined solution for this. (screenshot attached)

tahhnik avatar Jun 24 '23 06:06 tahhnik
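(Editor's note: once a datastore is registered, as the last two comments describe, the folder can be addressed with an azureml:// datastore URI instead of a raw https URL. A small builder for that URI form — the datastore name below is a hypothetical example:)

```python
def datastore_uri(datastore: str, path: str) -> str:
    """Build an Azure ML datastore URI of the form
    azureml://datastores/<datastore>/paths/<path>."""
    return f"azureml://datastores/{datastore}/paths/{path.lstrip('/')}"

# Example: point at the diabetes folder through a registered datastore.
uri = datastore_uri("workspaceblobstore", "mlsamples/diabetes")
```

The resulting string can then be passed as the `path` of an Input(type="uri_folder").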