[Azure ML SDK v2] Issue while reading data from `uri_folder` Input type via https://<account_name> scheme
- Package Name: azure-ai-ml
- Package Version: 1.0.0
- Operating System: Windows Server 2022 Standard
- Python Version: 3.9.13
Describe the bug
According to the documentation, it should be possible to access public blob storage containers using an Input(type='uri_folder') instance. When passing the actual path of the data, the Azure docs say it is possible to use either
https://<account_name>.blob.core.windows.net/<container_name>/<path>
or
abfss://<file_system>@<account_name>.dfs.core.windows.net/<path>
as the path format.
I tried the first option (https://) with the diabetes dataset, which is available under the following link: https://azureopendatastorage.blob.core.windows.net/mlsamples/diabetes. However, this access method causes an error like the one below:
{"NonCompliant":"DataAccessError(NotFound)"}
{
  "code": "data-capability.UriMountSession.PyFuseError",
  "target": "",
  "category": "UserError",
  "error_details": [
    {
      "key": "NonCompliantReason",
      "value": "DataAccessError(NotFound)"
    },
    {
      "key": "StackTrace",
      "value": " File \"/opt/miniconda/envs/data-capability/lib/python3.7/site-packages/data_capability/capability_session.py\", line 70, in start\n (data_path, sub_data_path) = session.start()\n\n File \"/opt/miniconda/envs/data-capability/lib/python3.7/site-packages/data_capability/data_sessions.py\", line 364, in start\n options=mnt_options\n\n File \"/opt/miniconda/envs/data-capability/lib/python3.7/site-packages/azureml/dataprep/fuse/dprepfuse.py\", line 696, in rslex_uri_volume_mount\n raise e\n\n File \"/opt/miniconda/envs/data-capability/lib/python3.7/site-packages/azureml/dataprep/fuse/dprepfuse.py\", line 690, in rslex_uri_volume_mount\n mount_context = RslexDirectURIMountContext(mount_point, uri, options)\n"
    }
  ]
}
AzureMLCompute job failed.
data-capability.UriMountSession.PyFuseError: [REDACTED]
Reason: [REDACTED]
StackTrace: File "/opt/miniconda/envs/data-capability/lib/python3.7/site-packages/data_capability/capability_session.py", line 70, in start
(data_path, sub_data_path) = session.start()
With the second option, i.e. wasbs://mlsamples@azureopendatastorage.blob.core.windows.net/diabetes, the job finishes successfully.
To Reproduce
Steps to reproduce the behavior: execute the following code:
from azure.ai.ml import MLClient, command, Input

ml_client = MLClient(...)
job = command(
command="ls ${{inputs.diabetes}}",
inputs={
"diabetes": Input(
type="uri_folder",
path="https://azureopendatastorage.blob.core.windows.net/mlsamples/diabetes",
)
},
environment="AzureML-sklearn-0.24-ubuntu18.04-py37-cpu@latest",
compute="cpu-cluster",
display_name="data_access_test",
# description,
experiment_name="data_access_test"
)
ml_client.create_or_update(job)
Expected behavior
The job completes successfully. User logs show the list of files inside the passed blob storage folder.
BTW, the uri_file option works normally with paths like this: https://azuremlexamples.blob.core.windows.net/datasets/iris.csv
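For example (a minimal sketch, assuming ml_client is already configured as in the repro above):

from azure.ai.ml import command, Input

# uri_file with an https:// path mounts fine, unlike uri_folder
job = command(
    command="head ${{inputs.iris}}",
    inputs={
        "iris": Input(
            type="uri_file",
            path="https://azuremlexamples.blob.core.windows.net/datasets/iris.csv",
        )
    },
    environment="AzureML-sklearn-0.24-ubuntu18.04-py37-cpu@latest",
    compute="cpu-cluster",
)
ml_client.create_or_update(job)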
@azureml-github
+1 I am facing this issue as well (can't use https:// as the path for the URI_FOLDER). I had to switch to using URI_FILE instead of URI_FOLDER as the initial input in my pipeline code as a workaround.
Thx for reporting this. We'll investigate and get back to you.
Hi, for uri_folder, please use a wasbs-schemed URI if it's blob storage (wasbs://) or an abfss-schemed URI if it's ADLS Gen2 (abfss://).
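For the repro above, that would look roughly like this (a sketch; the wasbs authority is <container_name>@<account_name>):

from azure.ai.ml import Input

# wasbs://<container_name>@<account_name>.blob.core.windows.net/<path>
diabetes_input = Input(
    type="uri_folder",
    path="wasbs://mlsamples@azureopendatastorage.blob.core.windows.net/diabetes",
)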
Then the documentation should be updated, I guess? Wherever it is mentioned that access to uri_folder is possible via the https protocol, it should be removed? Or will support for https + uri_folder eventually be added?
I think this will be a documentation improvement.
Hi, we're sending this friendly reminder because we haven't heard back from you in a while. We need more information about this issue to help address it. Please be sure to give us your input within the next 7 days. If we don't hear back from you within 14 days of this comment the issue will be automatically closed. Thank you!
I am still facing this issue when submitting an Azure ML job that uses a URI Folder with Azure blob container storage as the Input to the job -> https://<account_name>.blob.core.windows.net/<container_name>/
If I create an Azure ML Data Asset or Azure ML Datastore for the same Azure blob container storage path, the job starts without any problems.
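For reference, registering the folder as a data asset first looks roughly like this (a sketch; the asset name is a placeholder):

from azure.ai.ml.entities import Data
from azure.ai.ml.constants import AssetTypes

# Register the blob folder as a uri_folder data asset once...
data_asset = Data(
    name="my_blob_folder",  # placeholder name
    version="1",
    type=AssetTypes.URI_FOLDER,
    path="wasbs://<container_name>@<account_name>.blob.core.windows.net/<path>",
)
ml_client.data.create_or_update(data_asset)

# ...then reference it in the job instead of the https:// path:
# Input(type="uri_folder", path="azureml:my_blob_folder:1")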
Hi all,
I am facing an issue when trying to read a CSV file stored in my GitHub repository into Azure ML. It throws the following error:
I found this problem interesting. It seems that you have to register the datastore with the subscription and resource group where the data is located. There should be a streamlined solution for this.
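A minimal sketch of that registration step (assuming azure-ai-ml; names are placeholders, and credentials are omitted, which works for public containers):

from azure.ai.ml.entities import AzureBlobDatastore

# Register the blob container as a datastore in the workspace that runs the job
store = AzureBlobDatastore(
    name="my_blob_store",  # placeholder name
    account_name="<account_name>",
    container_name="<container_name>",
)
ml_client.datastores.create_or_update(store)

# Data can then be referenced via a datastore URI such as:
# azureml://datastores/my_blob_store/paths/<path>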