azure-sdk-for-python
[Azure ML SDK v2] Method to download Data asset locally
The Data class in the Azure ML SDK v2 allows uploading and creating a new Data asset, but not downloading it. I understand that the idea is to not use the new SDK inside training jobs. However, for exploration purposes it is very handy to be able to download a registered Data asset, as is possible with the SDK v1.
Would it be possible to add this feature? Alternatively, is there a way (using other parts of the SDK?) to download assets with paths such as azureml://datastores/<data_store_name>/paths/<path>?
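For readers unfamiliar with the URI shape in the question, here is a minimal sketch that splits a short-form datastore URI into its datastore name and relative path. The names `DatastorePath` and `parse_datastore_uri` are hypothetical helpers made up for illustration, not part of azure-ai-ml, and the regular expression only covers the short form quoted above.

```python
import re
from typing import NamedTuple


class DatastorePath(NamedTuple):
    """Hypothetical container for the two parts of a short-form datastore URI."""
    datastore: str
    path: str


# Matches only the short form: azureml://datastores/<data_store_name>/paths/<path>
_SHORT_FORM = re.compile(r"^azureml://datastores/(?P<datastore>[^/]+)/paths/(?P<path>.*)$")


def parse_datastore_uri(uri: str) -> DatastorePath:
    """Split a short-form datastore URI; raise ValueError for anything else."""
    match = _SHORT_FORM.match(uri)
    if match is None:
        raise ValueError(f"Not a short-form datastore URI: {uri!r}")
    return DatastorePath(match["datastore"], match["path"])


if __name__ == "__main__":
    # The datastore name and path below are placeholders.
    print(parse_datastore_uri("azureml://datastores/workspaceblobstore/paths/raw/iris.csv"))
```

Splitting the URI this way only identifies which datastore and path are meant; actually fetching the bytes still needs a download mechanism such as the ones discussed in this thread.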
Document details
⚠ Do not edit this section. It is required for docs.microsoft.com ➟ GitHub issue linking.
- ID: 112731d7-568e-4310-ad10-687f70e47a86
- Version Independent ID: 5ee94e1d-c95d-6def-dfbb-f31979170ed5
- Content: azure.ai.ml.entities.Data class
- Content Source: preview/docs-ref-autogen/azure-ai-ml/azure.ai.ml.entities.Data.yml
- GitHub Login: @VSC-Service-Account
Label prediction was below confidence level 0.6 for Model:ServiceLabels: 'Docs:0.5414302,Data Lake Storage Gen2:0.053330887,Tables:0.029985525'
Thank you for your feedback. This has been routed to the support team for assistance.
Hi @tomasvanpottelbergh, thanks for your feedback. @azureml-github can you take a look at this issue?
I upvote this. It would be great to have a part of the Python SDK that allows downloading and uploading datasets directly from code.
Hi Thomas, about how to "download assets with paths such as azureml://datastores/<data_store_name>/paths/
pip install --pre azure-ai-ml
pip install amlfs --extra-index-url https://azuremlsdktestpypi.azureedge.net/Create-Dev-Index/69894043/
Hi @SturgeonMi, I briefly tried this out, but couldn't get the authentication to work. Anyway, I found a workaround using azure.ai.ml._artifacts._artifact_utilities.download_artifact_from_aml_uri. This is definitely not a great solution, since it's a "private" API, but I hope that this functionality will get exposed publicly in the azure-ai-ml package at some point.
+1, exposing this functionality would be great. To expand on @tomasvanpottelbergh's solution, I was able to download locally using the following:
import os
from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential
import azure.ai.ml._artifacts._artifact_utilities as artifact_utils
subscription_id = ""
resource_group = ""
workspace = ""
dataset_name = ""
dataset_version = ""
downloaded_data_folder = "./data"
# Get the client
ml_client = MLClient(
    DefaultAzureCredential(), subscription_id, resource_group, workspace
)
# Lookup the dataset to get the 'path'
data_info = ml_client.data.get(name=dataset_name, version=dataset_version)
# Download the dataset (note: this relies on a private, unsupported module)
artifact_utils.download_artifact_from_aml_uri(
    uri=data_info.path,
    destination=downloaded_data_folder,
    datastore_operation=ml_client.datastores,
)
# Verify it is downloaded (assumes the asset path points to a single file)
file_name = os.path.basename(data_info.path)
assert os.path.exists(os.path.join(downloaded_data_folder, file_name))
Thanks for the feedback! We are routing this to the appropriate team for follow-up. cc @shivanissambare.
Issue Details
Author: tomasvanpottelbergh
Assignees: bandsina
Labels:
Milestone: -
Hello, it seems that the method azure.ai.ml._artifacts._artifact_utilities.download_artifact_from_aml_uri doesn't work for Tabular type data assets.
The AzureML documentation does not provide a method to download tabular data assets locally. The "download" method (from the TabularDataset class) requires a "stream_column" parameter, which makes it possible to download the data related to the dataset but not the dataset itself.
What would you advise?
Any update on this? We're looking forward to using Azure ML datasets with fsspec.
Fsspec integration is in Public Preview now, please find corresponding documentation here: https://learn.microsoft.com/en-us/azure/machine-learning/how-to-access-data-interactive?tabs=adls
Hi @jpacifico, I would like to confirm: do you want to download the Tabular data asset together with its metadata and data? May I ask what the use case is? Thanks! (I am asking to make sure I understand the question correctly.)
@tomasvanpottelbergh FYI
I think the 1.0.0 release of azure fsspec adds a download option to the AzureMachineLearningFileSystem. There is also something called upload but I have not tried it out.
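The fsspec route mentioned here can be sketched roughly as follows. This is an untested sketch, not official guidance: it assumes the azureml-fsspec package is installed, that `AzureMachineLearningFileSystem` takes a long-form datastore URI, and that `download` accepts `rpath`, `lpath` and `recursive` as I recall from the linked how-to page. `build_long_form_uri` and `download_with_fsspec` are hypothetical helper names, and all identifiers in the usage example are placeholders.

```python
def build_long_form_uri(subscription_id: str, resource_group: str,
                        workspace: str, datastore: str, path: str = "") -> str:
    """Assemble a long-form datastore URI from its parts (hypothetical helper)."""
    return (
        f"azureml://subscriptions/{subscription_id}"
        f"/resourcegroups/{resource_group}"
        f"/workspaces/{workspace}"
        f"/datastores/{datastore}/paths/{path}"
    )


def download_with_fsspec(datastore_uri: str, remote_path: str, local_dir: str,
                         recursive: bool = False) -> None:
    """Download remote_path (relative to the datastore) into local_dir.

    Assumption: the class and keyword names below match the installed
    azureml-fsspec release; verify them before relying on this.
    """
    # Imported lazily so the URI helper works without azureml-fsspec installed.
    from azureml.fsspec import AzureMachineLearningFileSystem

    fs = AzureMachineLearningFileSystem(datastore_uri)
    fs.download(rpath=remote_path, lpath=local_dir, recursive=recursive)


if __name__ == "__main__":
    # Placeholder values only; replace with real identifiers before running.
    uri = build_long_form_uri("<subscription_id>", "<resource_group>",
                              "<workspace>", "<data_store_name>")
    print(uri)
    # download_with_fsspec(uri, "<path>", "./data")
```

Unlike the private `_artifact_utilities` workaround, this goes through a published package, so it is less likely to break between azure-ai-ml releases.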
Hi @SturgeonMi, I briefly tried this out, but couldn't get the authentication to work. Anyway, I found a workaround using azure.ai.ml._artifacts._artifact_utilities.download_artifact_from_aml_uri. This is definitely not a great solution, since it's a "private" API, but I hope that this functionality will get exposed publicly in the azure-ai-ml package at some point.
My URI looks like azureml://registries/azureml-1p/data/data-name/versions/1, and with the workaround above I get the error:
ValidationException: Invalid AzureML datastore path URI azureml://registries/azureml-1p/data/data-name/versions/1
How do I download the data given that URI? Thanks!
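The error message suggests that the helper only accepts datastore path URIs, while this URI is a reference to a data asset in a registry. One possible direction, offered as an untested assumption rather than a confirmed fix, is to split the reference into registry, name and version, look the asset up through a registry-scoped `MLClient`, and inspect the `path` it reports. `parse_registry_data_uri` and `lookup_registry_data_path` are hypothetical helpers; whether the returned path can then be downloaded depends on permissions and storage type that this thread does not cover.

```python
import re

# Matches: azureml://registries/<registry>/data/<name>/versions/<version>
_REGISTRY_DATA_URI = re.compile(
    r"^azureml://registries/(?P<registry>[^/]+)"
    r"/data/(?P<name>[^/]+)/versions/(?P<version>[^/]+)$"
)


def parse_registry_data_uri(uri: str) -> dict:
    """Split a registry data asset reference into registry, name and version."""
    match = _REGISTRY_DATA_URI.match(uri)
    if match is None:
        raise ValueError(f"Not a registry data asset URI: {uri!r}")
    return match.groupdict()


def lookup_registry_data_path(uri: str) -> str:
    """Return the storage path recorded on the registry asset.

    Assumption: the caller can read the registry and the asset exposes a
    usable `path`; this has not been verified against a real registry.
    """
    # Imported lazily so the parser works without the Azure packages installed.
    from azure.ai.ml import MLClient
    from azure.identity import DefaultAzureCredential

    parts = parse_registry_data_uri(uri)
    client = MLClient(credential=DefaultAzureCredential(),
                      registry_name=parts["registry"])
    asset = client.data.get(name=parts["name"], version=parts["version"])
    return asset.path


if __name__ == "__main__":
    print(parse_registry_data_uri(
        "azureml://registries/azureml-1p/data/data-name/versions/1"))
```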