azure-sdk-for-python

[Azure ML SDK v2] Method to download Data asset locally

Open tomasvanpottelbergh opened this issue 1 year ago • 5 comments

The Data class in the Azure ML SDK v2 allows the uploading and creation of a new Data asset, but not its downloading. I understand that the idea is to not use the new SDK inside training jobs. However, for exploration purposes it is very handy to be able to download a registered Data asset, as is possible with the SDK v1.

Would it be possible to add this feature? Alternatively, is there a way (using other parts of the SDK?) to download assets with paths such as azureml://datastores/<data_store_name>/paths/<path>?



tomasvanpottelbergh avatar Sep 14 '22 15:09 tomasvanpottelbergh

Label prediction was below confidence level 0.6 for Model:ServiceLabels: 'Docs:0.5414302,Data Lake Storage Gen2:0.053330887,Tables:0.029985525'

azure-sdk avatar Sep 14 '22 15:09 azure-sdk

Thank you for your feedback. This has been routed to the support team for assistance.

ghost avatar Sep 14 '22 17:09 ghost

Hi @tomasvanpottelbergh, thanks for your feedback. @azureml-github can you take a look at this issue?

kristapratico avatar Sep 14 '22 17:09 kristapratico

I upvote this. It would be great to have a part of the Python SDK that allows downloading and uploading datasets directly from code.

marrrcin avatar Sep 15 '22 09:09 marrrcin

Hi Tomas, regarding how to "download assets with paths such as azureml://datastores/<data_store_name>/paths/": AzureML is delivering an FSSpec integration in Private Preview. With this integration, you will be able to use Pandas, Dask, and other libraries that accept file objects to work with paths such as "azureml://subscriptions/<subscription_name>/resourcegroups/<rg_name>/workspaces/<ws_name>/datastores/<ds_name>/paths/". If you are willing to try this out, you can install the private preview packages below:

pip install --pre azure-ai-ml
pip install amlfs --extra-index-url https://azuremlsdktestpypi.azureedge.net/Create-Dev-Index/69894043/
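
As a rough illustration of the long-form path described above, the sketch below assembles such a URI and hands it to pandas. The helper names `build_datastore_uri` and `read_csv_from_datastore` are hypothetical and not part of any Azure package. It assumes the fsspec integration is installed and registers the `azureml://` protocol, so that `pandas.read_csv` can open the URI directly; that part has not been verified here.

```python
def build_datastore_uri(subscription, resource_group, workspace, datastore, path):
    """Assemble the long-form azureml:// datastore URI quoted in the comment above."""
    return (
        f"azureml://subscriptions/{subscription}"
        f"/resourcegroups/{resource_group}"
        f"/workspaces/{workspace}"
        f"/datastores/{datastore}"
        f"/paths/{path.lstrip('/')}"
    )


def read_csv_from_datastore(uri):
    """Read a CSV through fsspec (assumes pandas and the fsspec integration are installed)."""
    import pandas as pd  # imported lazily so the URI helper works without pandas

    return pd.read_csv(uri)


if __name__ == "__main__":
    # Placeholder values; replace them with your own workspace details.
    uri = build_datastore_uri("<subscription>", "<rg_name>", "<ws_name>", "<ds_name>", "folder/file.csv")
    print(uri)
```

Only the URI helper runs without Azure access; `read_csv_from_datastore` needs valid credentials and an existing file.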

SturgeonMi avatar Sep 23 '22 21:09 SturgeonMi

Hi @SturgeonMi, I briefly tried this out, but couldn't get the authentication to work. Anyway, I found a workaround using azure.ai.ml._artifacts._artifact_utilities.download_artifact_from_aml_uri. This is definitely not a great solution, since it's a "private" API, but I hope that this functionality will get exposed publicly in the azure-ai-ml package at some point.

tomasvanpottelbergh avatar Oct 04 '22 14:10 tomasvanpottelbergh

+1, exposing this functionality would be great. To expand on @tomasvanpottelbergh's solution, I was able to download locally using the following:

import os

from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential
import azure.ai.ml._artifacts._artifact_utilities as artifact_utils

subscription_id = ""
resource_group = ""
workspace = ""

dataset_name = ""
dataset_version = ""
downloaded_data_folder = "./data"

# Get the client
ml_client = MLClient(
    DefaultAzureCredential(), subscription_id, resource_group, workspace
)

# Lookup the dataset to get the 'path'
data_info = ml_client.data.get(name=dataset_name, version=dataset_version)

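# Download the dataset. Note: download_artifact_from_aml_uri lives in a
# private (underscore-prefixed) module, so it may change without notice.
artifact_utils.download_artifact_from_aml_uri(
    uri=data_info.path,
    destination=downloaded_data_folder,
    datastore_operation=ml_client.datastores,
)

# Verify it is downloaded ([10:] drops a leading "azureml://", assuming the path starts with it)
file_path = os.path.basename(data_info.path[10:])
assert os.path.exists(os.path.join(downloaded_data_folder, file_path))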

jomalsan avatar Oct 11 '22 16:10 jomalsan

Thanks for the feedback! We are routing this to the appropriate team for follow-up. cc @shivanissambare.

Issue Details

Author: tomasvanpottelbergh
Assignees: bandsina
Labels:

feature-request, question, Service Attention, Client, customer-reported, needs-team-attention, ML-Inference

Milestone: -

ghost avatar Oct 13 '22 19:10 ghost

Hello, it seems that the method azure.ai.ml._artifacts._artifact_utilities.download_artifact_from_aml_uri doesn't work for Tabular data assets.

The AzureML documentation does not provide a method to download tabular data assets locally. The "download" method (from the TabularDataset class) requires a "stream_column" parameter, which allows downloading the data referenced by the dataset but not the dataset itself.

What would you advise?

jpacifico avatar Dec 01 '22 11:12 jpacifico

Any update on this? We're looking forward to using Azure ML datasets with fsspec.

marrrcin avatar Dec 06 '22 08:12 marrrcin

The fsspec integration is now in Public Preview. You can find the corresponding documentation here: https://learn.microsoft.com/en-us/azure/machine-learning/how-to-access-data-interactive?tabs=adls

SturgeonMi avatar Dec 12 '22 18:12 SturgeonMi

Hi @jpacifico, I would like to confirm: do you want to download the Tabular data asset together with its metadata and data? May I ask what the use case is? Thanks! (I am asking to make sure I understand the question correctly.)

SturgeonMi avatar Dec 12 '22 18:12 SturgeonMi

@tomasvanpottelbergh FYI

I think the 1.0.0 release of azure fsspec adds a download option to the AzureMachineLearningFileSystem. There is also something called upload but I have not tried it out.
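
A minimal sketch of what that download might look like, assuming the `azureml-fsspec` package is installed and that `AzureMachineLearningFileSystem` accepts a datastore-level URI and a `download(rpath, lpath, recursive)` call, with `rpath` taken relative to the datastore root. The helpers `split_datastore_uri` and `download_from_datastore` are hypothetical names introduced only for illustration.

```python
def split_datastore_uri(uri):
    """Split a long-form azureml:// URI into (datastore root URI, relative path)."""
    root, sep, relative = uri.partition("/paths/")
    if not sep:
        raise ValueError(f"No '/paths/' segment found in URI: {uri}")
    return root, relative


def download_from_datastore(uri, local_dir):
    """Download a single file to local_dir (assumes azureml-fsspec is installed)."""
    # Imported lazily so the URI helper above can be used without the package.
    from azureml.fsspec import AzureMachineLearningFileSystem

    root, relative = split_datastore_uri(uri)
    fs = AzureMachineLearningFileSystem(root)
    # recursive=False because a single file is requested here.
    fs.download(rpath=relative, lpath=local_dir, recursive=False)


if __name__ == "__main__":
    example = (
        "azureml://subscriptions/<subscription>/resourcegroups/<rg_name>"
        "/workspaces/<ws_name>/datastores/<ds_name>/paths/folder/file.csv"
    )
    print(split_datastore_uri(example))
```

Calling `download_from_datastore` requires valid Azure credentials; only the URI splitting can be tried offline.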

fdroessler avatar May 08 '23 07:05 fdroessler

Hi @SturgeonMi, I briefly tried this out, but couldn't get the authentication to work. Anyway, I found a workaround using azure.ai.ml._artifacts._artifact_utilities.download_artifact_from_aml_uri. This is definitely not a great solution, since it's a "private" API, but I hope that this functionality will get exposed publicly in the azure-ai-ml package at some point.

My URI looks like: azureml://registries/azureml-1p/data/data-name/versions/1 and I get the error:

ValidationException: Invalid AzureML datastore path URI azureml://registries/azureml-1p/data/data-name/versions/1

How do I download the data given that URI? Thanks!
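
The error message suggests the helper only accepts datastore path URIs, while the URI above names a registry asset. The sketch below is one way to check which shape a URI has before calling the private helper. The patterns are assumptions inferred from the URIs quoted in this thread, not the SDK's own validation logic, and `classify_aml_uri` is a hypothetical name. It does not show how to download a registry asset.

```python
import re

# Patterns inferred from the URIs quoted in this thread (assumptions, not SDK rules).
_SHORT_DATASTORE = re.compile(r"^azureml://datastores/[^/]+/paths/.+$", re.IGNORECASE)
_LONG_DATASTORE = re.compile(
    r"^azureml://subscriptions/[^/]+/resourcegroups/[^/]+"
    r"/workspaces/[^/]+/datastores/[^/]+/paths/.+$",
    re.IGNORECASE,
)
_REGISTRY_ASSET = re.compile(
    r"^azureml://registries/[^/]+/data/[^/]+/versions/[^/]+$", re.IGNORECASE
)


def classify_aml_uri(uri):
    """Return 'datastore_path', 'registry_asset', or 'unknown' for an azureml:// URI."""
    if _SHORT_DATASTORE.match(uri) or _LONG_DATASTORE.match(uri):
        return "datastore_path"
    if _REGISTRY_ASSET.match(uri):
        return "registry_asset"
    return "unknown"


if __name__ == "__main__":
    print(classify_aml_uri("azureml://registries/azureml-1p/data/data-name/versions/1"))
    print(classify_aml_uri("azureml://datastores/mystore/paths/folder/file.csv"))
```

A check like this would at least make the failure explicit before the private helper raises its ValidationException.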

andrescodas avatar Oct 24 '23 14:10 andrescodas