MachineLearningNotebooks

"datastore.upload_files" is deprecated after version 1.0.69

Open · urasandesu opened this issue 3 years ago · 6 comments

In azureml-core 1.37.0.post1, we have started getting the following warning message:

"datastore.upload_files" is deprecated after version 1.0.69. Please use "FileDatasetFactory.upload_directory" instead. See Dataset API change notice at https://aka.ms/dataset-deprecation.

The URL appears to point to this page, but there is no migration information here.

datastore.upload_files let us specify the files to upload explicitly as a list, even when the directory also contains files we do not want to upload, such as credentials or files holding personal data that has not yet been processed.

How can FileDatasetFactory.upload_directory be used to do the same thing as datastore.upload_files?

urasandesu avatar Jan 23 '22 01:01 urasandesu

I tried some code, and my understanding is as follows.

If the existing code uses datastore.upload_files like this:

from datetime import datetime
from pytz import timezone

# default_datastore is assumed to already exist, e.g. ws.get_default_datastore()
file = './train.csv'
now = datetime.now(timezone('UTC'))
target_path = 'UI/' + now.strftime('%m-%d-%Y_%H%M%S_UTC')

default_datastore.upload_files([file], target_path=target_path, overwrite=True)

Then the replacement with FileDatasetFactory.upload_directory looks like this:

from datetime import datetime
from pytz import timezone
from azureml.core import Dataset

file = './train.csv'
now = datetime.now(timezone('UTC'))
target_path = 'UI/' + now.strftime('%m-%d-%Y_%H%M%S_UTC')

Dataset.File.upload_directory('./', (default_datastore, target_path), pattern=file, overwrite=True)
# NOTE: in the `pattern` argument, the leading './' (current-directory) prefix seems to be mandatory.

Is this correct?
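As an aside, a hedged alternative that avoids relying on the pattern quirk is to copy only the allow-listed files into a temporary staging directory and upload that directory whole. This is only a minimal sketch, assuming default_datastore is already defined as above and pytz is installed:

import shutil
import tempfile
from datetime import datetime
from pytz import timezone
from azureml.core import Dataset

files_to_upload = ['./train.csv']  # explicit allow-list, as with datastore.upload_files
target_path = 'UI/' + datetime.now(timezone('UTC')).strftime('%m-%d-%Y_%H%M%S_UTC')

with tempfile.TemporaryDirectory() as staging_dir:
    # Copy only the allow-listed files; anything else in the source directory is never staged.
    for f in files_to_upload:
        shutil.copy(f, staging_dir)
    # Upload the whole staging directory, so no `pattern` argument is needed.
    Dataset.File.upload_directory(staging_dir, (default_datastore, target_path), overwrite=True)

This keeps the explicit allow-list behaviour of datastore.upload_files, since nothing outside the staging directory is ever uploaded.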

urasandesu avatar Jan 23 '22 02:01 urasandesu

+1, would be curious how this is implemented.

thomassantosh avatar Jan 25 '22 06:01 thomassantosh

+1, having the same warning. Looking forward to seeing a replacement solution.

chengyu-liu-cs avatar Feb 22 '22 10:02 chengyu-liu-cs

Use Dataset.File.upload_directory, documented in the FileDatasetFactory API reference. Here is a full example:

import tempfile
from azureml.core import Workspace, Dataset

# configure Azure storage
ws = Workspace.from_config()
dstore = ws.datastores.get('your datastore')
dstore_path = 'relative datastore path'
target = (dstore, dstore_path)

# write to Azure storage (df is an existing pandas DataFrame)
with tempfile.TemporaryDirectory() as tmpdir:
    df.to_parquet(f'{tmpdir}/df.parquet')
    ds = Dataset.File.upload_directory(tmpdir, target, overwrite=True)
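If you also want to read the uploaded parquet back, a minimal sketch (assuming the upload above succeeded, the same dstore and dstore_path as above, and that azureml-dataprep's pandas extra is installed) would be:

# Consume the uploaded file as a TabularDataset and pull it back into pandas.
tab = Dataset.Tabular.from_parquet_files(path=(dstore, f'{dstore_path}/df.parquet'))
df_roundtrip = tab.to_pandas_dataframe()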

maciejskorski avatar Mar 14 '22 17:03 maciejskorski

Super helpful example code. Just what I needed for how to specify a target folder in the datastore. Thank you.

RWilsker avatar Jun 13 '23 16:06 RWilsker

+1 to this question.

I'm still having issues with many of the solutions posted above; currently I'm getting the following error with azureml-dataprep == 5.1.6 installed. I've tried going back to azureml-dataprep == 5.1.0 but still face the same error. If I try to roll back the package any further, I run into compatibility issues with my installations of azureml-fsspec == 1.3.1 and mltable == 1.6.1.

NotImplementedError: _path_to_get_files_block is no longer supported. 
Deprecated, downgrade to a previous version of azureml-dataprep.
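Not a fix, but for anyone comparing environments, here is a quick stdlib sketch for recording the exact installed versions (the package names are simply the ones mentioned above):

from importlib.metadata import version

for pkg in ('azureml-core', 'azureml-dataprep', 'azureml-fsspec', 'mltable'):
    print(pkg, version(pkg))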

mhaythornthwaite avatar Jun 04 '24 14:06 mhaythornthwaite