azure-sdk-for-python

AzureML SDK download_file slow for big files.

Open egborbe opened this issue 1 year ago • 17 comments

I am using the Azure ML SDK to download outputs of AML jobs. I access job runs via the azureml.core.Run class. Downloads are extremely slow and time out. However, I have access neither to the blob URI nor to the parameters of BlobServiceClient. Please help me set up Run.download_file so that I can control chunk size, timeout, etc., or tell me how I can get the blob object URI in a supported way.

egborbe avatar Feb 02 '24 14:02 egborbe

Hi @egborbe, thanks for the feedback - @azureml-github will get back to you asap.

l0lawrence avatar Feb 02 '24 18:02 l0lawrence

Thanks for the feedback! We are routing this to the appropriate team for follow-up. cc @xgithubtriage.

github-actions[bot] avatar Feb 02 '24 18:02 github-actions[bot]

Thanks for the feedback! We are routing this to the appropriate team for follow-up. cc @Azure/azure-ml-sdk @azureml-github.

github-actions[bot] avatar Feb 05 '24 14:02 github-actions[bot]

Could you share the SDK version details for us to engage?

banibrata-de avatar Feb 06 '24 19:02 banibrata-de

Thanks for coming back to me. I use Python 3.10.7 with Poetry on Ubuntu WSL. azureml-core = "^1.47.0" is the only thing I have in my Poetry setup file (pyproject.toml).

egborbe avatar Feb 07 '24 12:02 egborbe

Hi @egborbe, could you please share the code you are using to call the azureml.core.Run download?

isaudagar avatar Feb 09 '24 12:02 isaudagar

Sure!

egborbe avatar Feb 09 '24 12:02 egborbe

    workspace = Workspace(
        run_config.subscription_id,
        run_config.resource_group,
        run_config.workspace
    )


    for exp_name in run_config.pred_experiments:

        exp = Experiment(workspace, exp_name)
        test_tmp = os.path.join(tmp_root, exp_name)
        os.makedirs(test_tmp)
        

        for pred_run in exp.get_runs():
            status = pred_run.get_status()
            details = pred_run.get_details()
            run_id = details['runId']
            if status != "Completed":
                print(f"Ignoring run {run_id}, status: {status}")
                continue
            
            
            metrics = pred_run.get_metrics()
            tags = pred_run.get_tags()
            filenames = set(pred_run.get_file_names())
            run_definition =  pred_run.get_details()["runDefinition"]
            
            ckp_num, aml_train_id = cls.parse_run_definition( run_definition)
            
            if 'checkpoint_number' in metrics.keys():
                ckp_num = metrics['checkpoint_number']
            
            if ckp_num is None:
                raise RuntimeError(f"Could not find checkpoint number for run {details['runId']}")
            
            if 'aml_run_id' in metrics.keys():
                aml_train_id = metrics['aml_run_id']
            elif 'training_run_id' in tags.keys():
                aml_train_id = tags['training_run_id']
            
            if aml_train_id is None:
                raise RuntimeError(f"Could not find training run id for run {details['runId']}")
            
            # sometimes train id is a list of identical elements, maybe a bug in AML?
            if type(aml_train_id) in [list, tuple]:
                aml_train_id = aml_train_id[0]
                            
            if isinstance(ckp_num, list):
                ckp_num = int(ckp_num[0])
            else:
                ckp_num = int(ckp_num)
                
            aml_train_ids[ckp_num] = aml_train_id
            
            checkpoints[exp_name].append(ckp_num)
            target_dir = os.path.join(test_tmp, str(ckp_num), "outputs")
            if os.path.exists(target_dir):
                print(f"Ignoring run {details['runId']}, already downloaded this epoch")
                continue
            os.makedirs(target_dir)
            
            # files I need
            pattern = re.compile(r'^outputs/.*(?:bla|gaga|true).*\.npy$', re.IGNORECASE)
            for filename in filenames:
                if pattern.match(filename):
                    pred_run.download_file(filename, target_dir)

egborbe avatar Feb 09 '24 12:02 egborbe
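Since the downloads in the snippet above are reported to time out, one generic mitigation is to retry each file with a backoff. The following is only a minimal sketch: `download_with_retry` is a hypothetical helper, not part of azureml-core, and it assumes the download callable takes a file name and a target path, as `pred_run.download_file` is called above.

```python
import time
from typing import Callable


def download_with_retry(
    download: Callable[[str, str], None],
    name: str,
    target_dir: str,
    attempts: int = 3,
    base_delay: float = 1.0,
) -> int:
    """Call download(name, target_dir), retrying when it raises.

    Returns the number of attempts used. Re-raises the last error
    if every attempt fails. This helper is hypothetical, not SDK API.
    """
    for attempt in range(1, attempts + 1):
        try:
            download(name, target_dir)
            return attempt
        except Exception:
            if attempt == attempts:
                raise
            # Exponential backoff: base_delay, 2 * base_delay, 4 * base_delay, ...
            time.sleep(base_delay * 2 ** (attempt - 1))
    raise ValueError("attempts must be at least 1")
```

In the loop above it could be used as `download_with_retry(pred_run.download_file, filename, target_dir)`. It does not make a single download faster; it only keeps one timeout from aborting the whole report.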

Actually, the best solution for me would be to obtain the Azure Blob Storage URI of the Run files and write the whole download code myself, using requests or the blob storage client.

egborbe avatar Feb 09 '24 14:02 egborbe
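Once a blob URL is known, a do-it-yourself download with explicit chunk size, concurrency, and timeout might look roughly like the sketch below. It assumes the separate `azure-storage-blob` package is installed and that a valid credential (for example a SAS token or a token credential) is available; `download_blob_to_file` and `local_path_for` are hypothetical helper names, and the default values are illustrative only.

```python
import os
import posixpath
from urllib.parse import unquote, urlsplit


def local_path_for(blob_url: str, target_dir: str) -> str:
    """Destination path: target_dir plus the blob's base file name."""
    blob_path = unquote(urlsplit(blob_url).path)
    return os.path.join(target_dir, posixpath.basename(blob_path))


def download_blob_to_file(
    blob_url: str,
    target_dir: str,
    credential=None,
    chunk_size: int = 4 * 1024 * 1024,
    max_concurrency: int = 4,
    timeout: int = 600,
) -> str:
    """Download one blob to target_dir and return the local path."""
    # Imported here so the helper above works without azure-storage-blob.
    from azure.storage.blob import BlobClient

    client = BlobClient.from_blob_url(
        blob_url,
        credential=credential,
        max_single_get_size=chunk_size,  # size of the first GET request
        max_chunk_get_size=chunk_size,   # size of each following chunk
    )
    dest = local_path_for(blob_url, target_dir)
    with open(dest, "wb") as fh:
        # timeout is applied to each service call individually.
        downloader = client.download_blob(
            max_concurrency=max_concurrency, timeout=timeout
        )
        downloader.readinto(fh)
    return dest
```

Whether the storage account behind a given workspace accepts the caller's credential depends on how the workspace is configured, so this is a starting point rather than a confirmed replacement for `Run.download_file`.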

Hi @egborbe, I hope the approach below meets your requirement to get the path. image You can get the URL from this image.

isaudagar avatar Feb 13 '24 13:02 isaudagar

Hi, thanks for the answer, but I need the URI from Python code and not the GUI. This is an automated report generation tool, and I cannot have the users look up the URI references manually for each run.

egborbe avatar Feb 14 '24 14:02 egborbe

Hi @egborbe, we can form the URI string using the code below.

    from azureml.core import Workspace

    workspace_name = 'iqbal-test-download-ws'  # specify workspace here
    datastore_name = 'workspaceartifactstore'  # specify datastore here

    # init workspace
    ws = Workspace(
        subscription_id="b17253fa-f327-42d6-9686-f3e553e24763",  # specify subscription_id here
        resource_group="v-isaudagar-download-rg",  # specify resource_group here
        workspace_name=workspace_name
    )

    # Storage URI: <protocol>://<account>.<type>.<endpoint>/<container>
    ds = ws.datastores[datastore_name]
    strUri = f'{ds.protocol}://{ds.account_name}.{ds.datastore_type.replace("Azure", "")}.{ds.endpoint}/{ds.container_name}'


isaudagar avatar Feb 15 '24 14:02 isaudagar
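The URI construction above can also be written as a small pure function, which makes it easy to check the resulting string without a workspace. This is a sketch under assumptions: `build_blob_url` is a hypothetical helper, the arguments stand in for the datastore's `protocol`, `account_name`, `endpoint`, and `container_name` values, and the blob path inside the container is a made-up example because the thread does not confirm where run files are stored.

```python
from urllib.parse import quote


def build_blob_url(
    protocol: str,
    account_name: str,
    endpoint: str,
    container_name: str,
    blob_path: str,
) -> str:
    """Compose <protocol>://<account>.blob.<endpoint>/<container>/<blob_path>."""
    host = f"{account_name}.blob.{endpoint}"
    # quote() keeps "/" separators but escapes spaces and similar characters.
    return f"{protocol}://{host}/{container_name}/{quote(blob_path.lstrip('/'))}"


if __name__ == "__main__":
    # All values below are placeholders, not real resources.
    print(
        build_blob_url(
            "https", "myaccount", "core.windows.net",
            "mycontainer", "some/prefix/outputs/pred.npy",
        )
    )
```

The container name is a path segment after the host, so it is joined with `/` rather than with `.`.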

Hi @egborbe. Thank you for opening this issue and giving us the opportunity to assist. To help our team better understand your issue and the details of your scenario please provide a response to the question asked above or the information requested above. This will help us more accurately address your issue.

github-actions[bot] avatar Feb 21 '24 09:02 github-actions[bot]

Hi! What is the question? Thank you for your recommendation; however, I haven't had the opportunity to try it out, as this issue has dropped in priority internally. Thanks again for your help.

egborbe avatar Feb 21 '24 09:02 egborbe

Hi @egborbe, I hope the provided approach works for you. Could you please confirm?

isaudagar avatar Feb 28 '24 18:02 isaudagar

Hi @egborbe. Thank you for opening this issue and giving us the opportunity to assist. To help our team better understand your issue and the details of your scenario please provide a response to the question asked above or the information requested above. This will help us more accurately address your issue.

github-actions[bot] avatar Mar 01 '24 10:03 github-actions[bot]

Hi @egborbe, we're sending this friendly reminder because we haven't heard back from you in 7 days. We need more information about this issue to help address it. Please be sure to give us your input. If we don't hear back from you within 14 days of this comment the issue will be automatically closed. Thank you!

github-actions[bot] avatar Mar 08 '24 15:03 github-actions[bot]