clearml icon indicating copy to clipboard operation
clearml copied to clipboard

StorageManager incorrectly identifies the bucket as additional folder for MinIO

Open idantene opened this issue 2 years ago • 2 comments

More details in this Slack thead.

When using MinIO, the current StorageManager apparently parses the host as the bucket, and then incorrectly creates a folder structure that contains the bucket name when downloading a folder (or file). For example, StorageManager.download_folder(remote_url='s3://some_ip:9000/clearml/some/folder', local_folder='./'), will create the following structure: ./clearml/some/folder, which contrasts with the documentation:

If we have a remote file s3://bucket/sub/file.ext then StorageManager.download_folder(‘s3://bucket/’, ‘~/folder/’) will create ~/folder/sub/file.ext

The expected behaviour is that the folder structure is replicated without the bucket name.

As an additional aside, I've experimented with the configuration file and StorageHelper, and found that explicitly setting the bucket name in aws.s3.credentials successfuly generates:

In [4]: StorageHelper._s3_configurations.get_config_by_uri('s3://some_ip:9000/clearml/my/path/of/interest/')
Out[4]: S3BucketConfig(bucket='clearml', host='some_ip:9000', key='xxx', secret='xxx', token='', multipart=False, acl='', secure=False, region='', verify=True, use_credentials_chain=False)

So the bucket name can be extracted as is. That could alleviate some of the coding issues by asking the user to explicitly state the buckets that are not AWS native.

idantene avatar Jul 06 '22 12:07 idantene

Hi @idantene, what you're describing is actually the expected current behavior, we'll update regarding any changes 🙂

jkhenning avatar Jul 10 '22 16:07 jkhenning

Hey @jkhenning,

I understand this is perhaps the current expected behaviour from the code (since the code only considers the bucket as "whatever comes after the s3:// schema", hence failing on MinIO), but the documentation reads otherwise I believe.

idantene avatar Jul 11 '22 07:07 idantene

Hey @idantene! v1.10.0 is now out. StorageManager.download_file() and StorageManager.download_folder() from MinIO and Azure no longer include bucket name in target local path

pollfly avatar Apr 04 '23 11:04 pollfly

@idantene closing this, please reopen if required.

jkhenning avatar Aug 06 '23 07:08 jkhenning