Use public direct links for objects for S3 and Azure
Follow-up to https://github.com/iterative/datachain/pull/755
Less critical to implement, since it affects only public (no-credentials) buckets and Studio teams. It already works for Google Storage since @dreadatour fixed it a while ago.
This concerns the public S3 and Azure client.url() code. Similar to GS, which already has a check for anon in it, we need to generate and return a direct URL to the cloud storage.
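To make the goal concrete, here is a rough sketch of what "direct URL" means for each backend. The helper names `public_s3_url` and `public_azure_url` are hypothetical and not part of datachain; they only assume the standard virtual-hosted S3 and Azure Blob endpoint layouts.

```python
from urllib.parse import quote


def public_s3_url(bucket: str, key: str) -> str:
    """Hypothetical helper: virtual-hosted-style URL for a public S3 object."""
    # quote() keeps "/" intact by default, so nested keys stay readable.
    return f"https://{bucket}.s3.amazonaws.com/{quote(key)}"


def public_azure_url(account: str, container: str, blob: str) -> str:
    """Hypothetical helper: Blob endpoint URL for a public Azure object."""
    return f"https://{account}.blob.core.windows.net/{container}/{quote(blob)}"
```

The idea would be for client.url() to return something like this when the client is anonymous, and fall back to a signed URL otherwise.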
Make sure along the way:
- Endpoint URLs are supported, especially for AWS
- On the Studio side, pass the ms header to the signed URL to get a public URL that actually works (see some SO discussions)
- Add tests
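For the endpoint URL point, one possible shape is sketched below. The function `s3_url_with_endpoint` is hypothetical, and the choice of path-style addressing for custom endpoints is an assumption (it is common for S3-compatible services, but not guaranteed for every deployment).

```python
from typing import Optional
from urllib.parse import quote


def s3_url_with_endpoint(
    bucket: str, key: str, endpoint_url: Optional[str] = None
) -> str:
    """Hypothetical helper: build a direct S3 URL, honoring a custom endpoint."""
    if endpoint_url:
        # Assumption: custom endpoints use path-style addressing,
        # i.e. <endpoint>/<bucket>/<key>.
        return f"{endpoint_url.rstrip('/')}/{bucket}/{quote(key)}"
    # Default AWS virtual-hosted-style URL.
    return f"https://{bucket}.s3.amazonaws.com/{quote(key)}"
```

A test for this could simply compare the generated string for both branches, without touching the network.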
Quick note: I have checked AWS S3, and it returns a public URL out of the box if no credentials are found:
In [1]: from datachain.catalog import get_catalog
In [2]: catalog = get_catalog()
In [3]: catalog.signed_url('s3://fast-ai-nlp', 'ag_news_csv.tgz')
Out[3]: 'https://fast-ai-nlp.s3.amazonaws.com/ag_news_csv.tgz'
This URL actually works: https://fast-ai-nlp.s3.amazonaws.com/ag_news_csv.tgz
We still need to check all possible options for S3 and Azure.
One thing to check for S3: whether it works for versioned files (when you pass version_id).
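For the versioned case, a direct URL would presumably need the `versionId` query parameter that S3 accepts on object GET requests. A minimal sketch, with `versioned_s3_url` as a hypothetical helper name:

```python
from typing import Optional
from urllib.parse import quote, urlencode


def versioned_s3_url(bucket: str, key: str, version_id: Optional[str] = None) -> str:
    """Hypothetical helper: direct S3 URL, optionally pinned to a version."""
    url = f"https://{bucket}.s3.amazonaws.com/{quote(key)}"
    if version_id:
        # S3 selects a specific object version via the versionId query parameter.
        url += "?" + urlencode({"versionId": version_id})
    return url
```

Whether an anonymous request for a specific version succeeds depends on the bucket policy (reading a version needs s3:GetObjectVersion to be allowed), so this is exactly the part that still has to be verified against a real public bucket.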