
S3 dataset access

Open robmarkcole opened this issue 1 year ago • 2 comments

Hi, I understand the dataset can be streamed from S3. Following the example in the docs I get an error, so I assume access must be granted?

 > aws s3 ls s3://clay-tiles-02/02/27WXN/

An error occurred (AccessDenied) when calling the ListObjectsV2 operation: Access Denied

robmarkcole avatar May 02 '24 13:05 robmarkcole

Hi Rob!! I think the right move here is to copy a representative sample of embeddings to source.coop

I don't think it makes sense to publicly host a copy of the whole training set when it is just a cropped selection of data that is already available. E.g. for v1 we have 50M chips, and we are moving towards streaming from source COGs into the GPUs during training anyway: https://github.com/Clay-foundation/stacchip

In the meantime I've just activated requester pays on this bucket.
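For anyone reading from Python rather than the CLI, a minimal sketch of a requester-pays listing with boto3 might look like the following. The helper name `list_requester_pays` is hypothetical; `RequestPayer="requester"` is the boto3 counterpart of the CLI's `--request-payer requester`. This assumes the caller has valid AWS credentials (requester-pays buckets reject anonymous requests) and that the bucket policy allows listing, which this thread does not confirm.

```python
def list_requester_pays(client, bucket, prefix, max_keys=100):
    """Return object keys under `prefix` in a requester-pays bucket.

    `client` is expected to be a boto3 S3 client; the requester's
    AWS account is billed for the request and data transfer.
    """
    response = client.list_objects_v2(
        Bucket=bucket,
        Prefix=prefix,
        MaxKeys=max_keys,
        RequestPayer="requester",  # acknowledge that the caller pays
    )
    # "Contents" is absent when nothing matches the prefix.
    return [obj["Key"] for obj in response.get("Contents", [])]


if __name__ == "__main__":
    import boto3  # third-party; imported here so the helper has no hard dependency

    s3 = boto3.client("s3")
    # Bucket and prefix taken from the command in this issue.
    for key in list_requester_pays(s3, "clay-tiles-02", "02/27WXN/"):
        print(key)
```

If the bucket policy does not grant your account `s3:ListBucket`, the call will still raise an `AccessDenied` error even with the requester-pays flag set.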

brunosan avatar May 07 '24 14:05 brunosan

@brunosan I get an error:

⚡ ~/Clay-Foundation-Model aws s3 ls s3://clay-tiles-02/02/27WXN/ --request-payer requester

An error occurred (AccessDenied) when calling the ListObjectsV2 operation: Access Denied

robmarkcole avatar May 08 '24 11:05 robmarkcole

@robmarkcole for Clay v1 we no longer recommend using these datacubes. The input can be generated much more flexibly and adapted to the use case, as described in the following tutorial:

https://clay-foundation.github.io/model/tutorials/clay-v1-wall-to-wall.html

Please let us know if we can help you with testing Clay v1, happy to advise on data preparation for your use case if you have questions!

yellowcap avatar Jun 05 '24 11:06 yellowcap