
how to download the cross2020 model and data?

Open lancezhangsf opened this issue 2 years ago • 11 comments

how to download the cross2020 model and data?

lancezhangsf avatar Aug 10 '22 08:08 lancezhangsf

/basenji/manuscripts/cross2020$ ./get_models.sh
--2022-08-10 16:59:39-- https://storage.googleapis.com/basenji_barnyard/model_human.h5
Resolving storage.googleapis.com (storage.googleapis.com)... 142.251.43.16, 172.217.160.80, 142.251.42.240, ...
Connecting to storage.googleapis.com (storage.googleapis.com)|142.251.43.16|:443... connected.
HTTP request sent, awaiting response... 400 Bad Request
2022-08-10 16:59:45 ERROR 400: Bad Request.

--2022-08-10 16:59:45-- https://storage.googleapis.com/basenji_barnyard/model_mouse.h5
Resolving storage.googleapis.com (storage.googleapis.com)... 172.217.160.112, 142.251.43.16, 172.217.160.80, ...
Connecting to storage.googleapis.com (storage.googleapis.com)|172.217.160.112|:443... connected.
HTTP request sent, awaiting response... 400 Bad Request
2022-08-10 16:59:51 ERROR 400: Bad Request.

(base) kaldi@kaldi-Super-Server:/data/zsf/WorkSpace/BASENJI/basenji/manuscripts/cross2020$

lancezhangsf avatar Aug 10 '22 09:08 lancezhangsf

Not sure if this is a temporary or permanent change, but the cloud bucket now has 'Requester Pays' enabled, which means access is billed to your own Google Cloud project: https://cloud.google.com/storage/docs/requester-pays

sheetalgiri avatar Aug 10 '22 11:08 sheetalgiri

We had to switch the training data to requester pays because the cost of offering it was becoming far too large. I'll move the models somewhere free because they're smaller, but I'm having trouble with that right now. Give me a day or two to figure it out.

davek44 avatar Aug 10 '22 23:08 davek44

OK you can now grab the models and other small files from gs://basenji_barnyard2/

davek44 avatar Aug 10 '22 23:08 davek44

When I click this link: https://console.cloud.google.com/storage/browser/basenji_barnyard2

I get this warning: Additional permissions required to list objects in this bucket. Ask a bucket owner to grant you 'storage.objects.list' permission.

Is there a problem with my method?

xxjxuejian avatar Aug 11 '22 10:08 xxjxuejian

I forgot to make it public. Sorry about that. Try again now
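
For anyone landing here later, here is a minimal sketch of fetching files from the public bucket over HTTPS without a Google Cloud account. It assumes the bucket stays publicly readable and that the object names match those in get_models.sh (model_human.h5, model_mouse.h5), which this thread does not confirm. The helper names `public_url` and `download` are made up for illustration.

```python
import urllib.request
from pathlib import Path

# Public bucket named in this thread.
PUBLIC_BUCKET = "basenji_barnyard2"


def public_url(object_name, bucket=PUBLIC_BUCKET):
    """Build the HTTPS URL for an object in a publicly readable GCS bucket."""
    return f"https://storage.googleapis.com/{bucket}/{object_name.lstrip('/')}"


def download(object_name, dest_dir="."):
    """Download one public object into dest_dir and return the local path."""
    dest = Path(dest_dir) / Path(object_name).name
    urllib.request.urlretrieve(public_url(object_name), str(dest))
    return dest
```

Calling `download("model_human.h5")` would then save the file to the current directory, provided an object with that name exists in the bucket.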

davek44 avatar Aug 11 '22 18:08 davek44

Thank you so much for gs://basenji_barnyard2/. I want to use the data to run the model, but gs://basenji_barnyard2/ does not provide a .tfr file. I tried to generate the dataset with basenji_data.py and other scripts, but it failed and I don't know where the problem is. I just want to run some tests with the dataset. Could you share a single .tfr file? One piece of the training set is fine. Thank you! @davek44

xxjxuejian avatar Aug 21 '22 12:08 xxjxuejian

OK, I added a single tfrecord file to the public bucket gs://basenji_barnyard2/demo_tfr/train-0-0.tfr. If you need the entire dataset, just set up an account with payment and download from gs://basenji_barnyard/data/

davek44 avatar Aug 22 '22 18:08 davek44

Thank you very much!

xxjxuejian avatar Aug 23 '22 01:08 xxjxuejian

Hello, I need help with this issue.

I was trying to train the Enformer model and access the Basenji data at gs://basenji_barnyard/data/. I ran this code:

human = get_dataset('human', 'train').batch(1).repeat()
mouse_dataset = get_dataset('mouse', 'train').batch(1).repeat()
human_mouse_dataset = tf.data.Dataset.zip((human_dataset, mouse_dataset)).prefetch(2)

and got this error:

InvalidArgumentError Traceback (most recent call last)
in <cell line: 1>()
----> 1 human = get_dataset('human', 'train').batch(1).repeat()
      2 mouse_dataset = get_dataset('mouse', 'train').batch(1).repeat()
      3 human_mouse_dataset = tf.data.Dataset.zip((human_dataset, mouse_dataset)).prefetch(2)

6 frames
/usr/local/lib/python3.10/dist-packages/tensorflow/python/lib/io/file_io.py in stat_v2(path)
    922 errors.OpError: If the operation fails.
    923 """
--> 924 return _pywrap_file_io.Stat(compat.path_to_str(path))
    925
    926

InvalidArgumentError: Error executing an HTTP request: HTTP response code 400 with body '{ "error": { "code": 400, "message": "Bucket is a requester pays bucket but no user project provided.", "errors": [ { "message": "Bucket is a requester pays bucket but no user project provided.", "domain": "global", "reason": "required" } ] } } ' when reading metadata of gs://basenji_barnyard/data/human/statistics.json

Dnelnaker avatar Feb 05 '24 07:02 Dnelnaker

We had to switch the training data to requester pays because the cost of offering it was becoming far too large. You'll need to set up a payment method for your Google Cloud account. The cost should be very low relative to your other research costs.
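
One way to get past the "no user project provided" error is to copy the data to local disk with gsutil, whose top-level `-u` option bills the request to a project you own. This is only a sketch: it assumes gsutil is installed and authenticated, and `my-billing-project` is a placeholder for your own project ID. The helper names are hypothetical.

```python
import subprocess


def requester_pays_copy_cmd(src, dest, billing_project):
    """Return a gsutil command that copies src to dest, billing billing_project."""
    # -u names the project to bill for requester-pays access; -m parallelizes the copy.
    return ["gsutil", "-u", billing_project, "-m", "cp", "-r", src, dest]


def run_copy(src, dest, billing_project):
    """Run the copy and raise CalledProcessError if gsutil exits non-zero."""
    subprocess.run(requester_pays_copy_cmd(src, dest, billing_project), check=True)
```

For example, `run_copy("gs://basenji_barnyard/data/human", "./data", "my-billing-project")` would download the human data into ./data, after which the dataset code can read from the local path instead of the bucket.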

davek44 avatar Feb 05 '24 18:02 davek44