Cannot download using wget or gcp

Open mahela97 opened this issue 2 years ago • 7 comments

When I tried to download the dataset with the wget command, it only created subdirectories containing index.html files. When I tried using GCP, I got this error: "BadRequestException: 400 Bucket is a requester pays bucket but no user project provided."

mahela97 avatar Sep 23 '22 18:09 mahela97

To download from Google Cloud, you'll need to specify a project ID for covering any download costs. See: https://stackoverflow.com/questions/47739741/bucket-is-requester-pays-bucket-but-no-user-project-provided

If you have a project ID then you can specify it in the download command (see: https://cloud.google.com/storage/docs/using-requester-pays#using). e.g. for gsutil:

gsutil -u PROJECT_IDENTIFIER cp gs://BUCKET_NAME/OBJECT_NAME OBJECT_DESTINATION
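For a whole directory rather than a single object, the same idea can be wrapped in a small helper. This is only a sketch: the function name `fetch_requester_pays` and all three arguments are placeholders, and it assumes gsutil is installed and authenticated against a project with billing enabled. `-u` sets the project to bill, `-m` runs transfers in parallel, and `cp -r` copies recursively.

```shell
# Hypothetical helper: copy a path from a requester-pays bucket.
#   $1: project ID that will be billed for the download
#   $2: gs:// source path (placeholder, take it from the dataset page)
#   $3: local destination directory
fetch_requester_pays() {
  gsutil -m -u "$1" cp -r "$2" "$3"
}
```

Usage would look like `fetch_requester_pays PROJECT_IDENTIFIER gs://BUCKET_NAME/SOME_PATH ./data`, with the real values substituted in.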

If you want to avoid download fees, you can download the data from the PhysioNet servers using the suggested wget command. This will be slower!

tompollard avatar Sep 23 '22 18:09 tompollard

@tompollard thank you for the reply. I am trying wget, but inside the subdirectories there is only an index.html file instead of images. Could you please help me fix that?

mahela97 avatar Sep 23 '22 18:09 mahela97

@mahela97 I'm not sure which dataset you are trying to download, but essentially I think you'll need to be patient! wget loads the directory structure before the files, I believe, so you may not immediately see data within directories.
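One way to tell whether wget has moved past the directory listings is to count the files that are not index.html. A minimal sketch, assuming a POSIX shell with `find`; the function name `count_data_files` is made up for illustration:

```shell
# Count regular files under a directory, skipping wget's index.html listings
# (and variants such as index.html.tmp).
#   $1: directory that wget is writing into
count_data_files() {
  find "$1" -type f ! -name 'index.html*' | wc -l | tr -d ' '
}
```

A recursive wget run normally writes into a top-level directory named after the host, so running `count_data_files physionet.org` every so often should show the count rising once file transfers begin.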

tompollard avatar Sep 23 '22 18:09 tompollard

@tompollard I am trying with mimic-cxr. Thank you for the help. I will check again after a few hours.

mahela97 avatar Sep 23 '22 18:09 mahela97

@mahela97 makes sense, it's a large dataset! I think this is the same issue described at:

  • https://github.com/MIT-LCP/mimic-code/issues/1012
  • https://github.com/MIT-LCP/mimic-code/issues/1006

I'd be interested in hearing how long the download takes to complete with wget.

tompollard avatar Sep 23 '22 18:09 tompollard

Was anybody able to download the dataset with the wget command? I left my computer on the whole weekend but nothing happened. I still only see the folders and the index.html file, but no images in DICOM format. Asking specifically @mahela97 and @ayhyap, who raised the issue previously.

edeiana23 avatar Apr 26 '23 09:04 edeiana23

Yes, the files eventually transferred; it just took a while. It is possible that some error occurred that caused your wget command to be interrupted. The command provided on the mimic-cxr page includes the relevant options to resume the transfer if it stopped prematurely.
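For reference, the wget options that make an interrupted recursive download resumable are `-c` (continue partially downloaded files) and `-N` (skip files that are already up to date), alongside `-r` (recurse) and `-np` (do not ascend to the parent directory). A hedged sketch follows; the function name `resume_download` is hypothetical, and the username and URL are placeholders to be copied from the dataset page rather than from here.

```shell
# Hypothetical wrapper: re-running it picks up where a previous run stopped.
#   $1: PhysioNet username
#   $2: dataset files URL (placeholder, copy it from the dataset page)
resume_download() {
  wget -r -N -c -np --user "$1" --ask-password "$2"
}
```

Calling `resume_download USERNAME DATASET_URL` again after an interruption should skip completed files and continue partial ones instead of starting over.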

ayhyap avatar Apr 26 '23 09:04 ayhyap