ds001705 - cannot download the latest version using aws

Open pwighton opened this issue 3 years ago • 4 comments

Describe the bug

Downloading ds001705 via aws yields neither the latest version of the dataset (v1.0.1) nor the original version (v1.0.0). Instead, it appears to download a mix of files from both versions.

To Reproduce

Steps to reproduce the behavior:

  1. aws s3 --no-sign-request sync s3://openneuro.org/ds001705 ./ds001705
  2. cd ./ds001705
  3. tree
  4. Notice that this download includes files such as sub-000101/ses-baseline/pet/sub-000101_ses-baseline_rec-MLEM_pet.json, which according to the website are present in v1.0.0 but not in v1.0.1. However, it is not simply a download of v1.0.0 either, since files such as sub-000101/ses-baseline/pet/sub-000102_ses-baseline_K1.nii.gz are missing.

Expected behavior

aws s3 --no-sign-request sync s3://openneuro.org/ds001705 ./ds001705 should produce a local directory structure that matches the latest version of the dataset.
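
One way to check whether a download matches a given version is to compare its file listing against a reference copy of that version. The following is only a sketch: the function names `list_files` and `compare_trees` are made up for illustration, and it assumes you already have a second directory (for example a datalad or git-annex clone of the snapshot) to compare against. It compares paths only, not file contents.

```shell
#!/bin/sh
# list_files DIR: print the relative paths of all files and symlinks
# under DIR in sorted order, skipping a top-level .git directory
# (present in datalad/git-annex clones, absent from an S3 sync).
list_files() {
    (cd "$1" && find . -path ./.git -prune -o \( -type f -o -type l \) -print | LC_ALL=C sort)
}

# compare_trees FIRST SECOND: report paths that exist in only one tree.
# Hypothetical helper for illustration; it does not compare contents.
compare_trees() {
    first_list=$(mktemp)
    second_list=$(mktemp)
    list_files "$1" > "$first_list"
    list_files "$2" > "$second_list"
    # comm -23 keeps lines unique to the first input, -13 to the second.
    LC_ALL=C comm -23 "$first_list" "$second_list" | sed 's/^/only-in-first: /'
    LC_ALL=C comm -13 "$first_list" "$second_list" | sed 's/^/only-in-second: /'
    rm -f "$first_list" "$second_list"
}
```

For example, `compare_trees ./ds001705 ./ds001705-reference` (where the second path is a placeholder for your reference copy) would list files such as the rec-MLEM JSON under `only-in-first` if they are present in the aws download but absent from the reference.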

Additional context

Possibly related to #2319

pwighton avatar Nov 08 '22 21:11 pwighton

Hi, thanks for the bug report. This looks like it may be an issue with git-annex, so I've reported what I found there and we'll investigate further. In the meantime, you should be able to download a consistent export of 1.0.1 with datalad or git-annex; it is a complete export, but delete markers are missing for the non-annexed files.
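
For anyone following along, a datalad download of a specific snapshot might look roughly like the sketch below. Several things here are assumptions rather than anything confirmed in this thread: that the dataset is mirrored at github.com/OpenNeuroDatasets, that snapshots are git tags named after the version (e.g. 1.0.1), and the helper name `fetch_snapshot` itself, which is purely illustrative. Setting `DRY_RUN=1` only prints the commands instead of running them.

```shell
#!/bin/sh
# fetch_snapshot DATASET VERSION: clone a dataset with datalad, check out
# one tagged snapshot, and fetch the annexed file contents.
# ASSUMPTIONS (not confirmed in this thread): the GitHub mirror URL
# pattern below, and that each snapshot is a git tag named VERSION.
# With DRY_RUN set to a non-empty value, commands are printed, not run.
fetch_snapshot() {
    dataset=$1
    version=$2
    run=${DRY_RUN:+echo}
    (
        $run datalad install "https://github.com/OpenNeuroDatasets/${dataset}.git" &&
        $run cd "$dataset" &&
        $run git checkout "$version" &&
        $run datalad get .
    )
}
```

Running `DRY_RUN=1 fetch_snapshot ds001705 1.0.1` first is a cheap way to review the commands before downloading anything.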

nellh avatar Nov 08 '22 22:11 nellh

Thanks @nellh! I can confirm that downloading with datalad is working.

pwighton avatar Nov 18 '22 18:11 pwighton

Great! This bug has been found and fixed upstream in git-annex, and I'll update here once we've deployed the fix.

nellh avatar Nov 18 '22 18:11 nellh

Awesome, thanks @nellh!

pwighton avatar Nov 18 '22 18:11 pwighton