Files pushed via annex are not `fsck`ed on receipt, allowing for corrupt data to be published

Open tsalo opened this issue 1 year ago • 5 comments

What went wrong?

I am trying to clone my dataset (ds005250) with datalad. When I run `datalad get` on certain files, I get the following error:

get(error): sub-02/ses-1/func/sub-02_ses-1_task-rest_acq-MBME_run-02_echo-5_part-mag_bold.nii.gz (file) [S3 bucket does not allow public access; Set both AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY to use S3
verification of content failed
Verification of content failed]

I assumed that something went wrong when I initially uploaded the problematic files, so I used the website to upload new versions of each of the 15 files that were failing. After doing that, 6 of the 15 files could be pulled from OpenNeuro without issue, but the remaining 9 still fail.

Expected behavior

The files should download without issue.

How to reproduce

  1. datalad install https://github.com/OpenNeuroDatasets/ds005250.git
  2. cd ds005250
  3. datalad get sub-02/ses-1/func/sub-02_ses-1_task-rest_acq-MBME_run-02_echo-5_part-mag_bold.nii.gz

Desktop

  • OS: Ubuntu
  • Browser: n/a
  • Version: n/a

Additional information

I have been using datalad update -s origin --how=reset to update my clone after making new releases on OpenNeuro.

tsalo avatar Jun 26 '24 16:06 tsalo

I just wanted to follow up on this. Is there anything I can do to diagnose the problem and get the remaining files uploaded to OpenNeuro?

tsalo avatar Sep 13 '24 16:09 tsalo

Let me have a look server-side.

effigies avatar Sep 13 '24 16:09 effigies

It looks like these 9 files are corrupted, server-side:

fsck sub-02/ses-1/func/sub-02_ses-1_task-rest_acq-MBME_run-02_echo-5_part-mag_bold.nii.gz
  sub-02/ses-1/func/sub-02_ses-1_task-rest_acq-MBME_run-02_echo-5_part-mag_bold.nii.gz: Bad file size (40.09 MB smaller); moved to .git/annex/bad/SHA256E-s218670986--5e3fd400ba7e3f2b64b6b6d5a86f5e3e24b5a5a8bf5006f2aa3325c747021a7d.nii.gz
failed
fsck sub-03/ses-2/func/sub-03_ses-2_task-rest_acq-MBME_run-02_echo-3_part-mag_bold.nii.gz
  sub-03/ses-2/func/sub-03_ses-2_task-rest_acq-MBME_run-02_echo-3_part-mag_bold.nii.gz: Bad file size (21.72 MB smaller); moved to .git/annex/bad/SHA256E-s244884891--f0373646499af403463a45a817d164b93399edc23a349b6af22bbc4d6cccaf34.nii.gz
failed
fsck sub-03/ses-2/func/sub-03_ses-2_task-rest_acq-MBME_run-02_echo-3_part-phase_bold.nii.gz
  sub-03/ses-2/func/sub-03_ses-2_task-rest_acq-MBME_run-02_echo-3_part-phase_bold.nii.gz: Bad file size (258.06 MB smaller); moved to .git/annex/bad/SHA256E-s346685526--f683204f3a69f38987e15763cfa5496ad247e408ab5cdb342332ba8848e628a7.nii.gz
failed
fsck sub-04/ses-1/func/sub-04_ses-1_task-fracback_acq-MBME_echo-5_part-phase_bold.nii.gz
  sub-04/ses-1/func/sub-04_ses-1_task-fracback_acq-MBME_echo-5_part-phase_bold.nii.gz: Bad file size (7.24 MB smaller); moved to .git/annex/bad/SHA256E-s322762507--74f28f3544f13e2986a50cc452fa15bdda63878930d953c45e2af77a42f16763.nii.gz
failed
fsck sub-04/ses-1/func/sub-04_ses-1_task-rest_acq-MBME_run-01_echo-1_part-mag_bold.nii.gz
  sub-04/ses-1/func/sub-04_ses-1_task-rest_acq-MBME_run-01_echo-1_part-mag_bold.nii.gz: Bad file size (16.22 MB smaller); moved to .git/annex/bad/SHA256E-s276236911--222c0b3370108553c0ad39092d15b34fc1c40a165a2aaa69c6ffb9bcbe8aa8bd.nii.gz
failed
fsck sub-04/ses-1/func/sub-04_ses-1_task-rest_acq-MBME_run-01_echo-1_part-phase_bold.nii.gz
  sub-04/ses-1/func/sub-04_ses-1_task-rest_acq-MBME_run-01_echo-1_part-phase_bold.nii.gz: Bad file size (144.42 MB smaller); moved to .git/annex/bad/SHA256E-s345422637--c46a998a315160e7e69a87c35f4a3ce4ed73204745f166ab611bbac5042fda19.nii.gz
failed
fsck sub-04/ses-1/func/sub-04_ses-1_task-rest_acq-MBME_run-02_echo-4_part-phase_bold.nii.gz
  sub-04/ses-1/func/sub-04_ses-1_task-rest_acq-MBME_run-02_echo-4_part-phase_bold.nii.gz: Bad file size (144.36 MB smaller); moved to .git/annex/bad/SHA256E-s346814534--391cafaa9d90bdb6453cedda6d3765a40abfb1e0ab5ba5bf60de85844015ac52.nii.gz
failed
fsck sub-04/ses-1/func/sub-04_ses-1_task-rest_acq-MBME_run-02_echo-5_part-mag_bold.nii.gz
  sub-04/ses-1/func/sub-04_ses-1_task-rest_acq-MBME_run-02_echo-5_part-mag_bold.nii.gz: Bad file size (25.88 MB smaller); moved to .git/annex/bad/SHA256E-s217618463--8d7d5ccb2227e344ca9d56dce96ad5623f53056afaf76139f4de8b82f8f2d1c7.nii.gz
failed
fsck sub-04/ses-1/func/sub-04_ses-1_task-rest_acq-MBME_run-02_echo-5_part-phase_bold.nii.gz
  sub-04/ses-1/func/sub-04_ses-1_task-rest_acq-MBME_run-02_echo-5_part-phase_bold.nii.gz: Bad file size (219.28 MB smaller); moved to .git/annex/bad/SHA256E-s346860182--16a6d35a22dce6c423fd79891bfaf7b501a2c9290023be69fe2bb060b984d6be.nii.gz
failed

The corrupted versions are also exported to S3. Would you be willing to do a live call so we can see whether we can upload fixed versions without cutting a new snapshot? Otherwise, deleting these files from the draft and reuploading should probably do the trick.

effigies avatar Sep 13 '24 17:09 effigies

That would be great, thanks!

tsalo avatar Sep 13 '24 17:09 tsalo

After investigating, we found that the error arose on the client side, which pushed corrupted files to OpenNeuro. Whether they were corrupted (truncated) during transfer or before is somewhat moot; we should error if the contents of received keys do not match their checksums.

effigies avatar Sep 16 '24 15:09 effigies
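
For illustration, the receive-time check described above could look roughly like the following. This is a minimal sketch, not OpenNeuro's or git-annex's actual implementation. It assumes keys use the SHA256E layout visible in the fsck output (`SHA256E-s<size>--<sha256 hex><extension>`) and always carry a size field. The names `parse_key`, `verify_key_content`, and `make_key` are hypothetical helpers introduced only for this example.

```python
import hashlib
import os
import re
import tempfile

# Assumed key layout, taken from the fsck output in this thread:
#   SHA256E-s<size in bytes>--<64 hex chars of SHA-256><optional extension>
KEY_RE = re.compile(
    r"^SHA256E-s(?P<size>\d+)--(?P<digest>[0-9a-f]{64})(?P<ext>\..*)?$"
)


def parse_key(key):
    """Return (expected_size, expected_sha256_hex) for a SHA256E key (hypothetical helper)."""
    match = KEY_RE.match(key)
    if match is None:
        raise ValueError(f"not a SHA256E key with a size field: {key!r}")
    return int(match.group("size")), match.group("digest")


def verify_key_content(key, path, chunk_size=1 << 20):
    """Check a received file against its key; return (ok, reason)."""
    expected_size, expected_digest = parse_key(key)

    # Cheap check first: a truncated upload already fails here.
    actual_size = os.path.getsize(path)
    if actual_size != expected_size:
        return False, f"size mismatch: expected {expected_size}, got {actual_size}"

    # Full check: stream the file through SHA-256 in fixed-size chunks.
    sha = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            sha.update(chunk)
    if sha.hexdigest() != expected_digest:
        return False, "checksum mismatch"
    return True, "ok"


def make_key(data, ext=".nii.gz"):
    """Build a SHA256E-style key for in-memory data (used only for the demo)."""
    return f"SHA256E-s{len(data)}--{hashlib.sha256(data).hexdigest()}{ext}"


if __name__ == "__main__":
    payload = b"example content " * 1024
    key = make_key(payload)
    with tempfile.TemporaryDirectory() as tmp:
        truncated = os.path.join(tmp, "truncated.nii.gz")
        with open(truncated, "wb") as handle:
            handle.write(payload[: len(payload) // 2])  # simulate a truncated push
        print(verify_key_content(key, truncated))
```

A server that ran a check like this on every received key could reject the push (or quarantine the file) instead of publishing it. The size comparison alone would have caught all nine files listed above, since each was reported as smaller than the size recorded in its key.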