
ds002345 is unavailable on GitHub due to an excessively large commit

nellh opened this issue 4 years ago • 2 comments

Describe the bug
Many git servers (GitHub included) will reject this dataset because commit 93a4baec348a7f23bb5301c5807c2fc127d787a8 contains too many bytes of changes.

To Reproduce
Steps to reproduce the behavior:

  1. Generate very large commit
  2. Observe it never appears exported to GitHub

Expected behavior
OpenNeuro should prevent internally generated commits from reaching this size and reject pushes containing commits like this.
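A minimal sketch of the kind of size check described above, assuming a local clone of the dataset and a rough 2 GiB push limit (GitHub rejects packs of about that size). The helper name and threshold are illustrative, not OpenNeuro's actual implementation:

```python
import subprocess

# Illustrative threshold: GitHub rejects pushes whose pack exceeds roughly 2 GiB.
PUSH_LIMIT_BYTES = 2 * 1024**3

def commit_change_bytes(commit, repo="."):
    """Sum the sizes of all blobs introduced by `commit` (diff against its parent).

    Hypothetical helper; the real OpenNeuro check may be implemented differently.
    """
    diff = subprocess.run(
        ["git", "-C", repo, "diff-tree", "-r", "--root", "--no-commit-id", commit],
        check=True, capture_output=True, text=True,
    ).stdout
    total = 0
    for line in diff.splitlines():
        # Each line: ":<old-mode> <new-mode> <old-sha> <new-sha> <status>\t<path>"
        old_mode, new_mode, old_sha, new_sha, status = line.split("\t", 1)[0].split()
        if new_mode == "160000":
            continue  # submodule entry, not a blob in this repository
        if set(new_sha) == {"0"}:
            continue  # deletion: no new blob to count
        size = subprocess.run(
            ["git", "-C", repo, "cat-file", "-s", new_sha],
            check=True, capture_output=True, text=True,
        ).stdout
        total += int(size)
    return total

if __name__ == "__main__":
    sha = "93a4baec348a7f23bb5301c5807c2fc127d787a8"
    if commit_change_bytes(sha) > PUSH_LIMIT_BYTES:
        print(f"{sha} introduces more changes than GitHub will accept in one push")
```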

nellh commented on Jan 25 '21

@snastase To expose this dataset via datalad, we're going to need to rewrite history, because some WAV files were committed directly to git rather than to git-annex.
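For context, one rough way to find files stored directly in git rather than annexed is to look for large regular-file blobs in a commit's tree, since annexed content appears as symlinks into .git/annex/objects (or as tiny pointer files). The sketch below assumes a local clone; the function name and size threshold are illustrative only:

```python
import subprocess

def large_git_blobs(commit, repo=".", threshold=10 * 1024**2):
    """List regular-file blobs above `threshold` bytes in a commit's tree.

    In a git-annex repository, annexed files are symlinks (mode 120000) into
    .git/annex/objects or small pointer files, so any large regular blob is
    content committed directly to git. Illustrative helper only.
    """
    listing = subprocess.run(
        ["git", "-C", repo, "ls-tree", "-r", "-l", commit],
        check=True, capture_output=True, text=True,
    ).stdout
    offenders = []
    for line in listing.splitlines():
        # Each line: "<mode> <type> <sha> <size>\t<path>"
        meta, path = line.split("\t", 1)
        mode, objtype, _sha, size = meta.split()
        if objtype == "blob" and mode != "120000" and int(size) >= threshold:
            offenders.append((path, int(size)))
    return sorted(offenders, key=lambda item: item[1], reverse=True)

# Example: inspect the last snapshot commit mentioned below.
print(large_git_blobs("e20ccff2a329dd1632d286d1f4c74b8864d3d948"))
```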

I see that the dataset was uploaded to GitHub through version 1.0.1 (https://github.com/OpenNeuroDatasets/ds002345/commit/e20ccff2a329dd1632d286d1f4c74b8864d3d948). Assuming that we rewrite history on top of e20ccff, I don't think there should be any version conflicts.

I wanted to check in that this plan makes sense to you and give you a chance to ask questions before we go ahead. I believe it will also make your dataset uneditable for a few hours, although existing snapshots should still be accessible.

effigies commented on Jan 26 '21

@effigies Interesting; I'm not sure how the WAV files would have been committed via git rather than git-annex. Maybe it was from me manually replacing a WAV file via the web interface, or from when the dataset got hung up and required some surgery back in June 2020 (maybe @franklin-feingold remembers). In any case, I'm perfectly fine with you rewriting history in this way, and I'm not worried if it's uneditable for a while. Let me know if there's anything I should do or check to assist the process.

snastase commented on Jan 26 '21