
Support validation of git-annex content available in an S3 remote

Open nellh opened this issue 9 months ago • 1 comments

For OpenNeuro to host datasets with remote annexed content, we should support a validation mode that can access (or skip) that remote content as needed.

nellh avatar Mar 13 '25 20:03 nellh

yes! alternatively/complementary -- could there be some generalization so we could validate some "manifest" structure which would contain e.g. a list of filenames, some of them with content (.json or .tsv) and/or URLs for those files online, so they could be accessed via something like https://github.com/fsspec/ (in Python - very easy).

That would allow for BIDS validation across archives where data might be on S3 or other HTTP URLs, or local -- but then all going through the same interface.
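
A rough sketch of what such a manifest entry could look like, just to make the idea concrete. The names `ManifestEntry` and `read_entry` are made up for illustration and are not part of bids-validator or fsspec. It uses only the standard library; something like `fsspec.open` could presumably take the place of `urlopen` to cover S3 and other backends.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.request import urlopen


@dataclass
class ManifestEntry:
    """One file in a hypothetical validation manifest."""
    path: str                      # path relative to the dataset root
    content: Optional[str] = None  # inline text, e.g. for small .json/.tsv files
    url: Optional[str] = None      # where to fetch the bytes if not inline


def read_entry(entry: ManifestEntry) -> bytes:
    """Return the file's bytes from inline content or from its URL."""
    if entry.content is not None:
        return entry.content.encode("utf-8")
    if entry.url is not None:
        # urlopen handles http(s):// and file:// URLs
        with urlopen(entry.url) as response:
            return response.read()
    raise FileNotFoundError(f"no content or URL for {entry.path}")


manifest = [
    ManifestEntry("dataset_description.json", content='{"Name": "demo"}'),
]
```

Entries with neither content nor a URL could still be checked for filename validity, which is the "skip" case from the issue description.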

yarikoptic avatar Mar 26 '25 19:03 yarikoptic

Some notes I was making with @rwblair:

remote.log
 <uuid> [<key>=<value>]...
 keys of interest:
   name
   type (S3)
   publicurl
   timestamp
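
A minimal sketch of reading that line format, assuming whitespace-separated `key=value` pairs with no spaces inside values. `parse_remote_log` is a hypothetical helper, and the UUID and URL in the example are invented. It keeps the last line seen per UUID and does not compare timestamps.

```python
def parse_remote_log(text):
    """Map each remote UUID to a dict of its key=value settings."""
    remotes = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        uuid, *pairs = fields
        config = {}
        for pair in pairs:
            key, sep, value = pair.partition("=")
            if sep:  # ignore tokens that are not key=value
                config[key] = value
        remotes[uuid] = config
    return remotes


# Invented example line, for illustration only
example = (
    "1111-aaaa name=s3-public type=S3 "
    "publicurl=https://example.org/bucket/ timestamp=1700000000s\n"
)
remotes = parse_remote_log(example)
s3_remotes = {u: c for u, c in remotes.items() if c.get("type") == "S3"}
```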

*.rmet
Path: {md5(key)[0:3]}/{md5(key)[3:6]}/{key}.log.rmet
Contents:
  <timestamp> <uuid>:V +<version>#<path>
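
A sketch of parsing one line of that form, assuming the fields are separated by single spaces and that the version ID contains no `#`. `RmetEntry` and `parse_rmet_line` are hypothetical names and the sample line is invented. Any other encodings git-annex may use for these values are not handled here.

```python
from typing import NamedTuple, Optional


class RmetEntry(NamedTuple):
    timestamp: str
    uuid: str
    version: str
    path: str


def parse_rmet_line(line: str) -> Optional[RmetEntry]:
    """Parse '<timestamp> <uuid>:V +<version>#<path>', or return None."""
    parts = line.strip().split(" ", 2)
    if len(parts) != 3:
        return None
    timestamp, remote_field, value = parts
    uuid, sep, field = remote_field.partition(":")
    if not sep or field != "V" or not value.startswith("+"):
        return None
    # Everything after '+' up to the first '#' is the version, the rest is the path
    version, sep, path = value[1:].partition("#")
    if not sep:
        return None
    return RmetEntry(timestamp, uuid, version, path)


entry = parse_rmet_line(
    "1700000000s 1111-aaaa:V +v123#sub-01/anat/sub-01_T1w.nii.gz"
)
```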

Key: <hashname>-s<size>--<hash>.<ext>
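
A sketch tying the key format to the `.rmet` path above. The regular expression only covers the `<hashname>-s<size>--<hash>.<ext>` shape written in these notes, and the sample key is invented. `parse_key` and `rmet_path` are hypothetical helper names.

```python
import hashlib
import re

# Matches only the key shape from the notes: <hashname>-s<size>--<hash>.<ext>
KEY_PATTERN = re.compile(
    r"^(?P<hashname>[A-Z0-9_]+)-s(?P<size>\d+)--(?P<hash>[^.]+)(?P<ext>\..*)?$"
)


def parse_key(key):
    """Split a key into its parts, or return None if it does not match."""
    match = KEY_PATTERN.match(key)
    if match is None:
        return None
    parts = match.groupdict()
    parts["size"] = int(parts["size"])
    return parts


def rmet_path(key):
    """Build {md5(key)[0:3]}/{md5(key)[3:6]}/{key}.log.rmet."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return f"{digest[0:3]}/{digest[3:6]}/{key}.log.rmet"


key = "SHA256E-s1024--abc123.nii.gz"  # invented example key
info = parse_key(key)
path = rmet_path(key)
```

Having the size in the key means a validator could report file sizes for remote content without fetching anything.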

Logic

 stat file
 if found:
   use local opener
 if not:
   readlink -> (../)*.git/annex/objects/*/*/{key}/{key}
   determine git root
   load remotes by UUID (git-annex:remote.log)
   read rmet (git-annex:{md5(key)[0:3]}/{md5(key)[3:6]}/{key}.log.rmet)
   Construct URL
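
The logic above could be sketched roughly as follows. All function names are hypothetical. The URL shape (`publicurl` joined with the path plus a `versionId` query parameter, as used for versioned S3 objects) is an assumption about what "Construct URL" means here, not something the notes spell out.

```python
import os
import subprocess
from pathlib import Path
from urllib.parse import quote


def annex_key_from_symlink(path):
    """Return the key from a (../)*.git/annex/objects/*/*/{key}/{key} link."""
    parts = Path(os.readlink(path)).parts
    if ".git" in parts and "annex" in parts and "objects" in parts:
        return parts[-1]
    return None


def find_git_root(path):
    """Walk up from path until a directory containing .git is found."""
    # absolute() rather than resolve(), so the annex symlink is not followed
    for parent in Path(path).absolute().parents:
        if (parent / ".git").exists():
            return parent
    return None


def read_annex_branch_file(git_root, name):
    """Read a file such as remote.log from the git-annex branch."""
    result = subprocess.run(
        ["git", "-C", str(git_root), "show", f"git-annex:{name}"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def build_url(publicurl, remote_path, version):
    """Assumed URL shape: <publicurl>/<path>?versionId=<version>."""
    base = publicurl.rstrip("/")
    return f"{base}/{quote(remote_path)}?versionId={quote(version, safe='')}"


def resolve(path):
    """Return ('local', path), ('annex', key), or ('missing', None)."""
    if os.path.exists(path):  # stat, following symlinks
        return ("local", str(path))
    if os.path.islink(path):  # broken link: content is not present locally
        key = annex_key_from_symlink(path)
        if key is not None:
            return ("annex", key)
    return ("missing", None)
```

For the `annex` case, the remaining steps would be to read `remote.log` and the key's `.rmet` file with `read_annex_branch_file`, pick an S3 remote with a `publicurl`, and pass its version and path to `build_url`.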

effigies avatar Oct 13 '25 15:10 effigies

Linking in https://github.com/bids-standard/bids-validator/pull/280, a refactor that laid some groundwork for this effort. The next step is to start pulling in pieces from OpenNeuro's isomorphic-git-based implementation.

effigies avatar Oct 13 '25 15:10 effigies