easybuild-framework

add support for data installations

Open • smoors opened this issue 11 months ago • 1 comment

motivation

  • leverage EasyBuild (EB) to install data in a standardized way, with proper versioning and checksumming
  • support adding datasets as dependencies of software
  • easily swap dataset versions with ml swap

changes

  • add command-line option --installpath-data, similar to --installpath-software
  • add command-line option --subdir-data (default = data), similar to --subdir-software
  • add command-line option --sourcepath-data, similar to --sourcepath
  • add easyconfig parameter data_sources, similar to sources
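
To illustrate the proposed data_sources parameter, here is a minimal sketch of what a dataset easyconfig might look like. Easyconfig files use Python syntax, so the block runs as plain Python. The dataset name, version, URLs and checksum are hypothetical placeholders, and the sketch leaves out everything else a complete easyconfig may need (such as an easyblock). It only shows data_sources taking the place of sources, as proposed above.

```python
# Hypothetical dataset easyconfig (sketch only, all values are placeholders)
name = 'example-dataset'
version = '2024.01'

homepage = 'https://example.org/example-dataset'
description = "Illustrative dataset installed through EasyBuild"

# a dataset needs no compiler, so the system toolchain is assumed here
# (real easyconfigs would typically write: toolchain = SYSTEM)
toolchain = {'name': 'system', 'version': 'system'}

source_urls = ['https://example.org/downloads']

# parameter proposed in this issue, used in place of `sources`
data_sources = ['%s-%s.tar.gz' % (name, version)]

# placeholder, to be replaced with the real SHA256 checksum of the tarball
checksums = ['<sha256 checksum of example-dataset-2024.01.tar.gz>']

moduleclass = 'data'
```

With checksums listed next to data_sources, the dataset would get the same integrity checking as software sources, which is one of the motivations listed above.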

design

  • the main reason for a separate subdir_data is reusability: in contrast to software, data does not have to be rebuilt or reinstalled when, for example, upgrading the OS or building for a new architecture
  • the reason for a separate sourcepath_data is that datasets can be very large, so you may want to store them on a different file system or in a different location
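
The path layout implied by these options could be sketched as follows. This assumes the data options mirror how --installpath-software and --subdir-software behave: an explicit installpath_data is used as-is, and otherwise subdir_data is appended to the general install path. The function name and the name/version directory layout are hypothetical and not part of the EasyBuild API.

```python
import os


def data_install_dir(installpath, name, version,
                     installpath_data=None, subdir_data='data'):
    """Return the directory a dataset would be installed into (hypothetical helper)."""
    if installpath_data:
        # an explicit data install path takes precedence (assumed behaviour)
        base = installpath_data
    else:
        # otherwise fall back to <installpath>/<subdir_data>
        base = os.path.join(installpath, subdir_data)
    # one directory per dataset name and version, so versions can coexist
    return os.path.join(base, name, version)


# the default layout places data next to the software subdirectory
default_dir = data_install_dir('/apps/easybuild', 'example-dataset', '2024.01')

# a shared location that survives OS upgrades and new architecture builds
shared_dir = data_install_dir('/apps/easybuild', 'example-dataset', '2024.01',
                              installpath_data='/shared/datasets')
```

Keeping the data base directory independent of the software tree is what would let several OS- or architecture-specific software stacks point at the same installed datasets.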

smoors • Mar 03 '24 13:03