osemosys_global

Work out where to host the data, how to deploy it, and declare licenses etc.

Open willu47 opened this issue 4 years ago • 2 comments

At the moment, data is stored in the data folder in the repository.

When installing as a Python library, this data is not included, and so all local references to data break. There are a few workarounds:

  1. include all the data inside the Python library so it is installed with the package
  2. host the data on a web-server, or provide links to online sources for all the data
  3. distribute the data in a zip archive, and have users place it somewhere manually

In terms of pros/cons:

  1. a bit of a hack: users cannot easily inspect the data (e.g. to identify data issues), the codebase becomes bulky, and code and data are mixed in an unhealthy way;
  2. potentially brittle, as a broken link to an external source would break all installed versions of the package; potential licensing issues if the data is not open? However, all users would benefit from central updates to the data;
  3. simplest, but the data licences need to be cleared and the install is messy.
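For option 3, the package would need some way to find wherever the user unpacked the archive. A minimal sketch of one way to do that, assuming a hypothetical `OSEMOSYS_GLOBAL_DATA` environment variable and a hypothetical default location under the user's home directory (neither exists in osemosys_global today):

```python
import os
from pathlib import Path
from typing import Mapping, Optional

# Hypothetical names for illustration; not part of osemosys_global.
ENV_VAR = "OSEMOSYS_GLOBAL_DATA"
DEFAULT_DIR = Path.home() / ".osemosys_global" / "data"


def resolve_data_dir(
    explicit: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
) -> Path:
    """Return the data directory: explicit argument, then env var, then default."""
    env = os.environ if env is None else env
    candidate = explicit or env.get(ENV_VAR)
    path = Path(candidate).expanduser() if candidate else DEFAULT_DIR
    if not path.is_dir():
        # Fail early with a message that tells the user how to fix the install.
        raise FileNotFoundError(
            f"Data directory {path} not found; unpack the data archive there "
            f"or set {ENV_VAR} to its location."
        )
    return path
```

Code that currently uses local references into the `data` folder could call `resolve_data_dir()` instead, so a missing data folder produces one clear error rather than scattered broken paths.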

willu47 · Feb 22 '21 20:02

Similar to #34 and #35, I've moved this to the 'Longer term developments' milestone, as turning OG into a usable Python library has a somewhat lower priority at the moment due to constrained development capacity. If there is bandwidth within one of the collaborating institutions, it would be very useful though! @trevorb1, feel free to add anything to this issue.

maartenbrinkerink · Sep 02 '24 20:09

Another option not mentioned above: 4. add a command-line interface argument that downloads the data and configures the Snakemake workflow.
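A rough sketch of what option 4 could look like, using only the standard library. The `osemosys-global` program name, the `--download-data`, `--data-dir` and `--data-url` flags, the `data_dir` config key and the URL are all placeholders invented for illustration; only Snakemake's own `--cores` and `--config KEY=VALUE` options are real.

```python
import argparse
import urllib.request
import zipfile
from pathlib import Path
from typing import List, Optional

# Placeholder: the issue does not say where the data would be hosted.
DATA_URL = "https://example.org/osemosys_global_data.zip"


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="osemosys-global")
    parser.add_argument("--download-data", action="store_true",
                        help="fetch and unpack the data archive before running")
    parser.add_argument("--data-dir", type=Path, default=Path("data"),
                        help="directory the workflow reads data from")
    parser.add_argument("--data-url", default=DATA_URL,
                        help="location of the zipped data archive")
    return parser


def fetch_data(url: str, target: Path) -> None:
    """Download a zip archive and unpack it into target."""
    target.mkdir(parents=True, exist_ok=True)
    archive, _ = urllib.request.urlretrieve(url)
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(target)


def snakemake_command(data_dir: Path) -> List[str]:
    """Build a Snakemake call that passes the data location as a config value."""
    return ["snakemake", "--cores", "1", "--config", f"data_dir={data_dir}"]


def main(argv: Optional[List[str]] = None) -> List[str]:
    args = build_parser().parse_args(argv)
    if args.download_data:
        fetch_data(args.data_url, args.data_dir)
    return snakemake_command(args.data_dir)
```

`main` only returns the command here; a real entry point would hand it to `subprocess.run`, and the Snakefile would need to read `config["data_dir"]` instead of a hard-coded path.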

willu47 · Sep 04 '24 10:09