osemosys_global
Work out where to host the data, how to deploy it, and declare licenses etc.
At the moment, data is stored in the data folder in the repository.
When the package is installed as a Python library, this data is not included, so every local reference to the data breaks. There are a few workarounds:
1. include all the data inside the Python library so that it is installed with the package;
2. host the data on a web server, or provide links to the online sources of all the data;
3. distribute the data in a zip archive and have users place it somewhere manually.
In terms of pros/cons:
1. a bit of a hack: users cannot see the data (e.g. to identify data issues), the codebase becomes bulky, and code and data are mixed unhealthily;
2. potentially brittle, as a broken link to another source would break every installed version of the package, and there may be licensing issues if the data is not open; on the other hand, all users would benefit from central updates to the data;
3. simplest, but the data licences need clearing and the install is messy.
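For the zip-archive option, the workflow still needs some way to find the data once a user has unpacked it. One possible approach is sketched below, assuming a lookup order of an environment variable, then a caller-supplied default, then a per-user folder. The variable name `OSEMOSYS_GLOBAL_DATA`, the `resolve_data_dir` helper and the `~/.osemosys_global/data` fallback are all hypothetical; none of them exist in the project today.

```python
import os
from pathlib import Path
from typing import Optional

# Hypothetical environment variable; osemosys_global does not define it today.
DATA_ENV_VAR = "OSEMOSYS_GLOBAL_DATA"


def resolve_data_dir(default: Optional[Path] = None) -> Path:
    """Return the directory holding the input data, or fail with a clear message.

    Lookup order: the environment variable, then the caller-supplied default,
    then a per-user folder under the home directory (also hypothetical).
    """
    env_value = os.environ.get(DATA_ENV_VAR)
    if env_value:
        candidate = Path(env_value).expanduser()
    elif default is not None:
        candidate = default
    else:
        candidate = Path.home() / ".osemosys_global" / "data"
    # Fail early with instructions instead of a confusing error deep in the workflow.
    if not candidate.is_dir():
        raise FileNotFoundError(
            f"No data directory at {candidate}. Unzip the data archive there, "
            f"or set {DATA_ENV_VAR} to the folder where you unpacked it."
        )
    return candidate
```

Workflow code would then build its paths from `resolve_data_dir()` rather than from a path relative to the repository, which is the kind of local reference that currently breaks on install.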
Similar to #34 and #35, I've moved this to the 'Longer term developments' milestone, as turning OG into a usable Python library has somewhat lower priority at the moment given our constrained development capacity. If one of the collaborating institutions has the bandwidth, it would be very useful though! @trevorb1, feel free to add anything to this issue.
Another option not mentioned above: 4. Add a command-line interface argument that downloads the data and configures the Snakemake workflow.
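A minimal sketch of what that fourth option could look like, assuming the data is published as a single zip archive at a URL. The program name `osemosys-global`, the flags, the `fetch_and_configure` helper, the optional SHA-256 check and the `data_dir` config key are all hypothetical; nothing here is an existing OSeMOSYS Global interface. Only standard-library calls are used, and the config is written as JSON, which Snakemake accepts as a config file format.

```python
import argparse
import hashlib
import io
import json
import urllib.request
import zipfile
from pathlib import Path
from typing import Optional


def fetch_and_configure(url: str, data_dir: Path, config_path: Path,
                        sha256: Optional[str] = None) -> Path:
    """Download a zip archive of input data, unpack it, and write a workflow config."""
    with urllib.request.urlopen(url) as response:
        payload = response.read()
    # Optional integrity check, so a moved, changed or corrupted archive fails loudly.
    if sha256 is not None:
        digest = hashlib.sha256(payload).hexdigest()
        if digest != sha256.lower():
            raise ValueError(f"checksum mismatch: expected {sha256}, got {digest}")
    data_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(io.BytesIO(payload)) as archive:
        archive.extractall(data_dir)
    # "data_dir" is a hypothetical key that the Snakefile would have to read.
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps({"data_dir": str(data_dir.resolve())}, indent=2))
    return config_path


def main(argv=None) -> None:
    # Hypothetical program name and flags, for illustration only.
    parser = argparse.ArgumentParser(prog="osemosys-global")
    parser.add_argument("--download-data", metavar="URL", required=True,
                        help="URL of a zip archive containing the input data")
    parser.add_argument("--data-dir", type=Path, default=Path("data"))
    parser.add_argument("--config", type=Path, default=Path("config/data.json"))
    parser.add_argument("--sha256", default=None,
                        help="expected SHA-256 of the archive (optional)")
    args = parser.parse_args(argv)
    fetch_and_configure(args.download_data, args.data_dir, args.config, args.sha256)


if __name__ == "__main__":
    main()
```

The Snakefile would then declare `configfile: "config/data.json"` and read paths from `config["data_dir"]`. Pinning a checksum also softens the brittleness noted for the web-hosting option: an archive that has moved or changed fails at download time rather than partway through a model run.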