DataSets.jl

Circuitscape use case

Open c42f opened this issue 3 years ago • 0 comments

Background

I've been looking at Circuitscape.jl as an interesting use case for DataSets.jl. Here's a design for how DataSets could support Circuitscape user workflows.

Circuitscape is an interesting case because it's a complete application with its own data management code: the Circuitscape.compute() function takes a config file and uses it to discover the input data and output location, and the Circuitscape.start() function is a wizard which helps users create such a config file. Because DataSets tries to do IO management and data discovery, some of the data discovery parts of Circuitscape should be replaced with a DataSets-based interface.

I think users should be able to interactively

  • Manage their project datasets: provided by the data REPL (in future, perhaps some GUI data browser)
  • Launch Circuitscape jobs: provided by a data REPL run command.

Workflow example

Here's a quick sketch of the workflow:

The wizard Circuitscape.start() acts as it does currently, but instead of linking to existing data in some arbitrary location in the filesystem, it copies the data into a new DataSet. The type of that dataset can be CircuitScapeInput or some such; internally it's just backed by the exact same directory structure that Circuitscape currently uses.

data> run circuitscape   # If run with no data, calls start (?)

# wizard steps ...

[ Info: Created new input dataset `raster_pairwise_1`

data>
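
The copy step performed by the wizard could look roughly like the sketch below. It uses only Julia's standard library. The function name `import_circuitscape_input`, the `data/` subdirectory, and the fields of the returned entry are hypothetical names chosen for illustration; they are not existing DataSets.jl or Circuitscape API. The `CircuitScapeInput` type tag is the one proposed above.

```julia
using UUIDs

# Hypothetical helper: copy an existing Circuitscape input directory into a
# project-local data directory and return a description of the new dataset.
function import_circuitscape_input(project_dir::AbstractString,
                                   src_dir::AbstractString,
                                   name::AbstractString)
    isdir(src_dir) || throw(ArgumentError("not a directory: $src_dir"))
    dest = joinpath(project_dir, "data", name)
    ispath(dest) && error("dataset `$name` already exists at $dest")
    mkpath(dirname(dest))
    # Copy rather than link, so the dataset no longer depends on whatever
    # filesystem location the user pointed the wizard at.
    cp(src_dir, dest)
    entry = Dict{String,Any}(
        "name" => name,
        "uuid" => string(uuid4()),
        "type" => "CircuitScapeInput",  # assumed type tag from this proposal
        "path" => dest,
    )
    @info "Created new input dataset `$name`"
    return entry
end
```

Copying keeps the directory layout unchanged, so existing Circuitscape code could in principle read the dataset from `entry["path"]` without modification.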

I'm imagining that Circuitscape.compute() would be replaced by the data REPL run command, which would also gain functionality for listing which data is available to run with. Something like:

Available circuitscape input data:
  📂 raster_pairwise_1      type=CircuitScapeInput
  📂 raster_one_to_all_1    type=CircuitScapeInput

data> run circuitscape raster_pairwise_1 output_1!
[ Info: ...

data> ls
  📂 output_1               type=CircuitScapeOutput
  📂 raster_pairwise_1      type=CircuitScapeInput
  📂 raster_one_to_all_1    type=CircuitScapeInput
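
Listing the data that is available to run with amounts to filtering the project's datasets by type. A minimal sketch, assuming a hypothetical `DatasetEntry` record that holds only a name and a type tag (DataSets.jl's real dataset representation is richer than this):

```julia
# Hypothetical record for one dataset in a project.
struct DatasetEntry
    name::String
    type::String
end

# Return the entries whose type tag matches `type`, in their original order.
datasets_of_type(entries::AbstractVector{DatasetEntry}, type::AbstractString) =
    filter(e -> e.type == type, entries)

# Print the available Circuitscape inputs in the style of the listing above.
function show_available(io::IO, entries::AbstractVector{DatasetEntry})
    println(io, "Available circuitscape input data:")
    for e in datasets_of_type(entries, "CircuitScapeInput")
        println(io, "  📂 ", rpad(e.name, 22), " type=", e.type)
    end
end
```

Calling `show_available(stdout, entries)` on a project that holds both inputs and outputs would print only the `CircuitScapeInput` entries.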

For run to work, the data REPL needs to be resurrected and taught to look at the database of entry points which is currently set up by @datafunc. Circuitscape would then declare several data entry points with @datafunc circuitscape to hook into data> run.
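
To illustrate the dispatch idea only, here is a sketch of a name-to-function registry that a run command could consult. It deliberately does not use @datafunc, since its internals are not described here; `register_entry_point!`, `run_command`, and the stand-in `circuitscape` function are all hypothetical.

```julia
# Hypothetical registry of entry points: command name => function.
registry = Dict{String,Function}()

# Register `f` under `name`, replacing any previous entry with that name.
function register_entry_point!(registry::Dict{String,Function},
                               name::AbstractString, f::Function)
    registry[String(name)] = f
    return registry
end

# Dispatch a command line of the form "<entry point> <dataset names...>".
function run_command(registry::Dict{String,Function}, line::AbstractString)
    words = split(line)
    isempty(words) && throw(ArgumentError("expected: <entry point> [datasets...]"))
    name = String(words[1])
    f = get(registry, name, nothing)
    f === nothing && throw(ArgumentError("unknown entry point `$name`"))
    # Pass the remaining words through as dataset names.
    return f(String.(words[2:end])...)
end

# Stand-in for the real computation: just report what would be run.
register_entry_point!(registry, "circuitscape",
                      (input, output) -> "would compute $input -> $output")
```

A real implementation would resolve each name to a dataset and open it before calling the entry point; this sketch passes the names through as plain strings.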

c42f, Nov 30 '20 00:11