Aggregating results e.g. bootstraps
This has come up before, but we discussed it at the workshop, so I'm putting it down here.
If you do something like
w <- workflow(CWBZimbabwe,
Bioclim,
Replicate(Bootstrap, n = 100),
LogisticRegression,
NoOutput)
the bootstraps are only useful when examined together. There are plenty of other examples, such as an output module that compares the maps from two models.
My only thinking so far (having not thought AT ALL about implementation) is to allow output modules to request to see a number of analysis lines.
w <- workflow(CWBZimbabwe,
Bioclim,
Replicate(Bootstrap, n = 100),
LogisticRegression,
Aggregate(PlotModelCoefficients))
might, for example, plot a histogram of the 100 bootstrapped coefficients (though this example may have other problems, such as getting coefficients from all model types).
w <- workflow(CWBZimbabwe,
Bioclim,
list(Replicate(Bootstrap, n = 100), NoProcess),
LogisticRegression,
Chain(Aggregate(PlotModelCoefficients, 1:100), PrintMap))
Here the 1:100 tells Aggregate to combine analyses 1 to 100 (the bootstraps). Analysis 101, i.e. the full unbootstrapped model, is plotted without being included in the aggregation.
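To make the intended behaviour concrete, here is a rough base R sketch of what aggregating analyses 1 to 100 while leaving analysis 101 alone could look like. It does not use zoon at all: the simulated data, `fit_model` and `aggregate_coefficients` are all made up for illustration, and how Aggregate would actually be wired into workflow is still an open question.

```r
# Simulated presence/absence data; all names here are made up for illustration.
set.seed(1)
n <- 200
dat <- data.frame(temp = rnorm(n))
dat$presence <- rbinom(n, 1, plogis(-0.5 + 1.2 * dat$temp))

# Stand-in for the model module: a plain logistic regression.
fit_model <- function(d) glm(presence ~ temp, family = binomial, data = d)

# Analyses 1 to 100 are bootstrap resamples; analysis 101 is the full data.
bootstraps <- lapply(1:100, function(i) dat[sample(n, replace = TRUE), ])
analyses <- lapply(c(bootstraps, list(dat)), fit_model)

# Hypothetical aggregator: stack the coefficients of the selected analyses
# into a matrix with one row per analysis.
aggregate_coefficients <- function(models, index = seq_along(models)) {
  t(sapply(models[index], coef))
}

boot_coefs <- aggregate_coefficients(analyses, 1:100)  # 100 x 2 matrix
full_coefs <- coef(analyses[[101]])                    # left out of the aggregation

if (interactive()) {
  hist(boot_coefs[, "temp"], main = "Bootstrapped coefficient for temp")
  abline(v = full_coefs["temp"], lty = 2)  # full-model estimate for comparison
}
```

The `index` argument plays the role of the 1:100 above: anything not listed is simply passed through to be handled one analysis at a time.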
Just some thoughts.
Nice, I like that syntax! Won't it be difficult to make Aggregate generic enough to apply to all outputs though?
I wonder if we should add a label (like we have for data type) for the output modules saying whether they expect to act on the whole list of inputs, or apply to each input one at a time.
This would be well worth discussing in the workshop today if we find a convenient time.
Yes, I think it would be hard for a module to work on either one or many lines. So some sort of "This module only works inside Aggregate" label might be best.
Definitely worth discussing. I discussed bootstrapping in general with Carsten last night. The alternative we discussed was making cross validation deal with bootstrapping. But that only fixes the specific bootstrap issue, not the more general issue of comparing models etc.
I like this idea. This is what I needed for the appify module. If we think output modules will either work by aggregating or not (i.e. not both), then we could add an output module tag, e.g. type: aggregate, and use that to inform workflow whether to aggregate, without the need for a specific call to Aggregate. I'm not sure how this would work with Tim's second example though; not well, I think.
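A minimal sketch of the tag idea, assuming the tag is available to workflow as a simple label on the module. Here it is stored as an R attribute purely for illustration (zoon's real module metadata may look quite different), and `ReportCoefficient`, `CoefficientRange` and `run_output` are hypothetical names, with plain lists standing in for fitted models.

```r
# Hypothetical output modules, tagged via an attribute rather than real
# zoon metadata. "single" modules see one analysis, "aggregate" modules
# see the whole list.
ReportCoefficient <- structure(
  function(model) model$coef,
  type = "single"
)
CoefficientRange <- structure(
  function(models) range(sapply(models, function(m) m$coef)),
  type = "aggregate"
)

# Hypothetical dispatcher: the tag decides how the module is called,
# so the user never writes an explicit Aggregate() call.
run_output <- function(module, models) {
  if (identical(attr(module, "type"), "aggregate")) {
    module(models)           # one call on the whole list of analyses
  } else {
    lapply(models, module)   # one call per analysis
  }
}

# Toy "models": plain lists standing in for fitted model objects.
models <- list(list(coef = 0.8), list(coef = 1.1), list(coef = 1.4))

run_output(CoefficientRange, models)   # range across all analyses
run_output(ReportCoefficient, models)  # one result per analysis
```

Because the tag is all-or-nothing, there is no obvious place to say "aggregate only analyses 1 to 100", which is why this does not map cleanly onto the second example.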
Yup, I think that's definitely worth implementing. We can think about extending it again later if we find another major use case.