
Aggregating results, e.g. bootstraps

Open timcdlucas opened this issue 8 years ago • 4 comments

This has come up before, but we discussed it at the workshop, so I'm putting it down here.

If you do something like

w <- workflow(CWBZimbabwe, 
              Bioclim, 
              Replicate(Bootstrap, n = 100),
              LogisticRegression, 
              NoOutput)

the bootstraps are only useful when examined together. There are plenty of other examples, such as an output module that compares the maps from two models.

My only thinking so far (having not thought AT ALL about implementation) is to allow output modules to request access to several analysis lines at once.

w <- workflow(CWBZimbabwe,
              Bioclim,
              Replicate(Bootstrap, n = 100),
              LogisticRegression,
              # Aggregate() is the proposed wrapper, not an existing module
              Aggregate(PlotModelCoefficients))

This could, for example, plot a histogram of the 100 bootstrapped coefficients (possibly an example with other problems, such as getting coefficients out of every model type).
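To make the idea concrete, here is a minimal base-R sketch of what an aggregating coefficient plot might compute. It does not use zoon: the simulated data, the `boot_fits` list and the `plot_coefficients()` function are hypothetical stand-ins for the 100 bootstrap analysis lines and for PlotModelCoefficients. The key point is that the output function receives all fitted models at once rather than one at a time.

```r
set.seed(1)

# Hypothetical presence/absence data with a single covariate
n <- 200
dat <- data.frame(x = rnorm(n))
dat$y <- rbinom(n, 1, plogis(-0.5 + 1.2 * dat$x))

# Stand-in for the 100 analysis lines: one logistic regression per bootstrap resample
boot_fits <- lapply(seq_len(100), function(i) {
  idx <- sample.int(n, n, replace = TRUE)
  glm(y ~ x, family = binomial, data = dat[idx, ])
})

# Hypothetical aggregating output: takes the whole list of fits in one call
plot_coefficients <- function(fits) {
  coefs <- t(vapply(fits, coef, numeric(2)))  # one row per bootstrap fit
  hist(coefs[, "x"], main = "Bootstrapped slope", xlab = "Coefficient of x")
  invisible(coefs)
}

coefs <- plot_coefficients(boot_fits)
```

A per-line output module could only ever see one row of `coefs`, which is why the histogram needs some form of aggregation.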

w <- workflow(CWBZimbabwe,
              Bioclim,
              list(Replicate(Bootstrap, n = 100), NoProcess),
              LogisticRegression,
              Chain(Aggregate(PlotModelCoefficients, 1:100), PrintMap))

Here the 1:100 tells Aggregate to combine analyses 1 to 100 (the bootstraps). Analysis 101, i.e. the full, non-bootstrapped model, is plotted without being included in the aggregation.
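One way the proposed Aggregate(module, indices) wrapper could be dispatched is sketched below in base R. Everything here is hypothetical: `Aggregate()` is a toy constructor, `run_outputs()` is an invented dispatcher, and the toy analyses and stand-in modules are not zoon objects. It assumes that non-aggregating modules keep running once per analysis line, while an aggregating module is called once on the selected subset.

```r
# Hypothetical constructor: mark an output as aggregating over analyses `which`
# (NULL means all analysis lines)
Aggregate <- function(fun, which = NULL) {
  structure(list(fun = fun, which = which), class = "aggregate_output")
}

# Hypothetical dispatcher for a Chain of outputs
run_outputs <- function(outputs, analyses) {
  lapply(outputs, function(out) {
    if (inherits(out, "aggregate_output")) {
      idx <- if (is.null(out$which)) seq_along(analyses) else out$which
      out$fun(analyses[idx])   # called once, on the selected lines only
    } else {
      lapply(analyses, out)    # called once per analysis line
    }
  })
}

# Toy analyses: 100 "bootstrap" results plus the full model as line 101
analyses <- c(as.list(rnorm(100)), list(0))
summarise_boots <- function(xs) length(xs)  # stand-in for PlotModelCoefficients
print_one <- function(a) "mapped"           # stand-in for PrintMap

res <- run_outputs(list(Aggregate(summarise_boots, 1:100), print_one), analyses)
```

Here `res[[1]]` comes from a single call over lines 1 to 100, while `res[[2]]` holds one result for each of the 101 lines.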

Just some thoughts.

timcdlucas · Jun 15 '16 07:06

Nice, I like that syntax! Won't it be difficult to make Aggregate generic enough to apply to all outputs though?

I wonder if we should add a label (like we have for data type) for the output modules saying whether they expect to act on the whole list of inputs, or apply to each input one at a time.

This would be well worth discussing in the workshop today if we find a convenient time.

goldingn · Jun 15 '16 07:06

Yes, I think it would be hard for a module to work on either one line or many. So some sort of "this module only works inside Aggregate" flag might be best.

Definitely worth discussing. I discussed bootstrapping in general with Carsten last night. The alternative we discussed was making cross validation deal with bootstrapping. But that only fixes the specific bootstrap issue, not the more general issue of comparing models etc.

timcdlucas · Jun 15 '16 07:06

I like this idea. It's what I needed for the appify module. If we think output modules will either aggregate or not (i.e. never both), then we could add an output module tag, e.g. type: aggregate, and use it to tell workflow whether to aggregate, without needing an explicit call to Aggregate. Not sure how this would work with Tim's second example though; not well, I think.
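The tag-based alternative can be sketched in a few lines of base R. This is a hypothetical illustration: zoon output modules are not plain functions with a "type" attribute, and `PlotCoefs`, `PrintMapLike` and `apply_output()` are invented names. The point is that the tag alone decides whether the module receives the whole list of analysis lines or one line at a time.

```r
# Hypothetical output modules, each tagged with how it wants its input
PlotCoefs <- structure(function(fits) length(fits), type = "aggregate")
PrintMapLike <- structure(function(fit) "mapped", type = "single")

# Hypothetical workflow step: the tag, not an explicit wrapper, picks the call style
apply_output <- function(module, analyses) {
  if (identical(attr(module, "type"), "aggregate")) {
    module(analyses)          # whole list of analysis lines at once
  } else {
    lapply(analyses, module)  # one analysis line at a time
  }
}

analyses <- as.list(1:5)
apply_output(PlotCoefs, analyses)             # 5
length(apply_output(PrintMapLike, analyses))  # 5
```

Because the tag is fixed per module, it has no way to say which lines (such as 1:100) should be pooled, which is the difficulty with Tim's second example.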

AugustT · Aug 08 '16 14:08

Yup, I think that's definitely worth implementing. We can think about extending it later if we find another major use case.

goldingn · Aug 08 '16 21:08