dscr
Seed subsets
The engine currently applies seed subsetting in a couple of places, with a tacit assumption that the seeds are the same across scenarios, i.e. that a dsc (ignoring output parsers) is a Cartesian product: scenarios x seeds x methods x scores.
However, the dsc itself allows the user to specify an arbitrary set of seeds for each scenario, leading to a situation that could look like this:
scenario  seeds
scen1     1 2 3 4
scen2     5 6
scen3     1 2 3 4
(Actually I wonder if users are already depending on this behavior and using seed subsets to do funny things with their workflows.)
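To make the mismatch concrete, here is a minimal sketch that compares the scenario/seed pairs from the table above with what a Cartesian-product assumption would imply. It is written in Python purely for illustration and is not dscr code; `seeds_by_scenario` and `is_cartesian` are hypothetical names.

```python
from itertools import product

# Hypothetical per-scenario seed lists, copied from the table above.
seeds_by_scenario = {
    "scen1": [1, 2, 3, 4],
    "scen2": [5, 6],
    "scen3": [1, 2, 3, 4],
}

# The (scenario, seed) pairs the dsc actually defines.
actual_pairs = {
    (scenario, seed)
    for scenario, seeds in seeds_by_scenario.items()
    for seed in seeds
}

# What a Cartesian-product assumption would imply: every scenario
# crossed with every seed that appears anywhere in the dsc.
all_seeds = sorted({seed for seeds in seeds_by_scenario.values() for seed in seeds})
cartesian_pairs = set(product(seeds_by_scenario, all_seeds))

# Pairs the product assumption expects but the dsc never defines.
missing_pairs = sorted(cartesian_pairs - actual_pairs)


def is_cartesian(seeds_by_scenario):
    """Return True only when every scenario uses the same set of seeds."""
    seed_sets = {frozenset(seeds) for seeds in seeds_by_scenario.values()}
    return len(seed_sets) <= 1


print(len(actual_pairs), len(cartesian_pairs), len(missing_pairs))
print(is_cartesian(seeds_by_scenario))
```

With the table's seeds, the dsc defines 10 pairs while the full product would contain 18, so any code that subsets seeds globally is implicitly reasoning about 8 runs that do not exist.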
It's worth discussing what functionality we want to provide in terms of seed subsetting. At the coarsest level of control, we could allow no seed subsetting at all. At the finest, we could allow the user to pass a data frame of the exact scenario/seed combinations that he or she wants to execute. The current state is somewhere between these two and encourages whimsical, opaque hacks.
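As a rough sketch of the two extremes, the function below either runs everything (no subsetting) or accepts an explicit collection of scenario/seed pairs, rejecting pairs the dsc does not define. This is Python for illustration only, not a proposal for the actual dscr interface; `select_runs`, `seeds_by_scenario`, and the `wanted` argument are all hypothetical.

```python
# Hypothetical per-scenario seed lists, matching the example table.
seeds_by_scenario = {
    "scen1": [1, 2, 3, 4],
    "scen2": [5, 6],
    "scen3": [1, 2, 3, 4],
}


def select_runs(seeds_by_scenario, wanted=None):
    """Return the sorted (scenario, seed) pairs to execute.

    wanted=None means no subsetting: every defined pair is returned.
    Otherwise wanted is an iterable of explicit (scenario, seed) pairs,
    the finest level of control. Pairs that the dsc does not define
    raise a ValueError instead of being silently dropped.
    """
    defined = {
        (scenario, seed)
        for scenario, seeds in seeds_by_scenario.items()
        for seed in seeds
    }
    if wanted is None:
        return sorted(defined)

    wanted = set(wanted)
    unknown = wanted - defined
    if unknown:
        raise ValueError(f"pairs not defined in the dsc: {sorted(unknown)}")
    return sorted(wanted)


# Coarsest control: run everything.
everything = select_runs(seeds_by_scenario)

# Finest control: exact scenario/seed combinations.
chosen = select_runs(seeds_by_scenario, [("scen2", 5), ("scen1", 1)])
print(chosen)
```

Failing loudly on an undefined pair such as `("scen2", 1)` is one possible design choice; it makes the subset explicit rather than leaving it to be inferred from whichever seeds happen to overlap between scenarios.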
Another thing I've noticed in my refactoring is that one cannot currently subset run_scores according to a seed subset.