
ENH: Define subset of `product`

Open toddrjen opened this issue 10 years ago • 3 comments

Currently combi supports doing a cartesian product using the ProductSpace class. However, in many cases you don't want every possible combination of all sequences, but rather want to restrict some or all dimensions.

There are two primary approaches I can think of to restricting dimensions that I think could be useful in combi.

First, you can make it so, for each sequence, all other sequences are fixed at some value (say the middle value), and only that sequence is varied.

For example:

>>> a = [1, 2, 3]
>>> b = [10, 20, 30]
>>> c = [100, 200, 300]
>>> d = list(restrict_other(a, b, c, inds={1: 1, 2: 1, 3: 1}))
>>> print(d)
[(2, 20, 200), (1, 20, 200), (3, 20, 200), (2, 10, 200), (2, 30, 200), (2, 20, 100), (2, 20, 300)]

Which would be equivalent to (with duplicates removed):

>>> d = list(product(a, [b[1]], [c[1]])) + list(product([a[1]], b, [c[1]])) + list(product([a[1]], [b[1]], c))
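As a minimal sketch of how the first approach could be written on top of `itertools.product`: the function below is hypothetical and not part of combi. It assumes the keys of `inds` are 1-based positions of the sequences (as the example call suggests), the values are 0-based indices into each sequence, and the elements are hashable so duplicates can be dropped with a set.

```python
from itertools import product


def restrict_other(*seqs, inds):
    """Vary one sequence at a time while all others stay at their fixed index.

    Hypothetical helper, not combi API. `inds` maps the 1-based position of
    each sequence to the 0-based index that is held fixed in it.
    """
    seen = set()
    for varied in range(1, len(seqs) + 1):
        # The varied sequence contributes all of its values; every other
        # sequence contributes only its fixed value.
        axes = [seq if pos == varied else [seq[inds[pos]]]
                for pos, seq in enumerate(seqs, start=1)]
        for combo in product(*axes):
            if combo not in seen:  # e.g. the all-fixed tuple shows up once per axis
                seen.add(combo)
                yield combo
```

For the three sequences above this yields the same seven tuples as the example, although not necessarily in the same order.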

The second would be the opposite of this: you keep one sequence fixed and vary all the other sequences.

For example:

>>> d = list(restrict_single(a, b, c, inds={1: 1, 2: 1, 3: 1}))
>>> print(d)
[(2, 20, 200), (1, 10, 200), (1, 20, 100), (1, 20, 200), (1, 20, 300), (1, 30, 200), (2, 10, 100), (2, 10, 200), (2, 10, 300), (2, 20, 100), (2, 20, 300), (2, 30, 100), (2, 30, 200), (2, 30, 300), (3, 10, 200), (3, 20, 100), (3, 20, 200), (3, 20, 300), (3, 30, 200)]

Which would be equivalent to (with duplicates removed):

>>> d = list(product([a[1]], b, c)) + list(product(a, [b[1]], c)) + list(product(a, b, [c[1]]))
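The second approach can be sketched the same way. Again, this is a hypothetical helper rather than combi API, with the same assumptions: 1-based keys in `inds`, 0-based index values, and hashable elements.

```python
from itertools import product


def restrict_single(*seqs, inds):
    """Hold one sequence at its fixed index at a time and vary all the others.

    Hypothetical helper, not combi API. `inds` maps the 1-based position of
    each sequence to the 0-based index that is held fixed in it.
    """
    seen = set()
    for fixed in range(1, len(seqs) + 1):
        # Only the sequence at position `fixed` is pinned to a single value.
        axes = [[seq[inds[pos]]] if pos == fixed else seq
                for pos, seq in enumerate(seqs, start=1)]
        for combo in product(*axes):
            if combo not in seen:  # tuples with two or more fixed values repeat
                seen.add(combo)
                yield combo
```

With three sequences of three values each, this produces 19 tuples: all 27 combinations minus the 8 in which no sequence sits at its fixed value.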

Ideally the user would be able to specify one or more elements to fix for each sequence.

I see two possible mutually exclusive APIs for this. One is to provide alternative classes for these two cases. The other is to provide arguments to choose which sequences will behave in each way and which index or indices to fix in those sequences (probably a dict where the key is the index of the sequence in *args and the value is either a single index or a sequence of indices to fix in that sequence). However, the same result as the arguments could be achieved by combining the three classes (the existing ProductSpace class and the two proposed here) in various ways.
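A rough sketch of what the dict argument might look like when a value can be either one index or several. The names `fixed_values` and `restrict_other_multi` are made up for illustration, and the semantics for several fixed indices (a non-varied sequence ranges over all of its fixed values) is an assumption, not something settled in this thread.

```python
from itertools import product


def fixed_values(seq, spec):
    """Return the values of `seq` selected by `spec` (one index or several)."""
    indices = [spec] if isinstance(spec, int) else list(spec)
    return [seq[i] for i in indices]


def restrict_other_multi(*seqs, inds):
    """Vary one sequence at a time; the others range over their fixed values.

    Hypothetical helper, not combi API. Keys of `inds` are 1-based sequence
    positions; values are a 0-based index or a sequence of such indices.
    """
    seen = set()
    for varied in range(1, len(seqs) + 1):
        axes = [seq if pos == varied else fixed_values(seq, inds[pos])
                for pos, seq in enumerate(seqs, start=1)]
        for combo in product(*axes):
            if combo not in seen:  # requires hashable elements
                seen.add(combo)
                yield combo
```

For example, `inds={1: 1, 2: [0, 2], 3: 1}` would fix the second sequence at both its first and last values.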

toddrjen, Dec 11 '14 16:12

Hi Todd,

Thanks for your detailed suggestion. I want to know: Where is this coming from? What is the need for it? Can you describe a few applications?

Thanks, Ram.

cool-RR, Dec 11 '14 17:12

My primary interest is in exploring parameter spaces for a simulation. I want to vary one or two parameters while keeping all other parameters fixed in order to see the specific effects of those one or two parameters on the results. When dealing with dozens of arbitrary-length parameters, this gets complicated.

I have my own home-grown recursive solution but that is fairly ugly. It would be much cleaner if I could just define an instance of the first function I described with my parameters in it and pass that to the simulation management code.

With my solution I also can't tell how many results I will get in the end, which means telling how far I have left to go is impossible except in the simplest cases. Your code offers that, which is another benefit.

I know that similar issues exist with experimental design. There are various ways to conduct an experiment when the number of independent variables is too large to do a full cartesian product. Varying one variable or a few variables at a time while keeping the others fixed is a common approach. This also falls under the first example. Of course in this case the results would usually be shuffled by some randomizer, but that is probably outside the scope of this project. I don't need to do that, though, because my simulations are independent and deterministic.

Another approach to experimental design I have seen used for smaller numbers of independent variables is to set each parameter to its minimum, then vary all the other parameters. You then do the same thing with the maximum. This would be covered by the second function I described.

toddrjen, Dec 11 '14 17:12

Ah, I understand now. I think I've had this need before too.

There's some trickiness to implementing this but it might be possible. The question is whether its utility justifies implementing it.

The API I'd suggest: I'd keep it in a separate class from ProductSpace, mostly to avoid contaminating the ProductSpace API. But I'd keep it in one class, and you could set the degree of freedom as an argument, so it'll include both your cases. That way if it ends up not being used, we could simply deprecate the class in a future release, rather than being stuck with it in ProductSpace.
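One way to read the "degree of freedom" idea, sketched under assumptions: a single class whose members are the tuples in which at most `freedom` positions leave their fixed value. The class name `RestrictedProduct` and its arguments are hypothetical, and `fixed` is taken to be one 0-based index per sequence. With three sequences, `freedom=1` gives the first case from the issue and `freedom=2` gives the second.

```python
from itertools import combinations, product


class RestrictedProduct:
    """Tuples in which at most `freedom` positions leave their fixed value.

    Hypothetical sketch, not combi API. `fixed` holds one 0-based index per
    sequence.
    """

    def __init__(self, *seqs, fixed, freedom):
        self.seqs = seqs
        self.fixed = tuple(fixed)
        self.freedom = freedom

    def __iter__(self):
        base = [seq[i] for seq, i in zip(self.seqs, self.fixed)]
        for k in range(self.freedom + 1):
            for free in combinations(range(len(self.seqs)), k):
                # A free position takes every value except its fixed one,
                # so each tuple is produced exactly once.
                axes = [[v for j, v in enumerate(self.seqs[p]) if j != self.fixed[p]]
                        for p in free]
                for values in product(*axes):
                    item = list(base)
                    for p, v in zip(free, values):
                        item[p] = v
                    yield tuple(item)
```

Because duplicates are avoided by construction rather than with a set, the elements do not need to be hashable.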

One complication is calculating the length, and being able to convert between an index number and a member of the space. I think it's possible, if we break the space down into a few strategic subspaces, but I haven't gone deep enough into it to be sure. This is not a big enough need for me to spend time developing it right now, sorry. If you want to develop this yourself and have it added to Combi, I'll be happy to discuss the API.
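A sketch of one possible decomposition, offered as an assumption rather than a worked-out design: group the tuples by which positions deviate from their fixed index. Each group is a small product space of size `prod(len - 1)` over the deviating positions, the groups are disjoint, and an index can be decoded by walking the groups in order. All function names here are hypothetical, and `fixed` is taken to be one 0-based index per sequence.

```python
from itertools import combinations
from math import prod


def subspaces(lengths, freedom):
    """Yield (free_positions, size) for each disjoint subspace."""
    for k in range(freedom + 1):
        for free in combinations(range(len(lengths)), k):
            yield free, prod(lengths[p] - 1 for p in free)


def space_length(lengths, freedom):
    """Number of tuples with at most `freedom` positions off their fixed index."""
    return sum(size for _, size in subspaces(lengths, freedom))


def member_at(seqs, fixed, freedom, index):
    """Return the tuple at `index` without enumerating the whole space."""
    if index < 0:
        raise IndexError("index out of range")
    lengths = [len(s) for s in seqs]
    for free, size in subspaces(lengths, freedom):
        if index < size:
            item = [seq[i] for seq, i in zip(seqs, fixed)]
            # Decode `index` as a mixed-radix number over the free positions.
            for p in reversed(free):
                index, offset = divmod(index, lengths[p] - 1)
                # Step over the fixed index when mapping the offset back.
                j = offset if offset < fixed[p] else offset + 1
                item[p] = seqs[p][j]
            return tuple(item)
        index -= size
    raise IndexError("index out of range")
```

For three sequences of length three, `space_length([3, 3, 3], 1)` is 7 and `space_length([3, 3, 3], 2)` is 19, which matches the two examples in the issue.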

cool-RR, Dec 11 '14 18:12