Gradient-Free-Optimizers
Categorical distribution
Is your feature request related to a problem? Please describe.
Hi, I really like the look of this library and am interested in using it in the context of optimization in discrete spaces (specifically spaces of protein sequences). A protein sequence is a sequence of L letters drawn from a fixed-size alphabet of size K, i.e. a vector of L K-way categorical variables. The most natural way to sample a new point to consider when performing hill climbing in this case is to sample a fixed integer number of mutations (i.e. single-letter swaps) to the current sequence. The sampling process can then be expressed as a product of categorical distributions, since there are L × K single mutations (including null mutations).

Describe the solution you'd like
I'd like a natural way to sample from categorical distributions when proposing updated positions. My understanding is that the search space is naturally interpretable as a categorical space, so it seems like this could make sense more generally?

Describe alternatives you've considered

Additional context
Hello @alex-hh,
thanks for opening this issue. Your problem sounds very interesting! I have never come across this kind of optimization problem before. Is the length L always the same, or can it vary? Could you explain in more detail what the search-space would look like?
If the length L were fixed (for example 3), would it make sense for the search-space to look like this?
fixed_size_alphabet = ["A", "B", "C", "D"]
search_space = {
    "letter.0": fixed_size_alphabet,
    "letter.1": fixed_size_alphabet,
    "letter.2": fixed_size_alphabet,
}
Is this what you mean by categorical distributions?
Exactly, yes. In general L could vary, but fixing L is a good starting point.
Hello @alex-hh,
very good. Gradient-Free-Optimizers only supports numerical search-spaces (numpy arrays), but the Python package Hyperactive is built on top of Gradient-Free-Optimizers and supports numbers, strings, and even functions in the search-space. It also has a lot of examples that show how you can build the search-space.
If you have another question related to this issue in GFO, you can ask it here; otherwise you can close this issue. If you need help with Hyperactive, you can open an issue in its repository.
I hope I was able to help you.
Thanks, my question wasn't actually about the search space construction, but about the 'distribution' parameter that is passed to optimisers.
I haven't completely understood how this works in the case of the existing distributions.
But my question is whether it is possible to use a categorical distribution to sample updated points within the search space.
In the case described above, an example would be sampling an updated sequence by making exactly one mutation (letter swap) to the current sequence. Since there is no categorical distribution option, it wasn't clear to me whether this is possible.
Okay, so the distribution-parameter is similar to a numpy distribution. Those distributions are continuous, but the GFO search-space is discrete, so GFO automatically rounds the new position to the nearest integer.
I am not quite sure what you mean by:
use a categorical distribution to sample updated points
You currently cannot pass your own distribution to the parameter; you have to pass a string that selects one of four possible numpy distributions. But the distribution-parameter is just a way to change the behaviour of the algorithm. It does not change its capability to solve certain problems.
From what I read, I do not see why using GFO for your problem would not work. I think the central part is to model the problem as an objective-function and search-space that fit GFO.
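For example, the sequence problem could be modelled like this (a minimal sketch; the integer encoding of the alphabet and the toy objective are just placeholders):

import numpy as np
from gradient_free_optimizers import HillClimbingOptimizer

K = 4  # alphabet size, e.g. ["A", "B", "C", "D"] encoded as 0..3
L = 3  # fixed sequence length

# GFO search-spaces are dicts of numpy arrays, so letters are encoded as integers
search_space = {f"letter.{i}": np.arange(K) for i in range(L)}

def objective_function(para):
    seq = np.array([para[f"letter.{i}"] for i in range(L)])
    # toy fitness: count occurrences of letter 0 ("A")
    return int(np.sum(seq == 0))

opt = HillClimbingOptimizer(search_space, distribution="normal")
opt.search(objective_function, n_iter=100)

With Hyperactive you could keep the letters as strings instead of encoding them as integers.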
Right, so the behaviour I'd like to see is something like this:
Suppose I'm at a position in the search space and I want the next candidate position to evaluate to be sampled at random from the neighbours of the current position.
In the search space we discussed above, the neighbouring points might for example be all points which involve a single mutation, at any position, to the current sequence.
In that case there are L × (K − 1) neighbouring points (a choice of one of K − 1 mutations at one of L positions). Any distribution over the neighbouring points is therefore a categorical distribution with L × (K − 1) entries.
For example, if the entries are all equal, it is a uniform distribution over the neighbours and I have an equal chance of selecting any of the possible single mutants.
Is it straightforward currently to have this kind of sampling of candidate positions in GFO? Is there a reason for using continuous distributions and not discrete ones, given the discrete search space?
Hello @alex-hh,
Is it straightforward currently to have this kind of sampling of candidate positions in GFO?
It really depends on what this looks like in code. Could you provide an example? If this kind of sampling can be implemented as a function that takes the current position (a numpy array) and returns a sample of the same shape, then yes, it would probably be easy. Maybe you could provide a wrapper-function for a categorical distribution that works like that.
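For illustration, such a wrapper could have a shape like this (a hypothetical sketch, not part of GFO; the integer letter-encoding and the uniform default are assumptions):

import numpy as np

rng = np.random.default_rng()

def categorical_step(position, K, p=None):
    # position: the current point, an integer numpy array of length L
    # p: optional categorical weights over the L * (K - 1) neighbours (uniform if None)
    L = position.size
    idx = rng.choice(L * (K - 1), p=p)  # flat index over all single mutations
    pos, offset = divmod(idx, K - 1)
    new_letter = offset + (offset >= position[pos])  # skip the current (null) letter
    new_position = position.copy()
    new_position[pos] = new_letter
    return new_position

With p=None this is the uniform distribution over single mutants from your example; non-uniform weights would give any other categorical proposal.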
Is there a reason for using continuous distributions and not discrete ones, given the discrete search space?
I started using the normal distribution because I wanted to implement a hill climbing algorithm that acts like a particle (in quantum physics), where the position is a probability distribution. Some of this is also explained in the documentation. Choosing the normal distribution made sense because I was able to control its average step size with $\sigma$. It also made the algorithm more flexible, because the step-size can vary. In its default configuration the hill climbing will choose close neighbours with high probability and distant ones with low probability. So the normal distribution seemed natural to me (given this purpose). Later I added other distributions (just for fun ;-) ).
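Schematically, the idea is something like this (a simplified sketch of the mechanism, not the actual GFO source):

import numpy as np

rng = np.random.default_rng()

def normal_hill_climb_step(position, sigma, dim_sizes):
    # sample a continuous neighbour centred on the current position;
    # sigma controls the average step size
    candidate = rng.normal(loc=position, scale=sigma)
    # round to the nearest integer and clip into the discrete search-space
    candidate = np.rint(candidate).astype(int)
    return np.clip(candidate, 0, dim_sizes - 1)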
Here you can see the source-code of the hill-climbing algorithm. The marked lines show where the distribution is used.