Vizier output transforms
Motivation
I wanted to experiment with the Vizier output transforms in BoTorch, in particular the infeasible transform. This may be misplaced effort: within the Meta ecosystem, Ax actually handles this sort of thing, so these transforms may not end up being used very much, but they seemed worth upstreaming.
The transforms are taken from: https://arxiv.org/abs/2408.11527
Test Plan
Added tests for all functionality, though I'm not sure I'm at 100% coverage.
Related PRs
This branch is based on https://github.com/pytorch/botorch/pull/2636, so that PR would need to be merged first.
Thanks for adding these. I've been curious to try out the Vizier transforms. These transforms try to ensure we model the good regions better, which seems preferable to trying to model everything equally well when the goal is to optimize a function. Have you tried these on any benchmarks yet? It'd be good to see how they work e2e.
I implemented these here before looking into what I would need to connect them to the benchmark test case I have been prototyping in Ax. I assumed the best approach would be to implement them lower in the stack and let them bubble up, but it seems I should have implemented them at the Ax level (especially since the harder part here was handling the `batch_size`), hence the comment about potentially misplaced effort. It may be that these shouldn't be merged here if the details can't be hashed out.
None of these transforms implement the `untransform_posterior` method, which is necessary to make sure the transforms work as an `outcome_transform` in BoTorch models. Without it, `Model.posterior` will error out. Would it be possible to implement `untransform_posterior`, likely returning a `TransformedPosterior` object?
This should be doable based on the logic in `Standardize`, using just the sample transform. In general, the bit that makes things annoying here is the `batch_size`.
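For reference, a minimal sketch of what that might look like, following the `TransformedPosterior` pattern that BoTorch's `Log` outcome transform uses. The `_untransform` method here is a hypothetical elementwise inverse of the forward warping, not code from this PR:

```python
from botorch.posteriors import Posterior, TransformedPosterior


def untransform_posterior(self, posterior: Posterior) -> TransformedPosterior:
    # Wrap the model posterior so that samples drawn from it are mapped
    # back to the original outcome scale. `self._untransform` is a
    # hypothetical elementwise inverse of the forward warping.
    # mean_transform / variance_transform are omitted since these warpings
    # generally have no closed-form moments; rsample still works, which is
    # enough for MC acquisition functions.
    return TransformedPosterior(
        posterior=posterior,
        sample_transform=self._untransform,
    )
```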
None of the transforms seem to support `Yvar` either. We use BoTorch through Ax and try to keep the same methods regardless of whether the noise is observed. What does Vizier do with observation noise? Is it simply ignored, or is this just a limitation of the current implementation?
I think the Vizier approach is similar to performing the transformations at the Ax layer. Their equivalent of `Metric` only deals with a single float observation, so based on the docs I believe observation noise is ignored. I don't think we can implement `Yvar` for either the LogWarping or the HalfRank transform.
Ax vs BoTorch
This is something I've also been thinking about quite a bit lately, as part of reworking what Ax uses by default (though this has been rather focused on input transforms so far). Outcome transforms may be slightly simpler in Ax: you technically only need to be able to transform, though we also want some rough untransform method for reporting & cross-validation purposes. In BoTorch, you need the transform to be differentiable, since the untransformed posterior is used to compute the acquisition value, which needs to be differentiated through during optimization. Ax transforms are applied once as a pre-processing step, but BoTorch transforms are applied at each posterior call. There is also the aspect of learnable / tunable transforms in BoTorch, but that doesn't apply to outcome transforms.
If I were to implement Vizier transforms, I'd probably try them in Ax, since there are fewer restrictions.
> In general, the bit that makes things annoying here is the `batch_size`.
Do you often use batched models? If not, you don't need to support the `batch_size` arg or different treatment for each batch; you could use the same parameters for all batches.
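For illustration, a rough sketch of what that simplification could look like: compute one shared statistic by collapsing the batch dimensions, instead of fitting per-batch transform parameters (the function name is hypothetical, not from this PR):

```python
import torch


def shared_median(Y: torch.Tensor) -> torch.Tensor:
    # Y has shape batch_shape x n x m. Collapse the leading batch dims so
    # that every batch shares the same per-outcome statistic, rather than
    # fitting separate transform parameters for each batch.
    Y_flat = Y.reshape(-1, Y.shape[-1])  # (prod(batch_shape) * n) x m
    return Y_flat.median(dim=0).values  # one shared value per outcome
```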
> I think the Vizier approach is similar to performing the transformations at the Ax layer.
Agreed. It is more of a pre-processing step.
> Their equivalent of `Metric` only deals with a single float observation, so based on the docs I believe observation noise is ignored. I don't think we can implement `Yvar` for either the LogWarping or the HalfRank transform.
Cool, this is good to know.
Will move forward with a PR in Ax, as that seems like the faster way to make progress on my use case.
Sounds good. `StandardizeY` or `PowerTransformY` should be good examples to follow.
Closing as abandoned.