
Vizier output transforms

Open CompRhys opened this issue 1 year ago • 4 comments

Motivation

I wanted to experiment with the Vizier output transforms in BoTorch, in particular the infeasible transform. This may be misplaced effort: within the Meta ecosystem, Ax handles this sort of preprocessing, so these transforms may not end up being used much, but they seem worth upstreaming.

The transforms come from https://arxiv.org/abs/2408.11527.

Test Plan

Added tests for all functionality; I'm not sure I'm at 100% coverage.

Related PRs

This branch builds on https://github.com/pytorch/botorch/pull/2636, so that PR would need to be merged first.

CompRhys avatar Dec 03 '24 14:12 CompRhys

> Thanks for adding these. I've been curious to try out the Vizier transforms. These transforms try to ensure we model the good regions better, which seems preferable to trying to model everything equally well when the goal is to optimize a function. Have you tried these on any benchmarks yet? It'd be good to see how they work e2e.

I implemented these here before looking into what I would need to do to connect them to the benchmark test case I have been prototyping in Ax. I assumed that the best approach would be to implement them lower in the stack and bubble up, but seemingly I should have implemented them at the Ax level (especially as the harder part here was handling the batch_size), hence the comment about potentially misplaced effort. It may be that these shouldn't be merged here if the details can't be hashed out.

> None of these transforms implement the untransform_posterior method, which is necessary for the transforms to work as an outcome_transform in BoTorch models. Without it, Model.posterior will error out. Would it be possible to implement untransform_posterior, likely returning a TransformedPosterior object?

This should be doable based on the logic in Standardize, using just the sample transform; see the sketch below. In general, the bit here that makes things annoying is the batch_size.
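A minimal sketch of what that could look like, mirroring how Standardize returns a TransformedPosterior (the InvertibleWarp class and its log1p warp are hypothetical stand-ins for illustration, not the PR's actual transforms):

```python
import torch
from botorch.models.transforms.outcome import OutcomeTransform
from botorch.posteriors import TransformedPosterior


class InvertibleWarp(OutcomeTransform):
    """Hypothetical warp y -> log1p(y) with an exact inverse (assumes Y > -1)."""

    def forward(self, Y, Yvar=None):
        if Yvar is not None:
            raise NotImplementedError("Yvar is not supported in this sketch.")
        return torch.log1p(Y), None

    def untransform(self, Y, Yvar=None):
        return torch.expm1(Y), Yvar

    def untransform_posterior(self, posterior):
        # Only sample_transform is supplied, so sample-based (MC) acquisition
        # functions work; accessing .mean / .variance on the returned posterior
        # would additionally require mean_transform / variance_transform.
        return TransformedPosterior(
            posterior=posterior,
            sample_transform=torch.expm1,
        )
```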

> None of the transforms seem to support Yvar either. We use BoTorch through Ax and try to keep the same methods regardless of whether the noise is observed. What does Vizier do with observation noise? Is it simply ignored, or is this just a limitation of the current implementation?

I think the Vizier approach is similar to performing the transformations at the Ax layer. Their equivalent to Metric only deals with a single float observation, so, based on the docs, I believe observation noise is simply ignored. I don't think we can implement Yvar for either the LogWarping or the HalfRank transform.
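For context, a rough sketch of the half-rank idea as I read it from the paper (written from the paper's description, not from this PR's code; the function name, constants, and edge-case handling are guesses): the below-median half is Gaussianized by rank, which is why there is no natural push-forward for an observation-noise Yvar.

```python
import torch
from torch.distributions import Normal


def half_rank_warp(y: torch.Tensor) -> torch.Tensor:
    """Warp a 1-d tensor of objective values (larger is better)."""
    median = y.median()
    good = y >= median
    # Scale estimated from the good half, so the warped bad half lines up
    # with it at the median.
    std = y[good].std(unbiased=False).clamp_min(1e-6)
    ranks = torch.argsort(torch.argsort(y)).to(y.dtype) + 1.0  # 1-based ranks
    quantiles = ranks / (y.numel() + 1.0)  # empirical CDF values in (0, 1)
    z = Normal(0.0, 1.0).icdf(quantiles)  # Gaussian quantile for each rank
    warped = y.clone()
    warped[~good] = median + std * z[~good]  # z < 0 below the median
    return warped
```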

CompRhys avatar Dec 03 '24 19:12 CompRhys

Ax vs BoTorch

This is something I've also been thinking about quite a bit lately as part of reworking what Ax uses by default (though that work has focused on input transforms so far). Outcome transforms may be slightly simpler in Ax: you technically only need to be able to transform, though we also want some rough untransform method for reporting and cross-validation purposes. In BoTorch, the transform needs to be differentiable, since the untransformed posterior is used to compute the acquisition value, which is differentiated through during optimization. Ax transforms are applied once as a pre-processing step, whereas BoTorch transforms are applied at each posterior call. There is also the aspect of learnable / tunable transforms in BoTorch, but that doesn't apply to outcome transforms.
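A toy illustration of the differentiability requirement (with a hypothetical expm1 untransform standing in for a real one): candidate optimization needs gradients to flow from the acquisition value back through the untransform to the inputs.

```python
import torch

X = torch.randn(3, 2, requires_grad=True)  # candidate points being optimized
y_model_space = X.sum(dim=-1, keepdim=True)  # stand-in for a posterior sample
y_outcome_space = torch.expm1(y_model_space)  # differentiable untransform
y_outcome_space.sum().backward()  # stand-in for an acquisition value
assert X.grad is not None  # gradients reached the candidates through the untransform
```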

If I were to implement Vizier transforms, I'd probably try them in Ax, since there are fewer restrictions.

> In general, the bit here that makes things annoying is the batch_size.

Do you often use batched models? If not, you don't need to support the batch_size arg or different treatment for each batch; you could use the same parameters for all batches, as in the sketch below.
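For example, a sketch of fitting one set of parameters across all batch dimensions and broadcasting them (standardization standing in for the Vizier warps):

```python
import torch

Y = torch.randn(4, 20, 1)  # batch_shape x n x m training outcomes
flat = Y.reshape(-1, Y.shape[-1])  # pool all batches to fit shared parameters
mean, std = flat.mean(dim=0), flat.std(dim=0).clamp_min(1e-8)
Y_tf = (Y - mean) / std  # same parameters broadcast to every batch
```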

> I think the Vizier approach is similar to performing the transformations at the Ax layer.

Agreed. It is more of a pre-processing step.

> Their equivalent to Metric only deals with a single float observation, so, based on the docs, I believe observation noise is simply ignored. I don't think we can implement Yvar for either the LogWarping or the HalfRank transform.

Cool, this is good to know.

saitcakmak avatar Dec 03 '24 21:12 saitcakmak

Will move forward with a PR in Ax, as that seems like the faster way to make progress on my use case.

CompRhys avatar Dec 05 '24 18:12 CompRhys

Sounds good. StandardizeY or PowerTransformY should be good examples to follow.
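For reference, the rough shape such a transform could take on the Ax side (method names follow my reading of Ax's Transform base class; signatures vary across Ax versions, and the HalfRankY class is hypothetical, so treat all of this as an assumption rather than the actual API):

```python
from ax.modelbridge.transforms.base import Transform


class HalfRankY(Transform):
    """Hypothetical Ax-side port of the Vizier half-rank warp."""

    def __init__(self, search_space=None, observations=None, modelbridge=None, config=None):
        # Fit per-metric warp parameters (e.g. medians and scales) from
        # `observations` here, as StandardizeY does for means/stds.
        ...

    def _transform_observation_data(self, observation_data):
        # Warp each ObservationData's means; noise is dropped, since the warp
        # has no natural push-forward for Yvar.
        ...

    def _untransform_observation_data(self, observation_data):
        # Approximate inverse, needed for reporting and cross-validation.
        ...
```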

saitcakmak avatar Dec 05 '24 18:12 saitcakmak

Closing as abandoned.

CompRhys avatar Aug 18 '25 15:08 CompRhys