
ARD kernel behavior

james-bowden opened this issue 2 years ago • 5 comments

I'm a bit confused looking at the documentation here for the ARD kernel, i.e., separate lengthscales per dimension. The doc shows "turning on" the lengthscale for 3 of the input dimensions, but then shows only a single lengthscale parameter, and it's easy to verify in code that this is the actual behavior. I would expect n lengthscales instead, unless I'm misunderstanding? At least, that's the behavior as I understand it in GPyTorch -- see its "ard_num_dims" parameter.
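For concreteness, here's the distinction in a plain-Python sketch (this is illustrative, not GPJax's implementation): an isotropic kernel shares one lengthscale across all input dimensions, whereas an ARD kernel has an independent lengthscale per dimension.

```python
import math

def rbf(x, y, lengthscale):
    # Squared-exponential kernel. `lengthscale` is either a single float
    # shared by all dimensions (isotropic) or a list with one entry per
    # dimension (ARD).
    if isinstance(lengthscale, (int, float)):
        lengthscale = [lengthscale] * len(x)
    sq = sum(((a - b) / l) ** 2 for a, b, l in zip(x, y, lengthscale))
    return math.exp(-0.5 * sq)

x, y = [1.0, 2.0, 3.0], [1.5, 1.0, 3.0]
k_iso = rbf(x, y, 1.0)               # one shared lengthscale
k_ard = rbf(x, y, [0.5, 1.0, 2.0])   # separate lengthscale per dimension
```

The issue being reported is that GPJax silently stays in the first (scalar) case even when several active dimensions are specified.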

james-bowden avatar Aug 09 '23 00:08 james-bowden

Maybe related to this earlier issue, whose behavior still seems to occur. Does this mean that when setting active_dims, you just have to manually give your lengthscale the correct dimensionality?

james-bowden avatar Aug 09 '23 00:08 james-bowden

Hi @james-bowden. Two things are true here: 1) the docs are incorrectly written, and 2) one must manually specify the lengthscales in the ARD case; not doing so yields an isotropic kernel. It's perhaps helpful to see the isotropic case as the "default" in GPJax.

In any case, I've just opened a short PR to resolve this ambiguity. However, if you think there would be a more intuitive way for users to specify ARD/isotropic kernels, then I'd love to hear your thoughts.

thomaspinder avatar Aug 09 '23 04:08 thomaspinder

Okay, that makes sense. Fixing the documentation will at least make it clear what needs to be done -- the issue is mainly that I thought I had an ARD kernel after specifying active_dims, but it was still isotropic, which is pretty misleading. I don't have any issue with isotropic being the default.

Some thoughts on how to make this more intuitive:

  1. I would expect that when I specify active_dims, the lengthscale parameter is automatically populated as an n-vector. Is there a reason this isn't / can't be the default behavior?
  2. If you really want the user to manually specify the lengthscales, you could have active_dims take a dict-like structure that forces the user to give a lengthscale (even if a dummy) for each dimension. I think this is a worse option than 1, though. Really, the problem I'm trying to point out is that I have to take two steps -- both setting active_dims and manually populating the lengthscale array -- and if I miss one, no error is thrown, so I continue unaware and get different results than expected.
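To illustrate option 1, here's a hypothetical helper (the name and behavior are mine, not GPJax's API) that auto-populates a scalar lengthscale into an n-vector and raises on a shape mismatch instead of failing silently:

```python
def make_lengthscale(active_dims, lengthscale=1.0):
    """Hypothetical sketch: expand a scalar lengthscale into one entry per
    active dimension, so an ARD-shaped vector always reaches the kernel and
    a silent shape mismatch cannot occur."""
    if isinstance(lengthscale, (int, float)):
        # Option 1: auto-populate to an n-vector.
        return [float(lengthscale)] * len(active_dims)
    if len(lengthscale) != len(active_dims):
        raise ValueError(
            f"lengthscale has {len(lengthscale)} entries but active_dims "
            f"selects {len(active_dims)} dimensions"
        )
    return [float(l) for l in lengthscale]
```

With this, `make_lengthscale([0, 1, 2])` yields `[1.0, 1.0, 1.0]`, and forgetting one of the two steps becomes a loud error rather than a silently isotropic kernel.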

james-bowden avatar Aug 09 '23 17:08 james-bowden

I hear your pain point and I agree with the sentiment. However, my concern with 1. is that I may have D-dimensional covariates and wish to define an isotropic kernel over a subset of the dimensions. Under approach 1, I don't think I'd be able to do that anymore, correct?

An alternative approach would be to have an argument like ard_lengthscale that is scalar-valued and, when specified, is broadcast over the active_dims. That way the user could still obtain an isotropic kernel with active_dims = [0, 1, 2], lengthscale=1., whilst a similar ARD kernel would be achieved by active_dims = [0, 1, 2], ard_lengthscale=1.. Thoughts?
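A plain-Python sketch of how that could behave (hypothetical names and semantics; this is not GPJax code):

```python
def resolve_lengthscale(active_dims, lengthscale=None, ard_lengthscale=None):
    """Hypothetical sketch of the proposal above: `lengthscale` keeps its
    isotropic meaning, while a scalar `ard_lengthscale` is broadcast into
    one independent lengthscale per active dimension."""
    if (lengthscale is None) == (ard_lengthscale is None):
        raise ValueError("specify exactly one of lengthscale / ard_lengthscale")
    if lengthscale is not None:
        return float(lengthscale)                        # isotropic: one shared scale
    return [float(ard_lengthscale)] * len(active_dims)   # ARD initialisation
```

So `resolve_lengthscale([0, 1, 2], lengthscale=1.0)` stays a shared scalar, while `resolve_lengthscale([0, 1, 2], ard_lengthscale=1.0)` produces `[1.0, 1.0, 1.0]` as independent parameters.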

thomaspinder avatar Aug 09 '23 18:08 thomaspinder

In the case you proposed, why wouldn't you just select the dimensions of your covariates that you want to apply the kernel to, e.g., D_active = D[0:2, :], and apply a standard isotropic kernel to that? Maybe I'm missing something, but this setup seems to me like a convenience of sorts.

I think the alternative you're suggesting is definitely better than the current behavior, though I can still see people reading active_dims = [0, 1, 2], lengthscale=1. as meaning ARD is happening. Perhaps that is easily fixed by documenting clearly that active_dims on its own doesn't correspond to ARD. In any case, just having an ard=True flag might be preferable, so that the lengthscale argument is still used as normal; this might be a bit less confusing. Thoughts?
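A sketch of the ard=True variant (again hypothetical, not GPJax's API), where a single flag controls broadcasting and lengthscale keeps its usual meaning:

```python
def init_lengthscale(active_dims, lengthscale=1.0, ard=False):
    """Hypothetical sketch of the `ard=True` flag: `lengthscale` keeps its
    usual meaning, and the flag decides whether it is broadcast into an
    independent parameter per active dimension."""
    if ard:
        return [float(lengthscale)] * len(active_dims)  # ARD: one per dimension
    return float(lengthscale)                           # isotropic default
```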

james-bowden avatar Aug 09 '23 19:08 james-bowden