Kernel Derivatives
There are two components to this enhancement.
### Optimization
Define a `theta` and an `eta` (the inverse of `theta`) function to transform parameters from an open bounded interval to a closed bounded interval (or to eliminate the bounds entirely) for use in optimization methods. This is similar to how link functions work in logistic regression: unconstrained optimization is used to set a parameter value in the interval (0,1) via the logit link function. A minimal sketch of the transform pair follows the checklist below.
- [x] `theta` - given an interval and a value, applies a transformation that eliminates finite open bounds
- [x] `eta` - given an interval and a value, reverses the value back to the original parameter space
- [x] `gettheta` - returns the theta-transformed variable when applied to `HyperParameter`s and a vector of theta-transformed variables when used on a `Kernel`
- [x] `settheta!` - used to update `HyperParameter`s or `Kernel`s given a vector of theta-transformed variables
- [x] `checktheta` - used to check whether the provided vector (or scalar, if working with a `HyperParameter`) is a valid update
- [x] `upperboundtheta` - returns the theta-transformed upper bound. For example, if a parameter is restricted to (0,1], the transformed upper bound will be log(1) = 0
- [x] `lowerboundtheta` - returns the theta-transformed lower bound. For example, if a parameter is restricted to (0,1], the transformed lower bound will be -Infinity
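Here is a minimal sketch of what the `theta`/`eta` pair could look like for a log transform on an interval with a finite open lower bound. The signatures and the explicit bound argument `a` are illustrative assumptions, not the package's actual API:

```julia
# Hypothetical theta/eta pair for a parameter in (a, Inf);
# theta eliminates the finite open bound, eta reverses it.
theta(a::Real, x::Real) = log(x - a)   # (a, Inf) -> (-Inf, Inf)
eta(a::Real, t::Real)   = exp(t) + a   # (-Inf, Inf) -> (a, Inf)

x = 2.5
t = theta(0, x)     # log(2.5)
eta(0, t) ≈ x       # true: eta reverses theta

# Bounds transform the same way: for a parameter restricted to (0, 1],
# the transformed upper bound is theta(0, 1) = log(1) = 0, and the
# transformed lower bound is the limit log(0) = -Inf.
```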
### Derivatives
Derivatives will be taken with respect to theta, as described above.
- [ ] `gradeta` - derivative of the `eta` function. Using the chain rule, this is applied to `gradkappa` to get the derivative with respect to theta. Not exported.
- [ ] `gradkappa` - derivative of the scalar part of a `Kernel`. This must be defined for each kernel. It will be manual, so the derivative will be analytical or a hand-coded numerical derivative. It will only be defined for the parameters of the kernel. Not exported. Ex. `dkappa(k, Val{:alpha}, z)` (see the sketch after this list)
- [ ] `gradkernel` - derivative of `kernel`. The second argument will be the variable the derivative is taken with respect to; a value type with the field name as a parameter will be used. Ex. `dkernel(k, Val{:alpha}, x, y)`
- [ ] `gradkernelmatrix` - derivative of the kernel matrix.
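As a rough illustration of the `Val`-based dispatch described above (the sketch referenced from the `gradkappa` item), here is how an analytic parameter derivative and the chain rule through `eta` might fit together. The `GaussianKernel` struct, the form of kappa, and the log transform for alpha are assumptions for the example, not the package's definitions:

```julia
# Illustrative only: a minimal kernel with one parameter.
struct GaussianKernel
    alpha::Float64   # assumed restricted to (0, Inf)
end

# Scalar part of the kernel, applied to the squared distance z
kappa(k::GaussianKernel, z::Real) = exp(-k.alpha * z)

# Hand-coded analytic derivative with respect to alpha, selected by a
# value type carrying the field name (mirrors dkappa(k, Val{:alpha}, z))
dkappa(k::GaussianKernel, ::Type{Val{:alpha}}, z::Real) = -z * exp(-k.alpha * z)

# If alpha = eta(theta) = exp(theta) (a log-transformed parameter), then
# d(eta)/d(theta) = exp(theta) = alpha, so the chain rule gives the
# derivative of kappa with respect to theta:
gradeta(k::GaussianKernel, ::Type{Val{:alpha}}) = k.alpha
dkappa_theta(k::GaussianKernel, z::Real) =
    dkappa(k, Val{:alpha}, z) * gradeta(k, Val{:alpha})
```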
Sounds great! How can I help?
Can you also explain the relation between this enhancement and the derivatives branch?
Hello!
Very early on there was an attempt at adding derivatives - that's the derivatives branch. However, it added a great deal of complexity, and I didn't feel the base `Kernel` type and calculation method were carefully planned out before all that complexity was built on top. For example, there wasn't really any consideration of the parameter constraints and how they would impact the optimization routines (this can be an issue with open intervals, such as the one for the alpha parameter in a Gaussian kernel - not all kernels can use an unconstrained optimization method).
I've since reworked much of the package and explored how other libraries approach derivatives. Rather than having the `Kernel` type be a collection of floats, I've now made it a collection of `HyperParameter` instances. This new `HyperParameter` type contains a pointer to a value that can be altered, as well as an `Interval` type that can be used to transform the parameter to a domain more amenable to optimization and to enforce constraints/invariants.
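To make that concrete, here is a minimal sketch of the idea. The field names, the `Interval` layout, and the helper functions are assumptions for illustration, not the actual types in the package:

```julia
# Interval with possibly-open bounds, used to validate parameter values
struct Interval
    a::Float64       # lower bound
    b::Float64       # upper bound
    lopen::Bool      # true if the lower bound is open
    uopen::Bool      # true if the upper bound is open
end

# A mutable value paired with its interval, standing in for the
# "pointer to a value that can be altered" described above
mutable struct HyperParameter
    value::Float64
    bounds::Interval
end

# Validate a candidate value against the interval
function checkvalue(I::Interval, x::Real)
    (I.lopen ? x > I.a : x >= I.a) && (I.uopen ? x < I.b : x <= I.b)
end

# Update the parameter only if the invariant holds
function setvalue!(p::HyperParameter, x::Real)
    checkvalue(p.bounds, x) || throw(ArgumentError("value $x outside interval"))
    p.value = x
    return p
end
```

In the real package, the `Interval` would presumably also drive the theta/eta transforms described earlier; this sketch only covers the constraint check.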
I'm almost done with the changes I've outlined in the "Optimization" section. Unfortunately, I need to finish those first, since the derivatives have a few dependencies on them. Once that is complete, it will just be a matter of defining analytic derivatives for the parameters and a kernel/kernel matrix derivative. I can provide some more direction as soon as that's done, if you'd like to help. It will be a couple more days, though.
Excellent! I would like to help with defining the analytical derivatives. It seems that some of them have already been done in the derivatives branch.
Should #2 be closed?
The optimization section is basically complete save for a few tests, so it's good enough to start on the derivatives. I've updated the original comment with some detail. I've also expanded the documentation here:
http://mlkernels.readthedocs.io/en/dev/interface.html
The Hyper Parameters section may be helpful.
If you'd like to add some derivative definitions and open a PR, feel free. You can probably grab a number of them from the derivatives branch (hopefully some reusable tests, too). If you're planning on working on this over the next couple days, I won't be working on anything but I'll try to answer any questions you have.