Kernel Derivatives
There are two components to this enhancement.
### Optimization
Define a `theta` and an `eta` (the inverse of `theta`) function to transform parameters from an open bounded interval to a closed bounded interval (or to eliminate the bounds entirely) for use in optimization methods. This is similar to how link functions work in logistic regression: unconstrained optimization is used to set a parameter value in the interval (0,1) via the logit link function. A minimal sketch of the transform pair follows the checklist below.
- [x] `theta` - given an interval and a value, applies a transformation that eliminates finite open bounds
- [x] `eta` - given an interval and a value, reverses the value back to the original parameter space
- [x] `gettheta` - returns the theta-transformed variable when applied to `HyperParameter`s and a vector of theta-transformed variables when used on a `Kernel`
- [x] `settheta!` - used to update `HyperParameter`s or `Kernel`s given a vector of theta-transformed variables
- [x] `checktheta` - used to check whether the provided vector (or scalar, if working with a `HyperParameter`) is a valid update
- [x] `upperboundtheta` - returns the theta-transformed upper bound. For example, if a parameter is restricted to (0,1], the transformed upper bound will be log(1) = 0
- [x] `lowerboundtheta` - returns the theta-transformed lower bound. For example, if a parameter is restricted to (0,1], the transformed lower bound will be -Infinity
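Here is a minimal sketch of what the `theta`/`eta` pair could look like for a log transform on an interval with a finite open lower bound. The signatures and the explicit bound argument `a` are illustrative assumptions, not the package's actual API:

```julia
# Hypothetical theta/eta pair for a parameter in (a, Inf);
# theta eliminates the finite open bound, eta reverses it.
theta(a::Real, x::Real) = log(x - a)   # (a, Inf) -> (-Inf, Inf)
eta(a::Real, t::Real)   = exp(t) + a   # (-Inf, Inf) -> (a, Inf)

x = 2.5
t = theta(0, x)     # log(2.5)
eta(0, t) ≈ x       # true: eta reverses theta

# Bounds transform the same way: for a parameter restricted to (0, 1],
# the transformed upper bound is theta(0, 1) = log(1) = 0, and the
# transformed lower bound is the limit log(0) = -Inf.
```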
### Derivatives
Derivatives will be taken with respect to theta, as described above.
- [ ] `gradeta` - derivative of the `eta` function. Using the chain rule, this is applied to `gradkappa` to get the derivative with respect to theta. Not exported.
- [ ] `gradkappa` - derivative of the scalar part of a `Kernel`. This must be defined for each kernel. It will be manual, so the derivative will be analytical or a hand-coded numerical derivative. It will only be defined for the parameters of the kernel. Not exported. Ex. `dkappa(k, Val{:alpha}, z)` (see the sketch after this list)
- [ ] `gradkernel` - derivative of `kernel`. The second argument will be the variable the derivative is taken with respect to; a value type with the field name as a parameter will be used. Ex. `dkernel(k, Val{:alpha}, x, y)`
- [ ] `gradkernelmatrix` - derivative of the kernel matrix.
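As a rough illustration of the `Val`-based dispatch described above (the sketch referenced from the `gradkappa` item), here is how an analytic parameter derivative and the chain rule through `eta` might fit together. The `GaussianKernel` struct, the form of kappa, and the log transform for alpha are assumptions for the example, not the package's definitions:

```julia
# Illustrative only: a minimal kernel with one parameter.
struct GaussianKernel
    alpha::Float64   # assumed restricted to (0, Inf)
end

# Scalar part of the kernel, applied to the squared distance z
kappa(k::GaussianKernel, z::Real) = exp(-k.alpha * z)

# Hand-coded analytic derivative with respect to alpha, selected by a
# value type carrying the field name (mirrors dkappa(k, Val{:alpha}, z))
dkappa(k::GaussianKernel, ::Type{Val{:alpha}}, z::Real) = -z * exp(-k.alpha * z)

# If alpha = eta(theta) = exp(theta) (a log-transformed parameter), then
# d(eta)/d(theta) = exp(theta) = alpha, so the chain rule gives the
# derivative of kappa with respect to theta:
gradeta(k::GaussianKernel, ::Type{Val{:alpha}}) = k.alpha
dkappa_theta(k::GaussianKernel, z::Real) =
    dkappa(k, Val{:alpha}, z) * gradeta(k, Val{:alpha})
```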
Sounds great! How can I help?
Can you also explain the relation between this enhancement and the derivatives branch?
Hello!
Very early on there was an attempt at adding derivatives - that's the derivatives branch. However, it added a great deal of complexity, and I didn't feel the base `Kernel` type and calculation method were carefully planned out before all that complexity was built on top. For example, there wasn't really any consideration of the parameter constraints and how they would impact the optimization routines (this can be an issue with open intervals, such as the one for the alpha parameter in a Gaussian kernel - not all kernels can use an unconstrained optimization method).
I've since reworked much of the package and explored how other libraries approach derivatives. Rather than having the `Kernel` type be a collection of floats, I've now made it a collection of `HyperParameter` instances. This new `HyperParameter` type contains a pointer to a value that can be altered, as well as an `Interval` type that can be used to transform the parameter to a domain more amenable to optimization and to enforce constraints/invariants.
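To make that concrete, here is a minimal sketch of the idea. The field names, the `Interval` layout, and the helper functions are assumptions for illustration, not the actual types in the package:

```julia
# Interval with possibly-open bounds, used to validate parameter values
struct Interval
    a::Float64       # lower bound
    b::Float64       # upper bound
    lopen::Bool      # true if the lower bound is open
    uopen::Bool      # true if the upper bound is open
end

# A mutable value paired with its interval, standing in for the
# "pointer to a value that can be altered" described above
mutable struct HyperParameter
    value::Float64
    bounds::Interval
end

# Validate a candidate value against the interval
function checkvalue(I::Interval, x::Real)
    (I.lopen ? x > I.a : x >= I.a) && (I.uopen ? x < I.b : x <= I.b)
end

# Update the parameter only if the invariant holds
function setvalue!(p::HyperParameter, x::Real)
    checkvalue(p.bounds, x) || throw(ArgumentError("value $x outside interval"))
    p.value = x
    return p
end
```

In the real package, the `Interval` would presumably also drive the theta/eta transforms described earlier; this sketch only covers the constraint check.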
I'm almost done with the changes I've outlined in the "Optimization" section. Unfortunately, I need to finish those first, since the derivatives have a few dependencies on them. Once that is complete, it will just be a matter of defining analytic derivatives for the parameters and a kernel/kernel matrix derivative. I can provide some more direction as soon as that's done, if you'd like to help. It will be a couple more days, though.
Excellent! I would like to help with defining the analytical derivatives. It seems that some of them have already been done in the derivatives branch.
Should #2 be closed?
The optimization section is basically complete save for a few tests, so it's good enough to start on the derivatives. I've updated the original comment with some detail. I've also expanded the documentation here:
http://mlkernels.readthedocs.io/en/dev/interface.html
The Hyper Parameters section may be helpful.
If you'd like to add some derivative definitions and open a PR, feel free. You can probably grab a number of them from the derivatives branch (hopefully some reusable tests, too). If you're planning on working on this over the next couple days, I won't be working on anything but I'll try to answer any questions you have.