make model selection on GP module more flexible
By default, inference methods in the GP module compute gradients wrt model parameters, which is not always necessary.
- offer users an option to ask inference methods not to compute these gradients.
- Make the model selection framework work with FITC inference (mainly inducing points) - (suspicious bug: some GP parameters are in the natural domain while some gradients wrt parameters are in the log domain; the domain of a parameter and its gradient should be consistent)
@karlnapf
will work on this issue once notebooks are done
- I would actually go for moving all GP parameters to the log-domain. We can still interface and store them in the standard domain, but having all derivatives in the log-domain would be better for many reasons
- inducing point model selection would be good to have I agree
- not computing all gradients definitely is a good thing to have.
- I would like to add more efficient (GPU, parallel) ways to compute gradients. A unified way to speed up is needed rather than all this ad-hoc code. What are your thoughts here?
@karlnapf
Currently, pthread is used to compute the gradients in parallel.
about moving all GP parameters to log-domain:
- not all parameters are positive (for GaussianARDKernel, where weights are FULL_MATRIX) (adding a gradient transformation layer is not easy since we have to decide whether a gradient is in the log domain)
- many existing parameters in Shogun are in the standard domain. (It will break a lot of code if we naively change existing positive parameters in Shogun from the standard domain to the log domain.)
My opinion is
- adding a specific gradient transformation layer for each parameter is possible.
- However, doing so requires changing internal Shogun classes (CParameterCombination, TParameter). (Maybe we need to add a new field about parameter transformation in TParameter.)
- Instead, I prefer this simple idea: a parameter and its gradient should be in the same domain.
My idea is (for example in GaussianKernel) to add a log flag
CGaussianKernel::CGaussianKernel(int32_t size, float64_t w)
becomes
CGaussianKernel::CGaussianKernel(int32_t size, float64_t w, bool is_log_domain=false)
If the flag is_log_domain is true, w is in the log domain and its gradient will first be computed in the standard domain (the existing code) and then transformed into the log domain (post-processing code to be implemented; a minimal sketch follows).
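To make the post-processing concrete, here is a minimal sketch of the transformation I have in mind (gradient_to_log_domain is a hypothetical helper name, not existing Shogun code). If w is stored as log(w), the chain rule gives dL/d(log w) = dL/dw * w, so the existing standard-domain gradient only needs to be scaled by the parameter value:

#include <cassert>

// Hypothetical helper: convert a gradient computed in the standard domain
// into the gradient wrt the log-domain parameter.
// If theta = log(w), then dL/dtheta = dL/dw * dw/dtheta = dL/dw * w.
double gradient_to_log_domain(double grad_standard, double w)
{
    assert(w > 0); // only positive parameters can be stored in the log domain
    return grad_standard * w;
}

// Usage in the is_log_domain case (post-processing after the existing code):
// double grad = /* gradient wrt width from the existing code */;
// if (is_log_domain)
//     grad = gradient_to_log_domain(grad, width);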
@karlnapf Let me know if you agree with my idea so that I can implement it.
@karlnapf
the existing model selection (CGradientModelSelection) does not do gradient transformation, which means there is a bug if a parameter and its gradient are in different domains.
@karlnapf Any feedback on the model selection?
I usually prefer general solutions that do things automatically to those that require extra arguments and thinking by the user. Users should not have to decide whether a parameter is in log-domain or not, but we should do the appropriate thing automatically (in a transparent way)
Not sure what to do though. How do other toolboxes do this? Any ideas?
GPML keeps parameters in the log domain if they are positive, and in the standard domain if they can be negative or zero. UPDATED: (unconstrained representation)
If we can break the existing code, I can move all GP-related classes from the standard domain to the log domain (however, users' code would have to be modified).
If not, we have to deal with the following issues:
SG_ADD(&m_width, "width", "Kernel width", MS_AVAILABLE, GRADIENT_AVAILABLE);
This implicitly means m_width and its gradient are in the same domain.
We may add a new field in TParameter for the domain (log, standard, or other) of a parameter.
With the new field in TParameter, we can do proper transformation.
If you think adding a new field in TParameter is feasible, I may need some help from the authors of the class.
Adding a new field in TParameter is not trivial since copy and serialization have to be modified.
CSGObject will be modified too.
for example
if a parameter and its gradient are in different domains, SG_ADD will become
SG_ADD(&m_width, "width", "Kernel width", MS_AVAILABLE, GRADIENT_AVAILABLE, "domain of parameter", "domain of gradient" );
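As a rough sketch of how those extra arguments could be represented internally (EParameterDomain and the struct below are hypothetical, not existing Shogun types; the real change would live in TParameter together with the copy/serialization updates mentioned above):

// Hypothetical domain tag stored alongside each registered parameter.
enum EParameterDomain
{
    DOMAIN_STD, // standard (natural) domain
    DOMAIN_LOG  // log domain, for strictly positive parameters
};

// Sketch of the extra members TParameter would carry; the copy and
// serialization code would have to handle them as well.
struct TParameterDomainInfo
{
    EParameterDomain m_param_domain;    // domain the parameter is stored in
    EParameterDomain m_gradient_domain; // domain its gradient is reported in
};

With such a tag, CGradientModelSelection could check whether the two domains match and apply the log-domain scaling sketched above only where they differ.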
@karlnapf at GradientModelSelection.cpp
219  nlopt_opt opt=nlopt_create(NLOPT_LD_MMA, total_variables);
222  SGVector<double> lower_bound(total_variables);
223  lower_bound.set_const(1e-6);
226  SGVector<double> upper_bound(total_variables);
227  upper_bound.set_const(HUGE_VAL);
If we add the domain field in TParameter, we need a unified representation to set the lower_bound and upper_bound used in nlopt.
Because not all parameters (for example, inducing points and the FULL_MATRIX weights of the ARD kernel) are positive, we need an unconstrained or unified (e.g., positive) representation to set the lower_bound and upper_bound used in nlopt.
More generally, we need three new fields in TParameter: the domain of a parameter, a lower bound, and an upper bound (assuming parameters are of floating point type). Doing so, we can set parameter-specific lower_bound and upper_bound values for nlopt.
I think the TParameter idea is a way to solve this problem.
Please let me know what you think. (Is this idea good for you? The one-field approach or the three-field approach?)
One-field approach (unconstrained representation) (for now this approach may be enough)
- positive parameters are in log domain
- other parameters are in standard domain
Three-field approach (parameter-specific representation) (more general, if a parameter has additional constraints; see the sketch after this list)
- add a domain field, the same as in the one-field approach
- add a lower_bound field
- add an upper_bound field
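A minimal sketch of the three-field approach and of how it could feed parameter-specific bounds to nlopt (TParameterBounds and the infos array are illustrative only; nlopt_set_lower_bounds/nlopt_set_upper_bounds are the existing NLopt calls, and EParameterDomain is the hypothetical enum sketched earlier):

// Hypothetical per-parameter metadata for the three-field approach.
struct TParameterBounds
{
    EParameterDomain m_domain; // as in the one-field approach
    float64_t m_lower_bound;   // e.g. -HUGE_VAL if unbounded below
    float64_t m_upper_bound;   // e.g. HUGE_VAL if unbounded above
};

// Sketch: replace the global 1e-6/HUGE_VAL bounds with per-parameter ones.
// 'infos' would be gathered from the registered TParameters of the machine.
SGVector<double> lower_bound(total_variables);
SGVector<double> upper_bound(total_variables);
for (index_t i=0; i<total_variables; i++)
{
    lower_bound[i]=infos[i].m_lower_bound;
    upper_bound[i]=infos[i].m_upper_bound;
}
nlopt_set_lower_bounds(opt, lower_bound.vector);
nlopt_set_upper_bounds(opt, upper_bound.vector);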
If you agree, I will try to add these field(s).
If I fail to add these field(s), I may need some help from you or the other authors of TParameter and SGObject. (The tricky part is the copy and serialization functions.)
- add these new field(s) in TParameter
- add some macros, such as a new SG_ADD
UPDATED
@karlnapf Any feedback on the new approach?
I guess we cannot avoid the fact that Shogun developers have to specify the range of parameters when registering them. Therefore, I would go for optional lower and upper bound parameters. We can do those for scalars, vectors, and matrices (the latter two element-wise). For other constraints, such as positive definiteness, I guess we should stay with the current system. This also would be in line with e.g. Stan's parameter representation. How would you do domains? With an enum DOMAIN_LOG, DOMAIN_STD, DOMAIN_TANH?
On the other hand, it might be overkill, as Shogun parameters are mostly unbounded or positive, so we really just need those two. Also, modifying all the TParameter code is annoying (though not impossible, I could help there)
No matter what we do, we should be explicit and consistent throughout the toolbox. Maybe @lambday also has ideas? @lisitsyn ?
BTW, what about just breaking all GP code and moving the GP parameters all to the log-domain? The getters/setters can still be in the normal domain though?
@karlnapf
Let's move the parameters to the log-domain first.
For kernels, I will only pay attention to GaussianKernel and GaussianARDKernel; a minimal sketch is below.
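As an illustration of what that could look like for GaussianKernel (the member name m_log_width and the standalone class are only a sketch; the real change would go into the existing CGaussianKernel): the width is stored in the log domain while the setter/getter stay in the normal domain.

#include <cmath>

// Illustrative only: store the width in the log domain, expose it in the
// normal domain through the setter/getter, as discussed above.
class LogDomainGaussianKernelSketch
{
public:
    void set_width(double width) { m_log_width=std::log(width); }
    double get_width() const { return std::exp(m_log_width); }

private:
    double m_log_width; // parameter registered (and differentiated) in the log domain
};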
okok, I think that should be a refactoring exercise
@karlnapf The following extensions may be useful in model selection.
- Allow users to disable the search over some model parameters (i.e., inducing points).
- implement stochastic search for model parameters (see Doubly Stochastic Variational Bayes for Non-Conjugate Inference, ICML 2014)
About the stochastic search for model parameters (a minimal sketch follows this list):
- main idea: model parameters are drawn from a multivariate Gaussian distribution (prior). The optimal model parameters are the mean of the prior. We want to estimate the distribution.
- advantage: more robust, modelling correlation among model parameters via the covariance matrix of the prior.
- disadvantage: slower than Type-2 ML (i.e., 40 seconds vs 30 minutes)
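To sketch the main idea only (this is a single reparameterised update of the mean, with a diagonal covariance and without the prior/entropy terms, so it is not the full algorithm from the paper; grad_fn stands for a user-supplied gradient of the log marginal likelihood wrt the model parameters):

#include <functional>
#include <random>
#include <vector>

// One stochastic update of the mean of the distribution over model parameters.
// grad_fn(theta) returns the gradient of log p(y|theta) at the sampled theta.
void stochastic_step(std::vector<double>& mu, const std::vector<double>& sigma,
    const std::function<std::vector<double>(const std::vector<double>&)>& grad_fn,
    double learning_rate, std::mt19937& rng)
{
    std::normal_distribution<double> normal(0.0, 1.0);
    std::vector<double> theta(mu.size());
    for (size_t i=0; i<mu.size(); i++)
        theta[i]=mu[i]+sigma[i]*normal(rng); // draw parameters from N(mu, diag(sigma^2))

    const std::vector<double> g=grad_fn(theta); // Monte Carlo estimate of the gradient wrt mu
    for (size_t i=0; i<mu.size(); i++)
        mu[i]+=learning_rate*g[i]; // move the mean towards higher marginal likelihood
}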
What are those comments based on? Just the paper? 40s vs 30m does not really seem too reasonable.
I guess this is MAP inference right?
No. The speed of the method is between MCMC and MAP. It seems like a combination of MCMC and MAP.
I see, MCMC is slow for sure
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.