make model selection on GP module more flexible
By default, inference methods in the GP module compute gradients wrt model parameters, which is not always necessary.
- offer users an option to ask inference methods not to compute these gradients.
- Make the model selection framework work with FITC inference (mainly inducing points) - (suspicious bug: some GP parameters are in the natural domain while some gradients wrt parameters are in the log domain; the domain of a parameter and its gradient should be consistent)
@karlnapf
will work on this issue once notebooks are done
- I would actually go for moving all GP parameters to the log-domain. We can still interface and store them in the standard domain, but having all derivatives in the log-domain would be better for many reasons
- inducing point model selection would be good to have I agree
- not computing all gradients definitely is a good thing to have.
- I would like to add more efficient (GPU, parallel) ways to compute gradients. A unified way to speed up is needed rather than all this ad-hoc code. What are your thoughts here?
@karlnapf
Currently, pthread is used to compute the gradients in parallel.
about moving all GP parameters to log-domain:
- not all parameters are positive (for GaussianARDKernel, where weights are FULL_MATRIX) (adding a gradient transformation layer is not easy since we have to decide whether a gradient is in the log domain)
- many existing parameters in Shogun are in the standard domain. (It will break a lot of code if we naively change existing positive parameters in Shogun from the standard domain to the log domain.)
My opinion is
- adding a specific gradient transformation layer for each parameter is possible.
- However, doing so requires changing internal Shogun classes (CParameterCombination, TParameter). (Maybe we need to add a new field about parameter transformation in TParameter.)
- Instead, I prefer this simple idea: a parameter and its gradient should be in the same domain.
My idea is (for example in GaussianKernel) to add a log flag
CGaussianKernel::CGaussianKernel(int32_t size, float64_t w)
becomes
CGaussianKernel::CGaussianKernel(int32_t size, float64_t w, bool is_log_domain=false)
If the flag is_log_domain is true, w is in the log domain and its gradient will first be computed in the standard domain (the existing code) and then transformed into the log domain (post-processing code to be implemented; a minimal sketch follows).
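To make the post-processing concrete, here is a minimal sketch of the transformation I have in mind (gradient_to_log_domain is a hypothetical helper name, not existing Shogun code). If w is stored as log(w), the chain rule gives dL/d(log w) = dL/dw * w, so the existing standard-domain gradient only needs to be scaled by the parameter value:

#include <cassert>

// Hypothetical helper: convert a gradient computed in the standard domain
// into the gradient wrt the log-domain parameter.
// If theta = log(w), then dL/dtheta = dL/dw * dw/dtheta = dL/dw * w.
double gradient_to_log_domain(double grad_standard, double w)
{
    assert(w > 0); // only positive parameters can be stored in the log domain
    return grad_standard * w;
}

// Usage in the is_log_domain case (post-processing after the existing code):
// double grad = /* gradient wrt width from the existing code */;
// if (is_log_domain)
//     grad = gradient_to_log_domain(grad, width);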
@karlnapf Let me know if you agree with my idea so that I can implement it.
@karlnapf
the existing model selection (CGradientModelSelection) does not do gradient transformation, which means there is a bug if a parameter and its gradient are in different domains.
@karlnapf Any feedback on the model selection?
I usually prefer general solutions that do things automatically to those that require extra arguments and thinking by the user. Users should not have to decide whether a parameter is in log-domain or not, but we should do the appropriate thing automatically (in a transparent way)
Not sure what to do though. How do other toolboxes do this? Any ideas?
GPML keeps parameters in the log domain if they are positive, and in the standard domain if they can be negative or zero. UPDATED: (unconstrained representation)
If we can break the existing code, I can move all GP-related classes from the standard domain to the log domain (however, users' code would have to be modified).
If not, we have to deal with the following issues:
SG_ADD(&m_width, "width", "Kernel width", MS_AVAILABLE, GRADIENT_AVAILABLE);
This implicitly means m_width and its gradient are in the same domain.
We may add a new field in TParameter for the domain (log, standard, or other) of a parameter.
With the new field in TParameter, we can do proper transformation.
If you think adding a new field in TParameter is feasible, I may need some help from the authors of the class.
Adding a new field in TParameter is not trivial since copy and serialization have to be modified.
CSGObject will be modified too.
for example
if a parameter and its gradient are in different domains, SG_ADD will become
SG_ADD(&m_width, "width", "Kernel width", MS_AVAILABLE, GRADIENT_AVAILABLE, "domain of parameter", "domain of gradient" );
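As a rough sketch of how those extra arguments could be represented internally (EParameterDomain and the struct below are hypothetical, not existing Shogun types; the real change would live in TParameter together with the copy/serialization updates mentioned above):

// Hypothetical domain tag stored alongside each registered parameter.
enum EParameterDomain
{
    DOMAIN_STD, // standard (natural) domain
    DOMAIN_LOG  // log domain, for strictly positive parameters
};

// Sketch of the extra members TParameter would carry; the copy and
// serialization code would have to handle them as well.
struct TParameterDomainInfo
{
    EParameterDomain m_param_domain;    // domain the parameter is stored in
    EParameterDomain m_gradient_domain; // domain its gradient is reported in
};

With such a tag, CGradientModelSelection could check whether the two domains match and apply the log-domain scaling sketched above only where they differ.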
@karlnapf at GradientModelSelection.cpp
219  nlopt_opt opt=nlopt_create(NLOPT_LD_MMA, total_variables);
222  SGVector<double> lower_bound(total_variables);
223  lower_bound.set_const(1e-6);
226  SGVector<double> upper_bound(total_variables);
227  upper_bound.set_const(HUGE_VAL);
If we add the domain field in TParameter, we need a unified representation to set the lower_bound and upper_bound used in nlopt.
Because not all parameters (for example, inducing points and the FULL_MATRIX weights of the ARD kernel) are positive, we need an unconstrained or unified (e.g., positive) representation to set the lower_bound and upper_bound used in nlopt.
More generally, we need three new fields in TParameter: the domain of a parameter, a lower bound, and an upper bound (assuming parameters are of floating point type). Doing so, we can set parameter-specific lower_bound and upper_bound values for nlopt.
I think the TParameter idea is a way to solve this problem.
Please let me know what you think. (Is this idea good for you? The one-field approach or the three-field approach?)
One-field approach (unconstrained representation) (for now this approach may be enough)
- positive parameters are in log domain
- other parameters are in standard domain
Three-field approach (parameter-specific representation) (more general, if a parameter has additional constraints; see the sketch after this list)
- add a domain field, the same as in the one-field approach
- add a lower_bound field
- add an upper_bound field
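A minimal sketch of the three-field approach and of how it could feed parameter-specific bounds to nlopt (TParameterBounds and the infos array are illustrative only; nlopt_set_lower_bounds/nlopt_set_upper_bounds are the existing NLopt calls, and EParameterDomain is the hypothetical enum sketched earlier):

// Hypothetical per-parameter metadata for the three-field approach.
struct TParameterBounds
{
    EParameterDomain m_domain; // as in the one-field approach
    float64_t m_lower_bound;   // e.g. -HUGE_VAL if unbounded below
    float64_t m_upper_bound;   // e.g. HUGE_VAL if unbounded above
};

// Sketch: replace the global 1e-6/HUGE_VAL bounds with per-parameter ones.
// 'infos' would be gathered from the registered TParameters of the machine.
SGVector<double> lower_bound(total_variables);
SGVector<double> upper_bound(total_variables);
for (index_t i=0; i<total_variables; i++)
{
    lower_bound[i]=infos[i].m_lower_bound;
    upper_bound[i]=infos[i].m_upper_bound;
}
nlopt_set_lower_bounds(opt, lower_bound.vector);
nlopt_set_upper_bounds(opt, upper_bound.vector);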
If you agree, I will try to add these field(s).
If I fail to add these field(s), I may need some help from you or the other authors of TParameter and SGObject. (The tricky part is the copy and serialization functions.)
- add these new field(s) in TParameter
- add some macros, such as a new SG_ADD
UPDATED
@karlnapf Any feedback on the new approach?
I guess we cannot avoid the fact that Shogun developers have to specify the range of parameters when registering them. Therefore, I would go for optional lower and upper bound parameters. We can do those for scalars, vectors, and matrices (the latter two element-wise). For other constraints, such as positive definiteness, I guess we should stay with the current system. This also would be in line with e.g. Stan's parameter representation. How would you do domains? With an enum DOMAIN_LOG, DOMAIN_STD, DOMAIN_TANH?
On the other hand, it might be overkill, as Shogun parameters are mostly unbounded or positive, so we really just need those two. Also, modifying all the TParameter code is annoying (though not impossible, I could help there)
No matter what we do, we should be explicit and consistent throughout the toolbox. Maybe @lambday also has ideas? @lisitsyn ?
BTW, what about just breaking all GP code and moving the GP parameters all to the log-domain? The getters/setters can still be in the normal domain though?
@karlnapf
Let's move the parameters to the log-domain first.
For kernels, I will only pay attention to GaussianKernel and GaussianARDKernel; a minimal sketch is below.
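As an illustration of what that could look like for GaussianKernel (the member name m_log_width and the standalone class are only a sketch; the real change would go into the existing CGaussianKernel): the width is stored in the log domain while the setter/getter stay in the normal domain.

#include <cmath>

// Illustrative only: store the width in the log domain, expose it in the
// normal domain through the setter/getter, as discussed above.
class LogDomainGaussianKernelSketch
{
public:
    void set_width(double width) { m_log_width=std::log(width); }
    double get_width() const { return std::exp(m_log_width); }

private:
    double m_log_width; // parameter registered (and differentiated) in the log domain
};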
okok, I think that should be a refactoring exercise
@karlnapf The following extensions may be useful in model selection.
- Allow users to disable the search over some model parameters (i.e., inducing points).
- implement stochastic search for model parameters (see Doubly Stochastic Variational Bayes for Non-Conjugate Inference, ICML 2014)
About the stochastic search for model parameters (a minimal sketch follows this list):
- main idea: model parameters are drawn from a multivariate Gaussian distribution (prior). The optimal model parameters are the mean of the prior. We want to estimate the distribution.
- advantage: more robust, modelling correlation among model parameters via the covariance matrix of the prior.
- disadvantage: slower than Type-2 ML (i.e., 40 seconds vs 30 minutes)
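To sketch the main idea only (this is a single reparameterised update of the mean, with a diagonal covariance and without the prior/entropy terms, so it is not the full algorithm from the paper; grad_fn stands for a user-supplied gradient of the log marginal likelihood wrt the model parameters):

#include <functional>
#include <random>
#include <vector>

// One stochastic update of the mean of the distribution over model parameters.
// grad_fn(theta) returns the gradient of log p(y|theta) at the sampled theta.
void stochastic_step(std::vector<double>& mu, const std::vector<double>& sigma,
    const std::function<std::vector<double>(const std::vector<double>&)>& grad_fn,
    double learning_rate, std::mt19937& rng)
{
    std::normal_distribution<double> normal(0.0, 1.0);
    std::vector<double> theta(mu.size());
    for (size_t i=0; i<mu.size(); i++)
        theta[i]=mu[i]+sigma[i]*normal(rng); // draw parameters from N(mu, diag(sigma^2))

    const std::vector<double> g=grad_fn(theta); // Monte Carlo estimate of the gradient wrt mu
    for (size_t i=0; i<mu.size(); i++)
        mu[i]+=learning_rate*g[i]; // move the mean towards higher marginal likelihood
}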
What are those comments based on? Just the paper? 40s vs 30m does not really seem too reasonable.
I guess this is MAP inference right?
No. The speed of the method is between MCMC and MAP. It seems like a combination of MCMC and MAP.
I see, MCMC is slow for sure
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.