uniform vs. logUniform prior

Open JanHasenauer opened this issue 5 years ago • 30 comments

So far we only allow for uniform; however, we should in my opinion distinguish between uniform and logUniform. In most cases we actually use log-uniform, as we consider a uniform distribution on log scale.

JanHasenauer avatar Mar 22 '20 00:03 JanHasenauer

Shouldn’t this already be addressed by parameter scale?

FFroehlich avatar Mar 22 '20 00:03 FFroehlich

Agreed, parameterScaleUniform should do that, as described in the docs.

yannikschaelte avatar Mar 22 '20 00:03 yannikschaelte

I would prefer to separate this, as for normal and log-normal. The prior should always be defined in the scale of the original parameter. parameterScale should only say what's used for optimization, sampling, etc.

JanHasenauer avatar Mar 22 '20 00:03 JanHasenauer

I did not see parameterScaleUniform. Let me check.

JanHasenauer avatar Mar 22 '20 00:03 JanHasenauer

If we always stick to defining priors on the original scale, we can in my understanding get rid of parameterScaleUniform, parameterScaleNormal and parameterScaleLaplace.

Furthermore, it appears as if it would make things easier to understand, at least for me.

JanHasenauer avatar Mar 22 '20 00:03 JanHasenauer

What would currently happen if I set prior logNormal and parameterScaleNormal?

JanHasenauer avatar Mar 22 '20 00:03 JanHasenauer

Here is the original discussion: https://github.com/PEtab-dev/PEtab/issues/17. Here is what is explicitly used: https://github.com/PEtab-dev/PEtab/blob/master/petab/sampling.py#L53

So the agreement was to always specify on linear scale, except that the "parameterScale" ones allow defining directly on the parameter scale, e.g. log-scale.

The difference is that "normal" gives a normal density on the linear parameters, and the resulting values are then scaled; "logNormal" gives an exponentiated normal distribution on the linear parameters. "parameterScaleNormal" always returns a normal distribution and does not scale the samples afterwards (in contrast to the other two).
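
A minimal sketch of that difference for a single parameter with parameterScale log10 (illustrative only, not the actual petab.sampling code; mu and sigma are hypothetical prior parameters given on linear scale):

import numpy as np

mu, sigma = 1.0, 0.5  # hypothetical prior parameters, defined on linear scale

# "normal": normal density on the linear parameter, the sample is scaled afterwards
# (a negative draw would of course be invalid for a log10-scaled parameter)
sample_normal = np.log10(np.random.normal(mu, sigma))

# "logNormal": exponentiated normal on the linear parameter, then scaled
sample_log_normal = np.log10(np.random.lognormal(mu, sigma))

# "parameterScaleNormal": normal directly on the log10-scaled parameter,
# the sample is not scaled afterwards
sample_parameter_scale_normal = np.random.normal(mu, sigma)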

yannikschaelte avatar Mar 22 '20 09:03 yannikschaelte

So ... from my perspective they would mean slightly different things, in particular what we consider the base space for sampling. Anyone any opinions here? @paulstapor @jvanhoefer you implemented this in the first place?

yannikschaelte avatar Mar 22 '20 09:03 yannikschaelte

To be honest, I can't follow the argument and the concept of how the two pieces of scale information should be used. For me it would for instance still be unclear what happens if I set prior logNormal and parameterScaleNormal.

I would be in favour of keeping it simple and having the prior always defined for the untransformed parameter. In this case we would simply have a list of priors and corresponding parameters, but nothing else. This would in my opinion also be consistent with the current handling of bounds and nominal values, which always have to be defined for the untransformed parameter.

If we handle it like this, the parameter scale is essentially a setting for the inference algorithm but not for the problem specification. This separation is in my opinion useful.

JanHasenauer avatar Mar 22 '20 10:03 JanHasenauer

The idea was that "parameterScale_Uniform/Normal/Laplace" should consider the parameter scale which was used. This was not really crucial for parameterScaleNormal or parameterScaleLaplace. But I considered it important for parameterScaleUniform.

Here's the background: We used to sample starting points uniformly in log10, our parameter scale. And in the wide majority of cases (I think), we will want to sample uniformly in the scale of the parameters (as e.g. PESTO did). However, due to our convention of now denoting everything in linear scale, we needed an option in PEtab which encoded this type of initial point sampling. This is what "parameterScaleUniform" does. I considered "parameterScaleUniform" to be the best naming for this... But, if there are better ideas, we can change that.
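
Roughly, for a parameter with parameterScale log10 and bounds given on linear scale, the two options would amount to something like this (a sketch, not the actual implementation; lb, ub and n_starts are placeholders, not PEtab fields):

import numpy as np

lb, ub, n_starts = 1e-3, 1e3, 10  # placeholder bounds (linear scale) and number of starts

# "uniform": starting points uniform on the linear scale, then transformed
starts_lin = np.log10(np.random.uniform(lb, ub, size=n_starts))

# "parameterScaleUniform": starting points uniform directly on the log10 scale
# (the PESTO-style default described above)
starts_log10 = np.random.uniform(np.log10(lb), np.log10(ub), size=n_starts)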

paulstapor avatar Mar 22 '20 12:03 paulstapor

So ... from my perspective they would mean slightly different things, in particular what we consider the base space for sampling. Anyone any opinions here? @paulstapor @jvanhoefer you implemented this in the first place?

Yes, I implemented that after a Skype with @jvanhoefer. In some cases, it might be interesting to quickly change the parameterScale and see how this influences optimization. And parameterScaleXYZ is the exact equivalent to sampling by XYZ in Pesto back then... As said: We may also use linearXYZ, logXYZ, log10XYZ, and logicleXYZ ...

paulstapor avatar Mar 22 '20 12:03 paulstapor

I am not so familiar with the technical aspects in Pesto, so I do not see the technical implications...

I am fine with the current solution, but I would also be fine with what @JanHasenauer proposed.

jvanhoefer avatar Mar 22 '20 13:03 jvanhoefer

I see that it is important to have the possibility to specify the sampling of starting points for optimisation independently of the prior. However, this should then not be in the section on priors.

I thought that we agreed in the past to also allow the specification of a sampling distribution. This would keep the two things separate. By default the prior distribution should also be used for the sampling.

In the special case of uniform and log-uniform: I would simply allow not specifying a prior. In this case it would implicitly be (an unscaled) uniform or log-uniform distribution depending on the parameter transformation.

JanHasenauer avatar Mar 22 '20 13:03 JanHasenauer

I see that it is important to have the possibility to specify the sampling of starting points for optimisation independently of the prior. However, this should then not be in the section on priors.

I thought that we agreed in the past to also allow the specification of a sampling distribution. This would keep the two things separate. By default the prior distribution should also be used for the sampling.

My overall aim was actually only that the default in PEtab/pyPESTO does the same as the default in PESTO did. This motivates the whole thing. But I agree that it might be clearer to delete "parameterScaleXYZ" and rather leave it to the user to make the scale of the prior and the scale of the parameter consistent...

In the special case of uniform and log-uniform: I would simply allow not specifying a prior. In this case it would implicitly be (an unscaled) uniform or log-uniform distribution depending on the parameter transformation.

That is the default anyway. If you don't specify anything, parameterScaleUniform is used, which samples the parameters uniformly in their scales and adds nothing to the gradient...

paulstapor avatar Mar 22 '20 13:03 paulstapor

I would prefer to keep uniform and logUniform. It might happen -- in rare cases -- that somebody wants to use a uniform prior but fit on log-scale. Furthermore, priors are normalised, meaning that at least formally in the uniform and log-uniform case we have to divide by the integral. In the case of model selection this will matter. For this reason I would keep both and allow for not specifying a prior (which is not the same as a uniform or logUniform prior).

JanHasenauer avatar Mar 22 '20 15:03 JanHasenauer

Overall, we have to decide whether or not we should completely separate problem specification (to which the prior belongs) and fitting specification (to which parameter transformations would belong). For the moment I would keep it in the same file.

JanHasenauer avatar Mar 22 '20 15:03 JanHasenauer

I lost track here. What's the conclusion?

dweindl avatar Mar 03 '21 10:03 dweindl

Is log-uniform supported by now? If not, this is still open.

In a recent discussion on Mattermost, we concluded again, as far as I know, that the prior should be defined in the original scale and that the parameter scale is only something used by the algorithms.

JanHasenauer avatar Mar 03 '21 11:03 JanHasenauer

The question that remained open in the mm discussion was whether one has to perform a coordinate transformation (pdf2) or not (pdf1) when the prior is defined in lin-space, but optimization or sampling takes place in log-space. Applying it may shift the optimal point; however, not applying it does not correctly reflect the density the log-scaled parameters are sampled from (in fact, pdf1 is not a density at all afaik).

import numpy as np
import scipy as sp
import scipy.stats  # makes sp.stats available

a, b = 1e-3, 1e2  # prior bounds on linear scale (example values)

def rvs():
    """Sample from the uniform prior, returned on log-scale."""
    return np.log(np.random.uniform(a, b))

def pdf1(x):
    """Prior evaluated at log-scaled value `x`, without coordinate transformation."""
    # scipy's uniform takes loc and scale, i.e. support [loc, loc + scale]
    return sp.stats.uniform(a, b - a).pdf(np.exp(x))

def pdf2(x):
    """Density at log-scaled value `x`, with coordinate transformation."""
    # change-of-variables factor from x = log(y): dy/dx = exp(x)
    return np.exp(x) * sp.stats.uniform(a, b - a).pdf(np.exp(x))

Afaik, e.g. pypesto currently uses pdf1, which leaves optimal points invariant (sampling not really much done yet), whereas pyabc refuses to do so as it is not a density, and performs sampling on linear scale if in doubt, which is in any case correct, but not ideal if sampling on log-scale would be preferable.
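
A quick numeric check of the "not a density" point, using a, b, pdf1 and pdf2 as defined above (just a sketch, not part of any package): pdf2 integrates to one over log-space, pdf1 in general does not.

from scipy.integrate import quad

mass1, _ = quad(pdf1, np.log(a), np.log(b))  # = log(b / a) / (b - a), in general != 1
mass2, _ = quad(pdf2, np.log(a), np.log(b))  # ~ 1.0
print(mass1, mass2)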

yannikschaelte avatar Mar 03 '21 17:03 yannikschaelte

Is log-uniform supported by now? If not, this is still open.

As far as I can tell log-uniform is not supported yet.

The question that remained open in the mm discussion was whether one has to perform a coordinate transformation (pdf2) or not (pdf1) when the prior is defined in lin-space, but optimization or sampling takes place in log-space. Applying it may shift the optimal point; however, not applying it does not correctly reflect the density the log-scaled parameters are sampled from (in fact, pdf1 is not a density at all afaik).

Afaik, e.g. pypesto currently uses pdf1, which leaves optimal points invariant (sampling not really much done yet), whereas pyabc refuses to do so as it is not a density, and performs sampling on linear scale if in doubt, which is in any case correct, but not ideal if sampling on log-scale would be preferable.

I think this is a conceptual issue of sampling. While it makes sense to define a scale in which you perform optimization, because optimization is scale invariant, it doesn't really make too much sense to define a scale on which to "perform" sampling, since results are not invariant with respect to parameter scale. I would argue that the correct behavior is to refuse to sample problems where prior scale and parameter scale do not agree. For sampling, it also doesn't make any sense to compare lin/log runs because results are not comparable anyways, so I don't see any immediate need to enable it, and the parameterScale priors are probably the best bet in that case anyways.

FFroehlich avatar Mar 03 '21 20:03 FFroehlich

@FFroehlich completely agreeing here. In fact, pyABC also warns when "prior scale" != "parameter scale", so that one is aware of this fact.

Yes, log-uniform (in the sense of a log-uniform prior, i.e. e^U, analogously to log-normal, log-laplace) is not supported yet.

yannikschaelte avatar Mar 03 '21 22:03 yannikschaelte

is the "prior scale" at al necessary if wee just have different distributions?

And regarding the sampling: For me, specifying "log-scale" for the sampler would mean that an MCMC method uses a "log-normal proposal" instead of a "normal proposal". I know that it is not implemented like this, but if it were, I think the comparison of the sampling results would be meaningful. ... or do I miss something?

If I do not miss something, we should simply adapt the samplers in pyPesto.

JanHasenauer avatar Mar 03 '21 22:03 JanHasenauer

is the "prior scale" at al necessary if wee just have different distributions?

I would argue that, by definition, all non-parameterScale priors have linear scale. We do not define a prior scale per se.

And regarding the sampling: For me, specifying "log-scale" for the sampler would mean that an MCMC method uses a "log-normal proposal" instead of a "normal proposal".

That's what you get when you use parameterScaleNormal. The primary concern is other distributions in combination with log-transformed parameters.

I know that it is not implemented like this, but if it were, I think the comparison of the sampling results would be meaningful. ... or do I miss something?

If I do not miss something, we should simply adapt the samplers in pyPesto.

FFroehlich avatar Mar 03 '21 23:03 FFroehlich

Sorry, the answer is not really clear to me.

And regarding the sampling: For me, specifying "log-scale" for the sampler would mean that an MCMC method uses a "log-normal proposal" instead of a "normal proposal".

That's what you get when you use parameterScaleNormal. The primary concern is other distributions in combination with log-transformed parameters.

I'm not sure, but I have the feeling that we are talking about different things. I was talking about the proposal distribution in an MCMC algorithm, hence the generation of candidates. The reply seems to be rather concerned with priors. Proposal distributions in MCMC are mostly normal distributions.

is the "prior scale" at al necessary if wee just have different distributions?

I would argue that, by definition, all non-parameterScale priors have linear scale. We do not define a prior scale per se.

I would even go one step further and say that priors are always in linear scale. This would be much clearer and avoid confusion. Following this line of thought, we should eliminate parameterScale.

In that regard, it might also be good to rename parameterScale. Maybe using terms such as optimizationScale or samplingScale would make it more transparent what's happening. Or, more generally, 'inferenceScale'.

JanHasenauer avatar Mar 03 '21 23:03 JanHasenauer

And regarding the sampling: For me, specifying "log-scale" for the sampler would mean that an MCMC method uses a "log-normal proposal" instead of a "normal proposal".

That's what you get when you use parameterScaleNormal. The primary concern is other distributions in combination with log-transformed parameters.

Normal proposal + log-scale would mean exp(var) is normal, with var the log-scaled variables, as opposed to a log-normal distribution, where log(var) is normal, so it is kind of the opposite of a log-normal distribution ... On the other hand, parameterScaleNormal means normal in log-space, thus the linear parameters are log-normal (as Fabi said).

I'm not sure, but I have the feeling that we are talking about different things. I was talking about the proposal distribution in an MCMC algorithm, hence the generation of candidates. The reply seems to be rather concerned with priors. Proposal distributions in MCMC are mostly normal distributions.

Not sure if I can follow here ... Is the question whether the sampler "sees" all parameters in log- or lin-space? If it sees all parameters in log-space, then yes, a normal proposal distribution there would implicitly mean a log-normal distribution on the original parameters. However, the point for me was that if sampling is in log-space, then e.g. for the Metropolis update, prior(x) / proposal(x) must be evaluated, and for this to be proper, prior(x) should be an actual density in log-space.
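
For reference, in a generic Metropolis-Hastings update (standard textbook form, not pyPESTO-specific notation) the prior pi, the likelihood L and the proposal q all enter the acceptance probability, and each factor is evaluated in whatever space the sampler operates in:

$$\alpha = \min\left(1,\ \frac{\pi(x')\, L(x')\, q(x \mid x')}{\pi(x)\, L(x)\, q(x' \mid x)}\right)$$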

yannikschaelte avatar Mar 04 '21 07:03 yannikschaelte

I think I understand the issue a bit better now, sorry I missed the proposal distribution part earlier on. I think changing the scale of the proposal density is the right interpretation of what the parameter scale should influence for sampling algorithms, whatever that may look like.

Overall I think writing up a concise document of scenarios we want to cover and carefully thinking about how the current specification fits into that would be helpful to refine the current specification and guide future users.

FFroehlich avatar Mar 04 '21 23:03 FFroehlich

One point with changing the scale of the proposal density: If we have e.g. a lognormal prior 1 / sqrt(2*pi*sigma**2*x**2) * exp(- (log x - mu)**2 / (2*sigma**2)) with mode x = exp(mu - sigma**2), then in log-space this would be transformed into a normal prior 1 / sqrt(2*pi*sigma**2) * exp(-(z - mu)**2 / (2*sigma**2)) with mode z = mu. I.e. a variable transform would also affect the MAP estimate, hence optimization, right?
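
A quick numeric check of that mode shift (a sketch with example values mu = 0, sigma = 1, not tied to any PEtab problem):

import numpy as np
from scipy import stats
from scipy.optimize import minimize_scalar

mu, sigma = 0.0, 1.0  # example values

# mode of the log-normal prior over the linear parameter x
mode_lin = minimize_scalar(lambda x: -stats.lognorm(s=sigma, scale=np.exp(mu)).pdf(x),
                           bounds=(1e-6, 10), method="bounded").x

# mode of the transformed (normal) prior over z = log(x)
mode_log = minimize_scalar(lambda z: -stats.norm(mu, sigma).pdf(z),
                           bounds=(-5, 5), method="bounded").x

print(mode_lin, np.exp(mu - sigma**2))  # both ~ 0.37
print(mode_log, mu)                     # both ~ 0, i.e. x = exp(0) = 1 on linear scale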

yannikschaelte avatar Mar 05 '21 06:03 yannikschaelte

Overall I think writing up a concise document of scenarios we want to cover and carefully thinking about how the current specification fits into that would be helpful to refine the current specification and guide future users.

:+1: :+1:

dweindl avatar Mar 05 '21 08:03 dweindl

Basically it amounts to this discussion: http://theoryandpractice.org/stats-ds-book/distributions/invariance-of-likelihood-to-reparameterizaton.html

So the (frequentist) likelihood is invariant under parameter transformation, but the (Bayesian) posterior is not. I would read this as saying that in a Bayesian setting, the parameter scale is either 1) just used in optimization or frequentist stuff, with sampling performed in lin-space and all (prior) densities evaluated in lin-space, or 2) considered part of the problem, in which case sampling and optimization happen in (e.g.) log-space and the densities are always transformed according to the Jacobian.

yannikschaelte avatar Mar 05 '21 22:03 yannikschaelte

How to proceed here?

yannikschaelte avatar Mar 14 '21 14:03 yannikschaelte