stan icon indicating copy to clipboard operation
stan copied to clipboard

more general gamma function

Open steven-murray opened this issue 10 years ago • 19 comments

I am wanting to use the incomplete gamma function, with a negative shape parameter. This is currently not supported by the relevant boost library function.

To get around this, I have been trying to artificially increase the slope of my data (a lower-truncated generalised gamma distribution) by weighting each data-point by its value (effectively increases the slope by 1). This means that the Gamma call has a positive shape parameter and is therefore allowed, but introduces other problems (cf. the discussion on the mailing list about weighted log-probabilities).

It would be nice to have a more general incomplete gamma function (such a function is implemented for example in GSL, and in mpmath for python).

steven-murray avatar Jul 10 '15 03:07 steven-murray

@steven-murray, can you provide a more concrete description of the incomplete gamma function you're looking for? And, is this for use in the Stan language? Or just in C++.

syclik avatar Jul 10 '15 04:07 syclik

The generalised gamma distribution is defined here. As you can see, all three of its parameters are restricted to be positive. In particular, if $d$ here is negative, the distribution cannot converge towards low $x$.

Usually, the normalising factor, $\Gamma(d/p)$, is implemented only to take positive values of $d/p$ as well (for example the relevant function included in Stan, I'm guessing through the boost library, does this). However, the function can still be defined for negative values.

In my case, my distribution has a negative value of $d$, but this is okay since it is truncated at a lower boundary, which ensures that it can form a statistical pdf. In this case, the normalising factor becomes an incomplete gamma: $\Gamma(d/p,(x_{\rm min}/a)^p)$. This is perfectly valid to contain a positive $d$ parameter, but several libraries do not allow it (for pretty good practical reasons). Without this capability in the Gamma function, I cannot implement my model simply. It would be good to add at least this capability, if not the actual (truncated) generalised gamma distribution, to Stan.

I would like to use it in Stan -- I'm not sure exactly what the relationship between the Stan and C++ code is.

steven-murray avatar Jul 10 '15 04:07 steven-murray

Sorry about the math... apparently there is no support for latex in GFM

steven-murray avatar Jul 10 '15 04:07 steven-murray

@steven-murray We apparently have an incomplete beta hiding in our math lib, and the gradient for the incomplete gamma, so maybe it wouldn't be hard to do:

lib/stan_math_2.7.0/stan/math/prim/scal/fun/grad_reg_inc_gamma.hpp

The relation of C++ to Stan is that our functions are all implemented in C++ under the hood. And we have more people just using our C++ library.

@syclik We now have the issue of managing a feature request like this across multiple repos. It needs to go into the function signatures in repo stan-dev/stan and into the math library in repo stan-dev/math --- any suggestions?

@bgoodri The generalized gamma appears to be related to the kinds of things you were talking about in generalizing parametric families. Is this the kind of thing you'd be interested in having?

bob-carpenter avatar Jul 10 '15 17:07 bob-carpenter

Yes, although we might as well implement one of the generalizations of the generalized gamma distribution.

On Fri, Jul 10, 2015 at 1:20 PM, Bob Carpenter [email protected] wrote:

@steven-murray https://github.com/steven-murray We apparently have an incomplete beta hiding in our math lib, and the gradient for the incomplete gamma, so maybe it wouldn't be hard to do:

lib/stan_math_2.7.0/stan/math/prim/scal/fun/grad_reg_inc_gamma.hpp

The relation of C++ to Stan is that our functions are all implemented in C++ under the hood. And we have more people just using our C++ library.

@syclik https://github.com/syclik We now have the issue of managing a feature request like this across multiple repos. It needs to go into the function signatures in repo stan-dev/stan and into the math library in repo stan-dev/math --- any suggestions?

@bgoodri https://github.com/bgoodri The generalized gamma appears to be related to the kinds of things you were talking about in generalizing parametric families. Is this the kind of thing you'd be interested in having?

— Reply to this email directly or view it on GitHub https://github.com/stan-dev/stan/issues/1536#issuecomment-120464365.

bgoodri avatar Jul 10 '15 17:07 bgoodri

@bob-carpenter The trick will be if the included incomplete gamma gradient has support for negative d.

steven-murray avatar Jul 13 '15 03:07 steven-murray

There is a workaround to this in some cases -- for small negative d, a recurrence relation can be used. See my own answer to the question at http://stats.stackexchange.com/questions/161134/weighted-log-probabilities-in-generalised-gamma-distribution/162390#162390

Might help?

steven-murray avatar Jul 21 '15 03:07 steven-murray

Perhaps. Do utilize the expose_stan_functions function in rstan 2.7.x (now on CRAN) to verify that your implementation agrees with GSL's via the RcppGSL or gsl packages.

bgoodri avatar Jul 21 '15 03:07 bgoodri

I've done a bit of work using my "fast" function, which only calculates the gamma function, gammainc(a,x), for a>-2. For reference, the function block is:

functions {
    /**
    * gammainc() is the upper incomplete gamma function (not regularized)
    * @param real a, shape parameter, must be >0
    * @param real x, position, >0
    * @returns the non-regularized incomplete gamma
    */
    real gammainc(real a, real x){
          return gamma_q(a,x) * tgamma(a);
    }

    /**
    * gammainc_neg() is the upper incomplete gamma function for a > -2
    * @param real a, shape parameter, must be >-2
    * @param real x, position, >0
    * @returns the non-regularized incomplete gamma
    *
    * NOTES: this routine has been written for speed for this application.
    *        More general functions for negative a can be gotten by recursion
    *        formulae etc.
    */
    real gammainc_neg(real a, real x){
        return ((gammainc(a+2,x)-pow(x,(a+1))*exp(-x))/(a+1)-exp(-x)*pow(x,a))/a;
    }
}

Comparing to GSL's version, I get the following two plots, which I hope will be fairly self-explanatory. I choose to run over values of a and x independently, for what are (in my situation) fairly normal values. All differences are better than 1e-5, with most better than 1e-12. There are a few features where perhaps the GSL version switches methods (not really sure).

with_a with_x

I can test a more general version (for lower a) later.

steven-murray avatar Jul 31 '15 02:07 steven-murray

Neat. Are you going to eventually need the log of that? If so, you can probably make the functions much more stable by working directly on the log scale.

Also, it'd be more efficient to rewrite the second function as follows to avoid recomputation:

real gammainc_neg(real a, real x){ real e_neg_x; real neg_pow_x_a; e_neg_x <- exp(-x); neg_pow_x_a <- -pow(x, a);

return (gammainc(a + 2, x) + x * neg_pow_x_a * exp_neg_x) / (a + 1)
       + e_neg_x * neg_pow_x_a / a;

}

Basically, anything that cuts down on redundant calculations or replaces division with multiplication or negation with addition is a win. Of course, you should check my algebra. But it looks like you're well down the testing path (which is awesome --- thanks for the plots).

Boost has an incomplete gamma function implementation:

http://www.boost.org/doc/libs/1_58_0/libs/math/doc/html/math_toolkit/sf_gamma/igamma.html

The doc seems to be broken, but maybe it's my browser. But it does indicate different approximations are used for different parts of the domain.

Also, is there an easy way to compute derivatives? We could just chain rule it all through gamma_q and tgamma (or on the log scale if that's the real target).

  • Bob

On Jul 30, 2015, at 10:29 PM, Steven Murray [email protected] wrote:

I've done a bit of work using my "fast" function, which only calculates the gamma function, gammainc(a,x), for a>-2. For reference, the function block is:

functions { /** * gammainc() is the upper incomplete gamma function (not regularized) * @param real a, shape parameter, must be >0 * @param real x, position, >0 * @returns the non-regularized incomplete gamma */ real gammainc(real a, real x){ return gamma_q(a,x) * tgamma(a); }

/**
* gammainc_neg() is the upper incomplete gamma function for a > -2
* @param real a, shape parameter, must be >-2
* @param real x, position, >0
* @returns the non-regularized incomplete gamma
*
* NOTES: this routine has been written for speed for this application.
*        More general functions for negative a can be gotten by recursion
*        formulae etc.
*/
real gammainc_neg(real a, real x){
    return ((gammainc(a+2,x)-pow(x,(a+1))*exp(-x))/(a+1)-exp(-x)*pow(x,a))/a;
}

Comparing to GSL's version, I get the following two plots, which I hope will be fairly self-explanatory. I choose to run over values of a and x independently, for what are (in my situation) fairly normal values. All differences are better than 1e-5, with most better than 1e-12. There are a few features where perhaps the GSL version switches methods (not really sure).

I can test a more general version (for lower a) later.

— Reply to this email directly or view it on GitHub.

bob-carpenter avatar Jul 31 '15 03:07 bob-carpenter

Okay, so I've re-implemented in a more general manner, so that it works for any negative a. I also took your suggestions into account, so here is the code:

    real gammainc(real a, real x){
          return gamma_q(a,x) * tgamma(a);
    }

    real gammainc_neg_gen(real a, real x){
      int n;
      real ap1;
      real ssum;

      if(a>=0) return gammainc(a,x);

      ap1 <- a+1;

      //Get floor(-a)
      n<-0;
      while(n<-a){
        n <- n+1;
      }

      //Get summed part
      {
        vector[n] sums;
        for(i in 0:n-1) sums[i+1] <- pow(x,i)/tgamma(ap1+i);
        ssum <- sum(sums);
      }
      return tgamma(a)*(gamma_q(a+n,x)-pow(x,a)*exp(-x)*ssum);
    }

I'm not sure this is as fast as it can be -- I had to use a while loop to get an integer ceil(-a). Any suggestions there would be good.

In terms of accuracy, cf. the following two plots (similar to the other ones, but a much larger range of a): with_a with_x

The only real problem seems to be with large negative a and large x. Of course, because of the sum, the efficiency also drops for larger negative a.

For my application I do need the log of this. However, I can't see any real way to simplify things with logs, because it's a sum. Any ideas here would be good as well.

The derivative can be found, but it's in terms of digamma's and Meijer-G functions (how do you even do the derivative of say gamma_q in Stan, since you don't have the Meijer-G function?).

steven-murray avatar Jul 31 '15 10:07 steven-murray

I'm afraid there's no god way to do integer ceilings. We've debated it in the past, but haven't added it as a built-in because it breaks differentiability of models in general (though not in every use case, hence the debate).

I think it might be possible to write a binary search that would cut the expected number of operations for the floor down to log(n).

On the log scale, sums on the linear scale turn into log_sum_exp operations.

log(a * b) = log(a) + log(b)

log(a + b) = log_sum_exp(log(a), log(b))

It can be much more arithmetically stable and also provide better precision in some cases.

  • Bob

On Jul 31, 2015, at 6:32 AM, Steven Murray [email protected] wrote:

Okay, so I've re-implemented in a more general manner, so that it works for any negative a. I also took your suggestions into account, so here is the code:

real gammainc(real a, real x){
      return gamma_q(a,x) * tgamma(a);
}

real gammainc_neg_gen(real a, real x){
  int n;
  real ap1;
  real ssum;

  if(a>=0) return gammainc(a,x);

  ap1 <- a+1;

  //Get floor(-a)
  n<-0;
  while(n<-a){
    n <- n+1;
  }

  //Get summed part
  {
    vector[n] sums;
    for(i in 0:n-1) sums[i+1] <- pow(x,i)/tgamma(ap1+i);
    ssum <- sum(sums);
  }
  return tgamma(a)*(gamma_q(a+n,x)-pow(x,a)*exp(-x)*ssum);
}

I'm not sure this is as fast as it can be -- I had to use a while loop to get an integer ceil(-a). Any suggestions there would be good.

In terms of accuracy, cf. the following two plots (similar to the other ones, but a much larger range of a):

The only real problem seems to be with large negative a and large x. Of course, because of the sum, the efficiency also drops for larger negative a.

For my application I do need the log of this. However, I can't see any real way to simplify things with logs, because it's a sum. Any ideas here would be good as well.

The derivative can be found, but it's in terms of digamma's and Meijer-G functions (how do you even do the derivative of say gamma_q in Stan, since you don't have the Meijer-G function?).

— Reply to this email directly or view it on GitHub.

bob-carpenter avatar Jul 31 '15 20:07 bob-carpenter

Just had a quick read-up of binary searches on Wikipedia and I think we'd have to do a one-sided binary search starting at 0, which has performance of 2*log2(n). So only for n>4 will it start cutting down on iterations (but will have more overhead I would assume). That's probably a good idea when n is getting up to close to 100, but in my use-case this won't happen, and I don't have the time right at this moment to implement stuff I don't need (PhD due in 8 weeks...).

As for the log_diff_exp() function -- unfortunately there's a subtlety that causes this to be difficult. There are two factors: gamma(a) and the (Q(a,x) - sumStuff). However, both of these parts oscillate in whether they are positive or negative based on the value of n (even values are positive if I recall). The lgamma() function automatically deals with this, but the other part gets assigned NaN whenever it is negative. I can of course check if it is negative and rearrange it etc., but then I have to use if_else, which makes the program slower. Probably just quicker and easier to take the log of the multiplied product, unless you have a brilliant idea?

The other thing I thought we could do is pre-expand, so it is Gamma(a,x) - x^a exp(-x)* Sum(x^k /rising_factorial(a+k+1)). This will always be overall positive, so the log_diff_exp will work every time. However, the rising_factorial() is only defined for positive a.

For now, I'm going to stick with just taking the log of the real-scale version.

steven-murray avatar Aug 01 '15 02:08 steven-murray

On Jul 31, 2015, at 10:25 PM, Steven Murray [email protected] wrote:

Just had a quick read-up of binary searches on Wikipedia and I think we'd have to do a one-sided binary search starting at 0, which has performance of 2*log2(n). So only for n>4 will it start cutting down on iterations (but will have more overhead I would assume). That's probably a good idea when n is getting up to close to 100, but in my use-case this won't happen, and I don't have the time right at this moment to implement stuff I don't need (PhD due in 8 weeks...).

No problem. If it weren't for time constraints, Stan would have a lot more built-ins!

The thing most high-performance search and sorting algorithms do is use different algorithms depending on size.

As for the log_diff_exp() function -- unfortunately there's a subtlety that causes this to be difficult. There are two factors: gamma(a) and the (Q(a,x) - sumStuff). However, both of these parts oscillate in whether they are positive or negative based on the value of n (even values are positive if I recall). The lgamma() function automatically deals with this, but the other part gets assigned NaN whenever it is negative. I can of course check if it is negative and rearrange it etc., but then I have to use if_else, which makes the program slower. Probably just quicker and easier to take the log of the multiplied product, unless you have a brilliant idea?

Conditionals, as in "if (cond) block1 else block2" are always better than if_else in terms of performance. The problem with if_else() is that it evaluates both branches eagerly. What we really need is an equivalent of the ternary operator in C (C++, Java, etc.)

The other thing I thought we could do is pre-expand, so it is Gamma(a,x) - x^a exp(-x)* Sum(x^k /rising_factorial(a+k+1)). This will always be overall positive, so the log_diff_exp will work every time. However, the rising_factorial() is only defined for positive a.

For now, I'm going to stick with just taking the log of the real-scale version.

No need to optimize if what you have works. Always always get things working, then optimize if you need more speed. We've been doing this in Stan all along. We started just autodiffing through the matrix libraries, Boost functions, ODE solver, etc. And now we've been moving more and more to custom derivatives.

  • Bob

bob-carpenter avatar Aug 01 '15 19:08 bob-carpenter

Cool, I'll keep it as-is for now. By the way, is there any way for the user to specify derivatives/jacobians? I would have thought that if there was a way, it would open things up a bit, and possibly make some code faster.

On Sun, Aug 2, 2015 at 3:28 AM, Bob Carpenter [email protected] wrote:

On Jul 31, 2015, at 10:25 PM, Steven Murray [email protected] wrote:

Just had a quick read-up of binary searches on Wikipedia and I think we'd have to do a one-sided binary search starting at 0, which has performance of 2*log2(n). So only for n>4 will it start cutting down on iterations (but will have more overhead I would assume). That's probably a good idea when n is getting up to close to 100, but in my use-case this won't happen, and I don't have the time right at this moment to implement stuff I don't need (PhD due in 8 weeks...).

No problem. If it weren't for time constraints, Stan would have a lot more built-ins!

The thing most high-performance search and sorting algorithms do is use different algorithms depending on size.

As for the log_diff_exp() function -- unfortunately there's a subtlety that causes this to be difficult. There are two factors: gamma(a) and the (Q(a,x) - sumStuff). However, both of these parts oscillate in whether they are positive or negative based on the value of n (even values are positive if I recall). The lgamma() function automatically deals with this, but the other part gets assigned NaN whenever it is negative. I can of course check if it is negative and rearrange it etc., but then I have to use if_else, which makes the program slower. Probably just quicker and easier to take the log of the multiplied product, unless you have a brilliant idea?

Conditionals, as in "if (cond) block1 else block2" are always better than if_else in terms of performance. The problem with if_else() is that it evaluates both branches eagerly. What we really need is an equivalent of the ternary operator in C (C++, Java, etc.)

The other thing I thought we could do is pre-expand, so it is Gamma(a,x)

  • x^a exp(-x)* Sum(x^k /rising_factorial(a+k+1)). This will always be overall positive, so the log_diff_exp will work every time. However, the rising_factorial() is only defined for positive a.

For now, I'm going to stick with just taking the log of the real-scale version.

No need to optimize if what you have works. Always always get things working, then optimize if you need more speed. We've been doing this in Stan all along. We started just autodiffing through the matrix libraries, Boost functions, ODE solver, etc. And now we've been moving more and more to custom derivatives.

  • Bob

— Reply to this email directly or view it on GitHub https://github.com/stan-dev/stan/issues/1536#issuecomment-126947886.

steven-murray avatar Aug 05 '15 02:08 steven-murray

On Aug 4, 2015, at 10:54 PM, Steven Murray [email protected] wrote:

Cool, I'll keep it as-is for now. By the way, is there any way for the user to specify derivatives/jacobians? I would have thought that if there was a way, it would open things up a bit, and possibly make some code faster.

It's on our long-term to-do list, but as of now, there's no good way to do it. In fact, we don't even have a syntax for it developed. The main issue is that in user-defined functions, the code is generated so that any of the variables can be parameters or data, and all the derivatives work out. It's fairly tricky to do this efficiently with user-specified derivatives without making users write 2^N derivatives for N-input functions. We've been thinking about various kinds of functors that would do the right thing one dimension at a time, but then that blocks code sharing --- often derivatives involve repeated subexpressions.

We're happy to entertain syntax suggestions!

We'd also need to go up to second- and third-order derivatives, or we'd have to be able to differentiate through user-supplied derivatives. We can probably figure out how to do the templating for that along the same lines as for the existing functions.

  • Bob

On Sun, Aug 2, 2015 at 3:28 AM, Bob Carpenter [email protected] wrote:

On Jul 31, 2015, at 10:25 PM, Steven Murray [email protected] wrote:

Just had a quick read-up of binary searches on Wikipedia and I think we'd have to do a one-sided binary search starting at 0, which has performance of 2*log2(n). So only for n>4 will it start cutting down on iterations (but will have more overhead I would assume). That's probably a good idea when n is getting up to close to 100, but in my use-case this won't happen, and I don't have the time right at this moment to implement stuff I don't need (PhD due in 8 weeks...).

No problem. If it weren't for time constraints, Stan would have a lot more built-ins!

The thing most high-performance search and sorting algorithms do is use different algorithms depending on size.

As for the log_diff_exp() function -- unfortunately there's a subtlety that causes this to be difficult. There are two factors: gamma(a) and the (Q(a,x) - sumStuff). However, both of these parts oscillate in whether they are positive or negative based on the value of n (even values are positive if I recall). The lgamma() function automatically deals with this, but the other part gets assigned NaN whenever it is negative. I can of course check if it is negative and rearrange it etc., but then I have to use if_else, which makes the program slower. Probably just quicker and easier to take the log of the multiplied product, unless you have a brilliant idea?

Conditionals, as in "if (cond) block1 else block2" are always better than if_else in terms of performance. The problem with if_else() is that it evaluates both branches eagerly. What we really need is an equivalent of the ternary operator in C (C++, Java, etc.)

The other thing I thought we could do is pre-expand, so it is Gamma(a,x)

  • x^a exp(-x)* Sum(x^k /rising_factorial(a+k+1)). This will always be overall positive, so the log_diff_exp will work every time. However, the rising_factorial() is only defined for positive a.

For now, I'm going to stick with just taking the log of the real-scale version.

No need to optimize if what you have works. Always always get things working, then optimize if you need more speed. We've been doing this in Stan all along. We started just autodiffing through the matrix libraries, Boost functions, ODE solver, etc. And now we've been moving more and more to custom derivatives.

  • Bob

— Reply to this email directly or view it on GitHub https://github.com/stan-dev/stan/issues/1536#issuecomment-126947886.

— Reply to this email directly or view it on GitHub.

bob-carpenter avatar Aug 05 '15 05:08 bob-carpenter

I found the generalised gamma function in BUGS

https://books.google.com.au/books?id=zG7SBQAAQBAJ&pg=PA347&lpg=PA347&dq=%22dggamma%22&source=bl&ots=QvmDksRRQ7&sig=p7Uwr6V7kf3lCy4ncwPM3it3d3k&hl=en&sa=X&ved=0ahUKEwjl-4SMtuvTAhWLfrwKHevKDfkQ6AEIODAE#v=onepage&q=%22dggamma%22&f=false

However is i try to implement in stan

functions{ real ggamma_lpdf(real x, real a, real b, real c) { return(log(cb^(ca) * x^(ca - 1) * exp(-(bx)^c) / tgamma(a))); } } or

functions{ real ggamma_lpdf(real x, real a, real b, real c) { return( (ca * (log (c) + log(b)) ) + ((ca - 1) * log (x) ) + (-(b*x)^c ) - log( tgamma(a) )); } }

I have no success in test prediction

stemangiola avatar May 12 '17 23:05 stemangiola

@stemangiola This sounds like a question for the mailing list (http://discourse.mc-stan.org/), or a separate issue. If you think there is a bug please try to post a reproducible example. If you want a feature please start a separate issue. The link you give won't open for me but I have the generalized gamma implemented in a package if that's what you need. There are a lot of varieties of it and some work better than others in Stan.

sakrejda avatar May 13 '17 00:05 sakrejda

Thanks a lot,

sorry for the post here.

Here is my question for the community.

  • https://groups.google.com/forum/#!category-topic/stan-users/general/AIv6Ule-Glk
  • http://discourse.mc-stan.org/t/gamma-2-generalized-gamma/517

Your implementations for stan as function{} would be much appreciated!

stemangiola avatar May 13 '17 00:05 stemangiola