Add distribution quantile functions (inverse CDFs)

Open andrjohns opened this issue 4 years ago • 3 comments

Description

I'm opening this as a reference issue for implementing the distribution quantile functions, since there are likely to be quite a few PRs along the way.

I'll be starting with the distributions covered in the Boost documentation.

Points for discussion

  • Naming: I'm proposing we use the _icdf suffix, but let me know if there are any objections to that.
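For illustration only, a function carrying the proposed suffix might look like the sketch below. It is written in Python for brevity rather than Stan Math's C++, and the name exponential_icdf and its rate parameterization are assumptions made for this example, not an existing signature.

```python
import math

def exponential_cdf(x, rate):
    """CDF of the exponential distribution with the given rate, for x >= 0."""
    return -math.expm1(-rate * x)

def exponential_icdf(p, rate):
    """Quantile (inverse CDF): the x such that exponential_cdf(x, rate) == p.

    Hypothetical name following the proposed _icdf suffix.
    """
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    if rate <= 0.0:
        raise ValueError("rate must be positive")
    # Solve p = 1 - exp(-rate * x) for x; log1p keeps precision for small p.
    return -math.log1p(-p) / rate

median = exponential_icdf(0.5, 2.0)        # equals ln(2) / 2
round_trip = exponential_cdf(median, 2.0)  # recovers 0.5 up to rounding
```

The round trip through the CDF is the property any _icdf function for a continuous distribution would be expected to satisfy.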

Let me know if you have any suggestions or foresee any issues I'm likely to run into.

Current Version:

v4.1.0

andrjohns avatar Jun 26 '21 10:06 andrjohns

@spinkney I was looking at the quantile function for the negative-binomial, and it appears to require the inverse of the incomplete beta function, specifically the inverse with respect to the b parameter (the ibeta_invb function).

The tricky part is that this doesn't appear to have a closed form and is instead computed numerically through a root-finding algorithm, so I haven't been able to find any kind of derivatives with respect to the inputs. At the moment it looks like we'll have to resort to finite differencing of the Boost function, but this is obviously not ideal. Any chance you know of an alternative for the negative-binomial (or gradients for ibeta_invb)?
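As a rough sketch of what the finite-differencing fallback would involve: the example below inverts a CDF by root-finding and then differentiates the result with a central difference. It is Python rather than Stan Math's C++, and it uses a normal CDF as a stand-in because ibeta_invb is not available in the Python standard library; the function names, the bracket width, and the step size h are all assumptions for illustration.

```python
import math

def normal_cdf(x, mu, sigma):
    """Normal CDF, used here only as a stand-in for a CDF without a closed-form inverse."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def quantile_by_bisection(p, mu, sigma, iterations=100):
    """Invert the CDF numerically by bisection on a wide bracket around mu."""
    lo, hi = mu - 40.0 * sigma, mu + 40.0 * sigma
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if normal_cdf(mid, mu, sigma) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def central_difference(f, x, h=1e-5):
    """Central finite-difference approximation of df/dx at x."""
    return (f(x + h) - f(x - h)) / (2.0 * h)

# Derivatives of the 0.9 quantile with respect to each parameter.
dq_dmu = central_difference(lambda m: quantile_by_bisection(0.9, m, 2.0), 1.0)
dq_dsigma = central_difference(lambda s: quantile_by_bisection(0.9, 1.0, s), 2.0)
```

For the normal stand-in the exact answers are known (1 for the location, and the standard normal 0.9 quantile for the scale), which makes the approximation easy to check. Each derivative costs two full root-finding solves, which is part of why this approach is not ideal.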

andrjohns avatar Jan 07 '22 04:01 andrjohns

Quantile functions for discrete distributions are tricky. See https://www.boost.org/doc/libs/1_78_0/libs/math/doc/html/math_toolkit/pol_tutorial/understand_dis_quant.html.

The way Stan currently handles CDFs for discrete distributions is that the input must be an integer type. That means we cannot have a continuous quantile function. So really no derivatives.
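To make the point concrete, here is a minimal sketch of a discrete quantile defined as the smallest integer whose CDF reaches the target probability. It is Python rather than Stan Math's C++; the function names and the parameterization (number of failures before the r-th success, with r a positive integer) are assumptions for the example, and the "smallest integer" rule is only one of the rounding conventions the Boost page describes.

```python
import math

def neg_binomial_pmf(k, r, p_success):
    """P(K = k): k failures before the r-th success (r a positive integer)."""
    return math.comb(k + r - 1, k) * p_success**r * (1.0 - p_success)**k

def neg_binomial_quantile(prob, r, p_success):
    """Smallest integer k with CDF(k) >= prob, found by summing the PMF."""
    if not 0.0 <= prob < 1.0:
        raise ValueError("prob must be in [0, 1)")
    k = 0
    cdf = neg_binomial_pmf(0, r, p_success)
    while cdf < prob:
        k += 1
        term = neg_binomial_pmf(k, r, p_success)
        if term == 0.0:
            break  # PMF underflowed; the sum cannot grow any further
        cdf += term
    return k

# With r = 1 and p_success = 0.5 the CDF values are 0.5, 0.75, 0.875, ...
steps = [neg_binomial_quantile(q, 1, 0.5) for q in (0.5, 0.6, 0.7, 0.75, 0.76)]
# steps == [0, 1, 1, 1, 2]
```

The output is constant between CDF values and jumps at them, so its derivative with respect to the probability is zero almost everywhere and undefined at the jumps.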

There is a discussion in a math issue about allowing real inputs in discrete CDFs. The problem is that the values between consecutive integers have to be interpolated, and there are infinitely many ways to do that. The current design choice is not to build it in, because we would effectively be choosing the interpolation and hiding the fact that the function is, by definition, not continuous.

If we do include the quantile functions for discrete distributions, this opens up an interesting question about how we classify a noncontinuous function with real outputs. Or do we interpolate, in which case we should allow reals into the discrete CDFs as well?

spinkney avatar Jan 07 '22 09:01 spinkney

Ahh of course, forgot about all of that. Will just stick with the continuous distributions!

andrjohns avatar Jan 07 '22 09:01 andrjohns