Generalized Pareto Supported Below `loc`?
Relatively new to TF/TFP so perhaps there is some obvious answer here. The pdf for the Generalized Pareto is typically defined as having support for x >= loc for shape >= 0 and the docs do even say the support of the distribution is always lower bounded by loc, but this isn't so. I compared with scipy below.
This has impacted me as I'm using a hybrid that utilizes the Pareto pdf only if x >= loc, but I'm getting NaNs (I think) b/c the non-tail portion is being evaluated by the GenPareto at a number much greater than zero (and it's a lot more challenging to adjust for).
import numpy as np
import scipy.stats as scist
import matplotlib.pyplot as plt
import tensorflow as tf
from tensorflow_probability import distributions as tfd
x = np.linspace(-.5,2,1000)
x_tf = tf.convert_to_tensor(x, dtype=tf.float32)
loc, scale, conc = .25, 1, .25
gptf = tfd.GeneralizedPareto(loc, scale, conc)
gp = scist.genpareto(conc, loc=loc, scale=scale)
plt.plot(x_tf, gptf.prob(x_tf), label='TFP')
plt.plot(x, gp.pdf(x), label='SCIPY')
plt.legend()
plt.savefig('gpd.png')

Most TFP distributions will happily compute and return the "natural" formula for (log) prob outside of the support. This is an intentional, if somewhat surprising, choice. There are at least a few reasons:
1. Doing otherwise would generally entail "control flow" (tf.cond and/or tf.where), which can be pretty expensive in the vectorized computation context for which TFP is optimized. If we hardcode this in, the caller can't avoid paying the cost, even if they've used, say, a Bijector to avoid this "bad" region (see (3) below). In general, TFP has a mantra: "only pay for what you eat" -- adding avoidable control flow would contradict this, so we avoid it, at the cost of a slight foot-gun for folks new to the library and its ethos.
2. Someone might like to have, say, a function that computes the Bernoulli log prob formula for non-binary values (similarly for other (log) prob formulas).
3. We provide other, more effective means for restricting, say, an optimizer (Bijectors + TransformedVariables!), or chains of an MCMC sampler (Bijectors + TransformedTransitionKernel!) to the support region.
This kind of question has come up before in other contexts, and for some distributions we've added a semi-standardized flag that lets the user say "please give me zero prob or -inf log prob outside the accepted support of this distribution". See, e.g., the force_probs_to_zero_outside_support arg (that's the name we decided was most precise, if not succinct) to the Poisson constructor. (There's also a lingering, deprecated interpolate_nondiscrete flag on Poisson that does the same thing, from before we standardized on the aforementioned awesome, descriptive, un-short name :)).
One interpretation of the statement "The support of the distribution is always lower bounded by loc." about the support of GeneralizedPareto is that sampling will never yield values outside this support. How a user chooses to call log_prob, for efficiency reasons, is left up to them.
Hope this helps clarify our thinking on this kind of question. We'd welcome a PR adding a similar flag to the GeneralizedPareto distribution that allows users to control whether they want to include the control flow in the log prob computation!
One more note: if you enable the validate_args flag, which is false by default for the same reasons given in (1) above, then (IIRC) most if not all distributions will inject (potentially performance-costly) assertions into methods like log_prob that check that the input is in fact in the support. This won't give you zeros, it'll give you exceptions, but it'll at least let you know if your optimizer/sampler/whatever has found its way out of the support of the distribution.
Generally speaking, if anything about your TFP-based code is not working, it's a good idea to flip validate_args to True everywhere for debugging purposes. It's off by default, again, because we strive to be "fast first" (well, maybe "numerically stable and correct first, fast second", at least aspirationally :))
excellent answer. thank you.
yes, honestly haven't delved into Bijectors yet, so perhaps I'm making my project a bit harder than it needs to be. Wouldn't be the first time ;)
yes, i'll look into a flag for this.
please feel free to raise problem/application-specific questions here or on the mailing list ([email protected]). people are generally eager to jump in with ideas/solutions and we all love thinking about this stuff, however big or small the problem :)