
Implement quantile for `Mixture` and `MixtureSameFamily`.

Open moonman925 opened this issue 5 years ago • 14 comments

When will the inverse CDF (quantile) be implemented for these kinds of distributions? I have the nightly version installed, but found that quantile is not implemented.

moonman925 avatar Nov 28 '19 06:11 moonman925

Hi. For mixture distributions, there generally isn't a closed form for the quantiles. You could use something like Newton's method on the CDF to approximate the quantile function. Given a mixture of distributions with known CDFs, the mixture CDF is the weighted sum of the component CDFs (which I believe the mixture distributions already implement).
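
A minimal sketch of that CDF-inversion idea (not TFP API; it uses bisection rather than pure Newton, since Newton steps can diverge in the flat tails of a mixture, and the bracket below is an illustrative assumption):

import numpy as np
from tensorflow_probability import distributions as tfd

def approx_quantile(dist, p, lo=-100.0, hi=100.0, num_iters=60):
    # Bisection on f(x) = CDF(x) - p; [lo, hi] must bracket the quantile.
    for _ in range(num_iters):
        mid = 0.5 * (lo + hi)
        if dist.cdf(mid).numpy() < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

mix = tfd.MixtureSameFamily(
    mixture_distribution=tfd.Categorical(probs=[0.3, 0.7]),
    components_distribution=tfd.Normal(loc=[-1., 2.], scale=[0.5, 1.]))
x90 = approx_quantile(mix, 0.9)
print(x90, mix.cdf(x90).numpy())  # the second value should be close to 0.9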

Feel free to reopen this if you want to discuss more.

srvasude avatar Feb 08 '20 02:02 srvasude

Actually MixtureSameFamily doesn't implement quantile. We could do that the way you suggest. PRs welcome.

brianwa84 avatar Feb 08 '20 03:02 brianwa84

@brianwa84 Yep that's what I was mentioning :P. Will change title of this issue to reflect that.

srvasude avatar Feb 08 '20 03:02 srvasude

Oh never mind, I was thinking backwards. The CDF you can compute with a weighted sum, and it is already implemented. Not the quantile.
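
For concreteness, the weighted sum in question is F(x) = sum_k pi_k * F_k(x); a quick sanity check of MixtureSameFamily.cdf against it (the parameter values here are illustrative):

from tensorflow_probability import distributions as tfd

mix = tfd.MixtureSameFamily(
    mixture_distribution=tfd.Categorical(probs=[0.3, 0.7]),
    components_distribution=tfd.Normal(loc=[-1., 2.], scale=[0.5, 1.]))
by_hand = (0.3 * tfd.Normal(-1., 0.5).cdf(0.5)
           + 0.7 * tfd.Normal(2., 1.).cdf(0.5))
print(mix.cdf(0.5).numpy(), by_hand.numpy())  # the two should agree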

brianwa84 avatar Feb 08 '20 03:02 brianwa84

For the fellow strugglers -- just use pynverse. I have a model that predicts a mixture of Gaussians with 4 components. Here is how I get quantiles for one observation.

from pynverse import inversefunc
from tensorflow_probability import distributions as tfd

# alpha_pred, mu_pred, sigma_pred are the mixture weights, means, and scales
# predicted by the model; y_train holds the training targets.
mixture_distribution = tfd.MixtureSameFamily(
        mixture_distribution=tfd.Categorical(probs=alpha_pred[0, :]),
        components_distribution=tfd.Normal(
            loc=mu_pred[0, :],
            scale=sigma_pred[0, :]))
cdf_fit = lambda x: mixture_distribution.cdf(x).numpy()
quantile_func_fit = inversefunc(cdf_fit, domain=[y_train.min(), y_train.max()], accuracy=6)
%timeit inversefunc(cdf_fit, domain=[y_train.min(), y_train.max()], accuracy=6)

7.79 ms ± 161 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

quantile_func_fit([0.5, 0.9, 0.95, 0.99])
>>> array([15.26007136, 37.84270684, 39.90694571, 42.79034905])
cdf_fit([15.26007136, 37.84270684, 39.90694571, 42.79034905])
>>> array([0.5       , 0.90000004, 0.95      , 0.99      ], dtype=float32)

Good enough for any practical application.


This is a single-core computation, using bare-bones NumPy. I'm sure a proper TensorFlow implementation would be even faster. @brianwa84 why is this issue closed?

Demetrio92 avatar Jun 21 '20 21:06 Demetrio92

In the meantime I hacked together a rather efficient quantile approximation. USE WITH CAUTION! READ THE CODE

import numpy as np

def mixture_distribution_quantiles(mixture_distribution, probs, N_grid_points=int(1e3)):
    # base_grid = get_base_grid(mixture_distribution)
    # TODO: need a smarter grid (or you can customize it to your dataset)
    base_grid = np.linspace(-5, 20, num=N_grid_points)  # hacked in order to fit the simulation below
    full_grid = np.transpose(np.tile(base_grid, (mixture_distribution.batch_shape[0], 1)))
    cdf_grid = mixture_distribution.cdf(full_grid).numpy()  # this is fully parallelized and even uses GPU
    grid_check = (cdf_grid.min(axis=0).max() <= min(probs)) & (max(probs) <= cdf_grid.max(axis=0).min())
    if not grid_check:
        raise RuntimeError('Grid does not span full CDF range needed for interpolation!')

    probs_row_grid = np.transpose(np.tile(probs, (cdf_grid.shape[0], 1)))
    def get_quantiles_for_one_observation(cdf_grid_one_obs):
        return base_grid[np.argmax(np.greater(cdf_grid_one_obs, probs_row_grid), axis=1)]

    # TODO: this is the main performance bottleneck. uses only one CPU core
    quantiles_grid = np.apply_along_axis(
        func1d=get_quantiles_for_one_observation,
        axis=0,
        arr=cdf_grid,
    )
    return quantiles_grid


if __name__ == "__main__":
    from tensorflow_probability import distributions as tfd
    quantiles_to_compute = np.array([0.005, 0.1, 0.25, 0.5, 0.75, 0.9, 0.995])
    N_predictions = int(4*1e3)
    N_components = 5
    N_grid_points = int(1e4)
    pi_sim = np.random.uniform(size=(N_predictions, N_components)).astype(np.float32)
    pi_sim = pi_sim/np.transpose(np.tile(pi_sim.sum(axis=1), (N_components, 1)))
    mean_sim = np.random.uniform(low=5, high=10, size=(N_predictions, N_components)).astype(np.float32)
    sd_sim = np.random.uniform(low=0.1, high=1, size=(N_predictions, N_components)).astype(np.float32)
    mixture_distribution = tfd.MixtureSameFamily(
        mixture_distribution=tfd.Categorical(probs=pi_sim),
        components_distribution=tfd.Normal(
            loc=mean_sim,
            scale=sd_sim,
            validate_args=True,
            allow_nan_stats=False,
        )
    )

    quantiles_for_each_prediction = mixture_distribution_quantiles(
        mixture_distribution=mixture_distribution,
        probs=quantiles_to_compute,
        N_grid_points=N_grid_points,
    )
    print(quantiles_for_each_prediction.shape)

    # test
    one_obs = 100
    quantiles_one_obs = quantiles_for_each_prediction[:, one_obs]
    probs_validation = [round(mixture_distribution.cdf(quant_val).numpy()[one_obs], 3) for quant_val in quantiles_one_obs]
    print(f'''
        probs  input: {quantiles_to_compute}
        probs approx: {probs_validation}
    ''')

Result:

        probs  input: [0.005 0.1   0.25  0.5   0.75  0.9   0.995]
        probs approx: [0.005, 0.1, 0.251, 0.501, 0.751, 0.9, 0.995]

good enough!


Performance

Setup: 16 Xeon cores, lots of RAM, no GPU. This simulation eats roughly 10 GB (peak).

%%timeit
  ...: quantiles_for_each_prediction = mixture_distribution_quantiles(
  ...:     mixture_distribution=mixture_distribution,
  ...:     probs=quantiles_to_compute,
  ...:     N_grid_points=N_grid_points,
  ...: )
  ...: 
11.7 s ± 180 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
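
A possible fix for that bottleneck (a sketch using the same variables as inside mixture_distribution_quantiles; it trades the loop for a (len(probs), N_grid_points, batch) boolean temporary, about 280 MB for the simulation above):

# Vectorized replacement for the np.apply_along_axis call. cdf_grid has shape
# (N_grid_points, batch); broadcasting the comparison gives
# (len(probs), N_grid_points, batch), and argmax over the grid axis picks the
# first grid point whose CDF exceeds each prob, per batch element.
idx = np.argmax(cdf_grid[np.newaxis, :, :] > np.asarray(probs)[:, np.newaxis, np.newaxis], axis=1)
quantiles_grid = base_grid[idx]  # shape (len(probs), batch), same as before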

Demetrio92 avatar Jun 26 '20 13:06 Demetrio92

@brianwa84 please re-open the issue!

Demetrio92 avatar Jul 20 '20 12:07 Demetrio92

Reopening on request.

axch avatar Jul 20 '20 16:07 axch

(quoting Demetrio92's grid-based approximation above in full)

I am using it to invert a logistic mixture:

mixture_distribution = tfd.MixtureSameFamily(
    mixture_distribution=tfd.Categorical(probs=pi_sim),
    components_distribution=tfd.Logistic(
        loc=mean_sim,
        scale=sd_sim,
    )
)

It seems that the quantiles in quantiles_for_each_prediction are quite far off: the mean error (probs_validation - quantiles_to_compute) is bigger than 1e-5. Any suggestions to improve the precision?

gitlabspy avatar Jul 27 '20 04:07 gitlabspy

(quoting gitlabspy's question above)

Increase N_grid_points until sufficient precision is reached. If you run out of memory, you can batch the grid.

If performance is not an issue, use pynverse as in my other example above.

Also: the logistic distribution is similar to the normal, so the example above works, but in general you want an appropriate grid for the linear interpolation. It should cover the full domain of your mixture, but at the same time not be too broad, so that you have enough points where the CDF actually changes. pynverse solves this problem in a very general fashion; however, it's not vectorised, so you will quickly run into scalability issues.
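
One way to build such a grid (a sketch of my own, not thread-tested code; it assumes the component family implements quantile, which tfd.Normal and tfd.Logistic both do):

import numpy as np

# Derive the grid edges from closed-form component quantiles, so the grid
# covers essentially all of the mixture's mass without being too broad.
comp = mixture_distribution.components_distribution  # e.g. tfd.Logistic
eps = 1e-4  # tail mass ignored at each end; an illustrative choice
lo = comp.quantile(eps).numpy().min()
hi = comp.quantile(1.0 - eps).numpy().max()
base_grid = np.linspace(lo, hi, num=N_grid_points)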

Demetrio92 avatar Jul 27 '20 10:07 Demetrio92

Are there any plans to implement an approximation to the quantile function of MixtureSameFamily, perhaps using information from the components_distribution for efficiency?

beat-247 avatar Nov 06 '21 11:11 beat-247

Any updates on this? I was surprised to find that even StudentT does not have quantile in TFP, while SciPy does.

Strateus avatar Jun 21 '23 20:06 Strateus

PRs are welcome, but we're not pursuing a quantile function right now for the Student T. We could use a solver along with the CDF function as was discussed, but that might have surprising runtime characteristics.
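
For anyone wanting the solver route today, a sketch of how it could look (an assumption on my part, not an official implementation; it relies on tfp.math.find_root_chandrupatla, available in recent TFP releases):

import tensorflow_probability as tfp
tfd = tfp.distributions

def quantile_via_root_search(dist, p, low, high):
    # Invert the CDF with TFP's batched Chandrupatla root finder;
    # [low, high] must bracket the quantile (illustrative values below).
    results = tfp.math.find_root_chandrupatla(
        lambda x: dist.cdf(x) - p, low=low, high=high)
    return results.estimated_root

mix = tfd.MixtureSameFamily(
    mixture_distribution=tfd.Categorical(probs=[0.3, 0.7]),
    components_distribution=tfd.Normal(loc=[-1., 2.], scale=[0.5, 1.]))
print(quantile_via_root_search(mix, 0.9, low=-10., high=10.))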

ColCarroll avatar Jun 22 '23 15:06 ColCarroll

Any plans to implement quantile?

rmmaf avatar Dec 29 '23 02:12 rmmaf