numba icon indicating copy to clipboard operation
numba copied to clipboard

Support p= option in numpy.random.choice

Open hameerabbasi opened this issue 8 years ago • 16 comments

In numpy.random.choice, the p= option is not supported. Please add support for this option.

For reference, the p= option specifies the probabilities for the different elements in the first argument.

hameerabbasi avatar Sep 10 '17 09:09 hameerabbasi

I would like to reiterate this. Being able to specify probabilities is essential for sampling from general distributions with a discrete outcome space.

If outcomes is the vector with possible outcomes and p the probability vector, a workaround for sampling size values with replacement is

x = outcomes[np.searchsorted(np.cumsum(p), np.random.rand(size))]

gvanderheide avatar May 07 '19 13:05 gvanderheide

Similarly to @gvanderheide I've used successfully a workaround based on np.random.choice code published on https://github.com/numpy/numpy/blob/76a76c78d1b049126153e81b0a9d137fa3e4947b/numpy/random/mtrand/mtrand.pyx#L1194:

Given the probability distribution probability and the sampling size size, the code is

cumulative_distribution = np.cumsum(probability) cumulative_distribution /= cumulative_distribution[-1] uniform_samples = np.random.rand(size) index = np.searchsorted(cumulative_distribution, uniform_samples, side="right")

Tested with a sample size of 1 and probability = np.asarray([0.60, 0.16, 0.12, 0.08, 0.04]) The result over 100000000 generated indices is: [0.60007007 0.1599901 0.11995732 0.08000664 0.03997587]

Close enough I guess..

cedricsimar avatar Jun 02 '19 11:06 cedricsimar

Echoing what @cedricsimar said above, I used the following numba-compatible workaround (works with the nb.jit(nopython=True) decorator:

def rand_choice_nb(arr, prob):
    """
    :param arr: A 1D numpy array of values to sample from.
    :param prob: A 1D numpy array of probabilities for the given samples.
    :return: A random sample from the given array with a given probability.
    """
    return arr[np.searchsorted(np.cumsum(prob), np.random.random(), side="right")]```

mikefenton avatar Jul 01 '19 15:07 mikefenton

Echoing what @cedricsimar said above, I used the following numba-compatible workaround (works with the nb.jit(nopython=True) decorator:

def rand_choice_nb(arr, prob):
    """
    :param arr: A 1D numpy array of values to sample from.
    :param prob: A 1D numpy array of probabilities for the given samples.
    :return: A random sample from the given array with a given probability.
    """
    return arr[np.searchsorted(np.cumsum(prob), np.random.random(), side="right")]```

For this solution if you have a grouping of values that don't sum to 1, simply apply a softmax to your "prob" array:

# assume prob is something like [1, 10, 5, 43, 2] (non-probabalistic)
exp = np.exp(prob)
prob = exp/np.sum(exp)

verbiiyo avatar Jul 22 '19 22:07 verbiiyo

Echoing what @cedricsimar said above, I used the following numba-compatible workaround (works with the nb.jit(nopython=True) decorator:

def rand_choice_nb(arr, prob):
    """
    :param arr: A 1D numpy array of values to sample from.
    :param prob: A 1D numpy array of probabilities for the given samples.
    :return: A random sample from the given array with a given probability.
    """
    return arr[np.searchsorted(np.cumsum(prob), np.random.random(), side="right")]```

For this solution if you have a grouping of values that don't sum to 1, simply apply a softmax to your "prob" array:

# assume prob is something like [1, 10, 5, 43, 2] (non-probabalistic)
exp = np.exp(prob)
prob = exp/np.sum(exp)

Aye, I actually normalise the probabilities before they get passed in here, but good point!

mikefenton avatar Jul 23 '19 08:07 mikefenton

Thanks guys. Was using numpy.random.choice and saw it didn't work, so I switched to random.choices which also doesn't work :) Glad to find a workaround here.

Quetzalcohuatl avatar Dec 18 '19 17:12 Quetzalcohuatl

I tried to implement the ideas above, but none of them worked for me with nb.jit(nopython=True). Are these methods still work for others?

gubazoltan avatar Dec 16 '21 14:12 gubazoltan

@thundergoth I'd suggest making a post on the Numba Discourse with an example of the code that you're using and the errors that you're getting when running it - hopefully someone will be able to help you out there.

gmarkall avatar Dec 16 '21 16:12 gmarkall

Yea, was really sad when my code didn't work... at least glad to have found a workaround!

uditrana avatar Feb 17 '23 00:02 uditrana

@kc611 Is this issue still relevant now that the generator API is supported?

gmarkall avatar Feb 17 '23 11:02 gmarkall

It's still relevant; np.random.Generator().choice() is awaiting on NumPy Advanced Indexing support (https://github.com/numba/numba/issues/8616)

The distribution implementations for Generator API are being tracked at: https://github.com/numba/numba/issues/8519

kc611 avatar Feb 17 '23 11:02 kc611

Echoing what @cedricsimar said above, I used the following numba-compatible workaround (works with the nb.jit(nopython=True) decorator:

def rand_choice_nb(arr, prob):
    """
    :param arr: A 1D numpy array of values to sample from.
    :param prob: A 1D numpy array of probabilities for the given samples.
    :return: A random sample from the given array with a given probability.
    """
    return arr[np.searchsorted(np.cumsum(prob), np.random.random(), side="right")]```

How do you define size of the array?

kpa224 avatar Jul 25 '23 10:07 kpa224

I was trying to use the p= option as well. Thank you for the workaround, but it would be much more convenient if it wasn't necessary. Would be great if the option was included in a future version!

kodingkoning avatar Nov 15 '23 23:11 kodingkoning

Thank you for these solutions!

jmshea avatar Mar 30 '24 02:03 jmshea

I'm also looking forward to native Numba support for the 'p' parameter. I wrote the below to gain a better understanding of the problem.

It appears to work well in my testing with the same performance as the solution above. Any suggested tweaks for a performance gain would be much appreciated.

@njit
def random_choice(a, p, rng):
    """
    :param a: Sample values
    :param p: Probabilities of each sample
    :param rng: A Numpy random number generator instance
    :return: A random sample of a
    """
    rand = rng.random()
    cumm_p = 0.0
    for i in range(len(a)):
        cumm_p += p[i]
        if rand <= cumm_p:
            return a[i]

johnduffymsc avatar Jul 08 '24 13:07 johnduffymsc

I am also still looking forward for native support for choice with probabilties. For using uncertainty in discrete cases this is often used.

dominikue avatar Sep 22 '25 15:09 dominikue