
SNLE and SNRE fail with a 1D prior

paarth-dudani opened this issue 1 year ago • 8 comments

Describe the bug
I am implementing SNRE and SNLE (from the implemented algorithms) on a simple exponential simulator model with noise and a prior. The algorithms work fine with a uniform prior, but with an exponential prior they fail with the error: 'number of categories cannot exceed 2^24'.

To Reproduce

  1. Versions: Python 3.9.13, SBI 0.23.1

  2. Code for the SNLE implementation (I get the same error for SNRE as well, with inference object NRE):

import numpy as np
import torch
from torch.distributions import Exponential
from sbi.inference import NLE

## Function to introduce noise into the exponential:
def noisy_exponential(scale, input):
    x_noise = np.zeros((len(input)),)
    for i in range(len(input)):
        value = 1/(scale)*np.exp(-input[i]/scale) + 0.05*np.random.normal(0,0.5)
        if value < 0:
            x_noise[i] = 0
        else:
            x_noise[i] = value
    return x_noise
n = 200
t = np.linspace(0,8,n)
# Define the prior
num_dims = 1
num_sims = 100
num_rounds = 2
theta_true = torch.tensor([2.0])


# prior = BoxUniform(low=0*torch.zeros(num_dims), high=4*torch.ones(num_dims))
exp_prior = Exponential(torch.tensor([2.0]))
simulator = lambda theta: ((1/theta)*np.exp(-t/theta) + 0.05*np.random.normal(0,0.5)).float()
x_o = torch.tensor(noisy_exponential(theta_true,t)).float()
# Inference 
inference = NLE(exp_prior)
proposal = exp_prior
for _ in range(num_rounds):
    theta = proposal.sample((num_sims,))
    x = simulator(theta)
    _ = inference.append_simulations(theta, x).train()
    posterior = inference.build_posterior(mcmc_method="slice_np_vectorized",
                                          mcmc_parameters={"num_chains": 2,
                                                           "thin": 5})
    proposal = posterior.set_default_x(x_o)

  3. Error: RuntimeError: number of categories cannot exceed 2^24 (raised after the neural network successfully converges):
RuntimeError                              Traceback (most recent call last)
Input In [43], in <cell line: 17>()
     16 proposal = exp_prior
     17 for _ in range(num_rounds):
---> 18     theta = proposal.sample((num_sims,))
     19     x = simulator(theta)
     20     _ = inference.append_simulations(theta, x).train()

File ~/anaconda3/lib/python3.9/site-packages/sbi/inference/posteriors/mcmc_posterior.py:306, in MCMCPosterior.sample(self, sample_shape, x, method, thin, warmup_steps, num_chains, init_strategy, init_strategy_parameters, init_strategy_num_candidates, mcmc_parameters, mcmc_method, sample_with, num_workers, mp_context, show_progress_bars)
    303 init_strategy = _maybe_use_dict_entry(init_strategy, "init_strategy", m_p)
    304 self.potential_ = self._prepare_potential(method)  # type: ignore
--> 306 initial_params = self._get_initial_params(
    307     init_strategy,  # type: ignore
    308     num_chains,  # type: ignore
    309     num_workers,
    310     show_progress_bars,
    311     **init_strategy_parameters,
    312 )
    313 num_samples = torch.Size(sample_shape).numel()
    315 track_gradients = method in ("hmc_pyro", "nuts_pyro", "hmc_pymc", "nuts_pymc")

File ~/anaconda3/lib/python3.9/site-packages/sbi/inference/posteriors/mcmc_posterior.py:618, in MCMCPosterior._get_initial_params(self, init_strategy, num_chains, num_workers, show_progress_bars, **kwargs)
    615     initial_params = torch.cat(initial_params)  # type: ignore
    616 else:
...
File ~/anaconda3/lib/python3.9/site-packages/sbi/samplers/mcmc/init_strategy.py:112
--> 112 idxs = torch.multinomial(probs, 1, replacement=False)
    113 # Return transformed sample.
    114 return transform(init_param_candidates[idxs, :])

RuntimeError: number of categories cannot exceed 2^24

Expected behavior
I expect the network to undergo multiple rounds of training (two in this example), or to give me a pairplot after one round of training (not shown above).

paarth-dudani avatar Sep 24 '24 13:09 paarth-dudani

Hi there,

thanks a lot for reporting this! The following will fix it:

class WrappedExponential(Exponential):
    def log_prob(self, *args, **kwargs):
        # Squeeze the trailing singleton dim: (num_samples, 1) -> (num_samples,).
        log_probs = super().log_prob(*args, **kwargs)
        return log_probs.squeeze()


exp_prior = WrappedExponential(torch.tensor([2.0]))

The reason this happens is that Exponential.log_prob returns a tensor of shape (num_samples, 1), whereas any multivariate torch distribution (e.g., MultivariateNormal) returns a tensor of shape (num_samples,).
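A quick shape check makes this concrete (a minimal sketch; WrappedExponential is the class defined above):

import torch
from torch.distributions import Exponential, MultivariateNormal

exp_dist = Exponential(torch.tensor([2.0]))
print(exp_dist.log_prob(exp_dist.sample((10,))).shape)  # torch.Size([10, 1])

mvn = MultivariateNormal(torch.zeros(1), torch.eye(1))
print(mvn.log_prob(mvn.sample((10,))).shape)  # torch.Size([10])

wrapped = WrappedExponential(torch.tensor([2.0]))
print(wrapped.log_prob(wrapped.sample((10,))).shape)  # torch.Size([10])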

I will leave this issue open, though, because we should deal with this in process_prior, which we currently do not.

I hope that the above fixes the issue! Let me know if you have any more questions!

All the best,
Michael

michaeldeistler avatar Sep 24 '24 14:09 michaeldeistler

More notes for future fixing on our side:

This is only an issue for 1D pytorch distributions. The problem is that 1D distributions such as torch.distributions.Exponential, torch.distributions.Normal, and torch.distributions.Uniform return either (1) no sample dimension and no log-prob dimension, or (2) both a sample dimension and a log-prob dimension. For sbi, however, we need a sample dimension but no log-prob dimension.

from torch.distributions import Exponential, MultivariateNormal

prior = Exponential(torch.tensor(2.0))
samples = prior.sample((10,))  # (10,)
log_probs = prior.log_prob(samples)  # (10,)
# `sbi` raises an error because one must have a sample dimension.

prior = Exponential(torch.tensor([2.0]))
samples = prior.sample((10,))  # (10, 1)
log_probs = prior.log_prob(samples)  # (10, 1)
# `sbi` fails because the log_prob dimension contains the data dim.

prior = MultivariateNormal(torch.tensor([2.0]), torch.tensor([[2.0]]))
samples = prior.sample((10,))  # (10, 1)
log_probs = prior.log_prob(samples)  # (10,)
# `sbi` works.

IMO, the easiest fix would be to introduce the following:

from typing import Dict

import torch
from torch import Tensor
from torch.distributions import constraints


class OneDimDistributionWrapper(torch.distributions.Distribution):
    def __init__(self, dist, *args, **kwargs):
        super().__init__()
        self.dist = dist

    def sample(self, *args, **kwargs):
        return self.dist.sample(*args, **kwargs)

    def log_prob(self, *args, **kwargs):
        # Remove the additional trailing dimension.
        return self.dist.log_prob(*args, **kwargs)[..., 0]

    @property
    def arg_constraints(self) -> Dict[str, constraints.Constraint]:
        return self.dist.arg_constraints

    @property
    def support(self):
        return self.dist.support

    @property
    def mean(self) -> Tensor:
        return self.dist.mean

    @property
    def variance(self) -> Tensor:
        return self.dist.variance

We could then use this to wrap the 1D distributions which have a sample dimension.
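Hypothetical usage of the wrapper sketched above (it is not part of the current sbi API):

wrapped_prior = OneDimDistributionWrapper(Exponential(torch.tensor([2.0])))
samples = wrapped_prior.sample((10,))        # (10, 1) -- sample dimension kept
log_probs = wrapped_prior.log_prob(samples)  # (10,)   -- log-prob dimension removed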

michaeldeistler avatar Sep 24 '24 14:09 michaeldeistler

Hi Michael!

I am pleasantly surprised by your prompt response!

I made the first change you suggested and it did resolve the issue! However, I am now facing new issues, such as the network failing to converge, seemingly at random.

Thanks for the help!!

Result for SNLE:

[screenshot: SNLE posterior pairplot]

Result for SNRE (around a third of the time):

[screenshot: SNRE posterior pairplot]

paarth-dudani avatar Sep 25 '24 15:09 paarth-dudani

I don't understand what is being shown in the plot.

michaeldeistler avatar Sep 25 '24 15:09 michaeldeistler

The plot shows the posterior over the scale parameter of the exponential, as learned by SNLE/SNRE, against the true value of theta. The SNLE code is shown below.

class WrappedExponential(Exponential):
    def log_prob(self, *args, **kwargs):
        # Squeeze the trailing singleton dim: (num_samples, 1) -> (num_samples,).
        log_probs = super().log_prob(*args, **kwargs)
        return log_probs.squeeze()


exp_prior_mod = WrappedExponential(torch.tensor([2.0]))
simulator = lambda theta: ((1/theta)*np.exp(-t/theta) + 0.05*np.random.normal(0,0.5)).float()

x_o = torch.tensor(noisy_exponential(theta_true,t)).float()
# Inference with the relevant SNE method: Here, SNLE

inference = NLE(exp_prior_mod)
proposal = exp_prior_mod
for _ in range(num_rounds):
    theta = proposal.sample((num_sims,))
    x = simulator(theta)
    _ = inference.append_simulations(theta, x).train(validation_fraction=0.1, retrain_from_scratch=True)
    posterior = inference.build_posterior(mcmc_method="slice_np_vectorized",
                                          mcmc_parameters={"num_chains": 5,
                                                           "thin": 5})
    proposal = posterior.set_default_x(x_o)

# Sample from the posterior and plot

posterior_samples_exp = posterior.sample((200,), x=x_o)

_ = pairplot(
    posterior_samples_exp, limits=[[0, 10]], figsize=(5, 5),
    labels=[r"$\theta$"],
    points=theta_true # add ground truth thetas
)

An example of the desired result comes from the SNPE_C implementation, as shown below:

[screenshot: SNPE_C posterior pairplot with ground-truth theta]

paarth-dudani avatar Sep 25 '24 15:09 paarth-dudani

I looked into this, and found three issues with your model:

  1. Your simulator has a bug: it adds a single draw of 0.05*np.random.normal(0,0.5) for every theta. In other words, it adds the same noise for every theta, which is not what you want. I changed the line to:
simulator = lambda theta: ((1/theta)*torch.exp(-t/theta) + 0.5 * torch.randn_like(theta)).float()
  2. x[:, 0] can have very large values, sometimes up to thousands. These kinds of extreme outliers will destroy neural network training. I fixed this with:
theta = proposal.sample((num_sims,))
x = simulator(theta)
x[x > 5.0] = 5.0
  3. With n=200, the posterior is very, very narrow. I could imagine that the MCMC methods used in SNLE and SNRE struggle with this. I used n=10. (A sketch combining all three fixes follows below.)
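Putting the three fixes together, the adjusted simulation step might look like the following (a minimal sketch; proposal and num_sims are the variables from the earlier snippets, and 5.0 is the clipping threshold used above):

n = 10
t = torch.linspace(0, 8, n)

# Fix 1: simulator drawing fresh Gaussian noise per parameter sample.
simulator = lambda theta: ((1/theta)*torch.exp(-t/theta) + 0.5 * torch.randn_like(theta)).float()

theta = proposal.sample((num_sims,))
x = simulator(theta)
# Fix 2: clip extreme outliers that would otherwise destroy training.
x[x > 5.0] = 5.0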

With all of these fixes, I get the results below:

[screenshot: posterior pairplot after the fixes]

As you can see, even with n=10, the posterior is very narrow.

Hope this helps,
Michael

michaeldeistler avatar Sep 25 '24 16:09 michaeldeistler

Thank you so much for your suggestions!

I did try them out. However, the estimates I am getting from each of the three algorithms (SNPE_C, SNLE, SNRE_B) are somewhat unreliable: I intermittently get bad posterior estimates, and the good estimates are not reproducible.

Below are some examples of pathological posterior estimates:

  1. SNPE

[screenshots: three SNPE posterior pairplots]

  2. SNLE

[screenshots: three SNLE posterior pairplots]

  3. SNRE

[screenshots: three SNRE posterior pairplots]

It would be great if you could give some general suggestions, as my ultimate aim is to apply these algorithms to a 21-parameter ODE model.

paarth-dudani avatar Sep 27 '24 17:09 paarth-dudani

I am sorry for not attaching the code earlier; here it is:

  1. SNRE:
exp_prior_mod = WrappedExponential(torch.tensor([2.0]))
simulator = lambda theta: ((1/theta)*np.exp(-t/theta) + 0.05*torch.rand_like(theta)).float()
x_o = torch.tensor(noisy_exponential(theta_true,t)).float()


from sbi.inference import NRE
# torch.manual_seed(0)
# np.random.seed(0)
# inference = NRE(prior)
# proposal = prior
inference = NRE(exp_prior_mod)
proposal = exp_prior_mod
for _ in range(num_rounds):
    theta = proposal.sample((num_sims,))
    print(np.shape(theta))
    x = simulator(theta)
    x[x>10] = 10
    _ = inference.append_simulations(theta, x).train()
    posterior = inference.build_posterior(mcmc_method="slice_np_vectorized",
                                          mcmc_parameters={"num_chains": 10,
                                                           "thin": 5})
    proposal = posterior.set_default_x(x_o)
  2. SNLE, with the same prior and simulator model:
inference = NLE(exp_prior_mod)
proposal = exp_prior_mod
for _ in range(num_rounds):
    theta = proposal.sample((num_sims,))
    x = simulator(theta)
    x[x>10] = 10
    _ = inference.append_simulations(theta, x).train(validation_fraction=0.1, retrain_from_scratch=False)
    posterior = inference.build_posterior(mcmc_method="slice_np_vectorized",
                                          mcmc_parameters={"num_chains": 10,
                                                           "thin": 5})
    proposal = posterior.set_default_x(x_o)
  3. SNPE, with the same prior and simulator model:
inference = NPE(prior=exp_prior_mod, density_estimator='mdn')
proposal = exp_prior_mod

# Later rounds use the APT loss.
for _ in range(num_rounds):
    theta = proposal.sample((num_sims,))
    x = simulator(theta)
    x[x>10] = 10
    _ = inference.append_simulations(theta, x, proposal=proposal).train(validation_fraction=0.1, use_combined_loss=True, retrain_from_scratch=True)
    posterior = inference.build_posterior().set_default_x(x_o)
    proposal = posterior

I had another question: what changes can I make to the code to deal with the posterior estimate's overconfident underestimation of the standard deviation? Setting retrain_from_scratch=True was one of my attempts at that.

They all give pathological outputs as shown in the previous comment. I would appreciate any suggestions!

paarth-dudani avatar Oct 03 '24 10:10 paarth-dudani

Hi @paarth-dudani, how many simulations are you using here? (I couldn't find it in the code snippets above.)

Given that you seem to have a fast simulator, I would start with at least 1-10k simulations to debug this case.

Above, it also seems that x_o comes from a different simulator than the training data. For the initial testing phase, I would recommend using an x_o generated with the same simulator as the training data.

My recommendations would be (a sketch implementing them follows the list):

  • use 10k simulations
  • use single-round NPE for now
  • use "nsf" (not "mdn") as density estimator
  • use x_o from same simulator.
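A minimal single-round NPE sketch along these lines (assuming exp_prior_mod, theta_true, and the torch-based simulator from the earlier comments; the simulation and sample counts are illustrative):

from sbi.inference import NPE

num_sims = 10_000
inference = NPE(prior=exp_prior_mod, density_estimator="nsf")

# Single round: simulate from the prior, train, build the posterior.
theta = exp_prior_mod.sample((num_sims,))
x = simulator(theta)

# Generate x_o with the same simulator used for the training data.
x_o = simulator(theta_true.unsqueeze(0))

_ = inference.append_simulations(theta, x).train()
posterior = inference.build_posterior()
posterior_samples = posterior.sample((1_000,), x=x_o)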

Does this help?

janfb avatar Dec 13 '24 08:12 janfb

I am closing this as the prior wrapping issue was addressed by #1286. @paarth-dudani, please re-open this issue if your problem is still unresolved.

gmoss13 avatar Feb 26 '25 10:02 gmoss13