pymc
DOC: format of the initial values for the sample_smc function
Issue with current documentation:
The PyMC documentation for the sample_smc function for Sequential Monte Carlo (SMC) doesn't describe the expected format/shape of the start parameter, which specifies the initial values for SMC. As a result, users have to work out the correct format/shape on their own. Additionally, the existing unit test for the start parameter only tests a single chain; it doesn't cover a scenario with multiple chains.
Originally, I posted a question about the correct format/shape of the start parameter on the PyMC Discourse. @ricardoV94 then spotted that the unit test only covered the case of a single chain and suggested that I report this issue on GitHub.
To be more concrete, let us consider the following code for Bayesian linear regression using SMC (this code is from my question posted on Discourse):
import pymc as pm
import numpy as np


def basic_model(observed_data):
    array_sizes = np.array([size for (size, _) in observed_data])
    array_costs = np.array([cost for (_, cost) in observed_data])
    coefficient_sigma = 5

    with pm.Model() as model:
        coefficient0 = pm.HalfNormal(
            "coefficient0", sigma=coefficient_sigma)
        coefficient1 = pm.HalfNormal(
            "coefficient1", sigma=coefficient_sigma)
        predicted_bounds = coefficient0 + coefficient1 * array_sizes
        observed_costs = pm.Normal("observed_costs", mu=predicted_bounds,
                                   sigma=10, observed=array_costs)
    return model


observed_data = [[1, 1], [2, 2], [4, 3], [8, 4], [16, 7], [32, 10], [64, 13], [128, 17], [256, 18]]
num_draws = 1000
num_chains = 4

# Initial values keyed by the transformed (log-scale) variable names
init_smc = {"coefficient0_log__": np.full((num_draws, num_chains), 10),
            "coefficient1_log__": np.full((num_draws, num_chains), 10)}

with basic_model(observed_data):
    idata = pm.sample_smc(num_draws, start=init_smc,
                          chains=num_chains, random_seed=42)
Here, inside the dictionary init_smc for SMC's initial values, the latent variables (i.e., coefficient0_log__ and coefficient1_log__) are each mapped to a numpy array of shape (num_draws, num_chains). If I used a different shape, such as (num_chains, num_draws), the code would crash. The PyMC documentation doesn't clarify what the correct shape of the numpy array should be.
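To make the shape convention explicit, here is a minimal sketch of a helper that builds such a dictionary. The name build_smc_start is hypothetical (it is not part of PyMC), and the (num_draws, num_chains) layout is only what I observed to work in the code above, not something the documentation confirms. The keys are assumed to be the transformed variable names, as in my example.

```python
import numpy as np


def build_smc_start(transformed_values, num_draws, num_chains):
    """Hypothetical helper, not a PyMC function.

    Broadcasts one scalar per transformed variable to an array of shape
    (num_draws, num_chains), the layout observed to work in this issue.
    """
    return {
        name: np.full((num_draws, num_chains), value, dtype=float)
        for name, value in transformed_values.items()
    }


init_smc = build_smc_start(
    {"coefficient0_log__": 10.0, "coefficient1_log__": 10.0},
    num_draws=1000,
    num_chains=4,
)

# Sanity check before handing the dictionary to pm.sample_smc(start=...)
for name, array in init_smc.items():
    assert array.shape == (1000, 4), name
```

Checking the shape up front like this at least fails with a readable message, instead of crashing somewhere inside the sampler.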
Idea or request for content:
I would be grateful if someone could update the documentation of the sample_smc function's start parameter and also add a unit test that covers the start parameter with multiple chains.
:tada: Welcome to PyMC! :tada: We're really excited to have your input into the project! :sparkling_heart:
If you haven't done so already, please make sure you check out our Contributing Guidelines and Code of Conduct.