pymc
Draft external sampler API
Discussion needed !!
Draft PR motivated by https://github.com/pymc-devs/pymc/discussions/7699
The need, as stated in that discussion:
Basically my question is: would PyMC be open to a PR along these lines? For public-facing visibility, we'd like to put the algorithm into PyMC rather than use PyMC indirectly (i.e. extract a density from a probabilistic program in PyMC), and I thought this might be the best way.
The first idea discussed (and implemented here) was to allow pm.sample(step=ExternalSampler()), where the step object defers sampling to an external library (including the ones we already supported, like nutpie, numpyro, ...).
```python
import pymc as pm

kwargs = {}  # sampler-specific options go here

with pm.Model() as m:
    x = pm.Normal("x")
    idata = pm.sample(nuts_sampler="nutpie", nuts_sampler_kwargs=kwargs)  # <- Before
    idata = pm.sample(step=pm.external.Nutpie(**kwargs))  # <- Now (pm.external is proposed in this PR)
    idata = pm.sample(step=pm.external.MCLMC(**kwargs))  # <- Future: non-NUTS methods can also be used
```
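The dispatch this enables can be sketched without PyMC. The sketch below is a minimal illustration under stated assumptions, not the PR's implementation: `ExternalSampler`, `ToySampler`, and the toy `sample` function are hypothetical stand-ins. The "model" is just a list of variable names, and the "sampler" returns independent normal draws. The point is the single `isinstance` branch, which hands sampling off entirely and forwards only the sampler-agnostic arguments.

```python
import random
from abc import ABC, abstractmethod


class ExternalSampler(ABC):
    """Hypothetical marker base class: a step that takes over sampling entirely."""

    @abstractmethod
    def sample(self, model, *, draws, tune, chains, random_seed=None):
        """Run the external library and return posterior draws."""


class ToySampler(ExternalSampler):
    """Hypothetical external sampler; sampler-specific options live in __init__."""

    def __init__(self, scale=1.0):
        self.scale = scale

    def sample(self, model, *, draws, tune, chains, random_seed=None):
        rng = random.Random(random_seed)
        # `model` is just a list of variable names here; `tune` is ignored.
        return {
            name: [[rng.gauss(0.0, self.scale) for _ in range(draws)] for _ in range(chains)]
            for name in model
        }


def sample(model, draws=1000, tune=1000, chains=4, step=None, random_seed=None):
    """Toy stand-in for pm.sample that shows only the external-dispatch branch."""
    if isinstance(step, ExternalSampler):
        # Hand off completely, forwarding only the sampler-agnostic arguments.
        return step.sample(model, draws=draws, tune=tune, chains=chains, random_seed=random_seed)
    raise NotImplementedError("built-in step methods are out of scope for this sketch")


posterior = sample(["x"], draws=10, chains=2, step=ToySampler(scale=2.0), random_seed=0)
```

Everything else in pm.sample (step assignment, compilation, the built-in samplers) is bypassed on this path. That is also why the arguments pm.sample shares with the external sampler end up being only a handful.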
Pros: We no longer assume every external sampler is a nuts_sampler, and there are objects/functions users can inspect to find the arguments that parametrize each sampler.
Cons: We're routing everything through pm.sample just so it can reuse the 6-8 arguments that already exist there (tune, draws, chains, idata_kwargs, ...). Also, tune/draws/chains may not make sense for some external samplers.
What I think is useful is a standard API entry point for finding external samplers (together with our logic connecting PyMC to each library), which this PR partly offers via pm.external.
The new ExternalSampler object is a bit awkward. It doesn't do much other than let pm.sample recognize it, so that you can call pm.sample(step=pm.external.Nutpie()). But it then has to split arguments somewhat arbitrarily between instantiation and sampling: the sampler-specific arguments go to the constructor so they are discoverable, while the few sampler-agnostic arguments are reused from pm.sample.
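A small sketch of that split, assuming a hypothetical `ToyExternalSampler` class (not a PyMC or nutpie API; its option names are made up for illustration): the sampler-specific options are discoverable from the constructor signature, while the shared options only show up on the `sample` method that pm.sample would call.

```python
import inspect


class ToyExternalSampler:
    """Hypothetical wrapper: sampler-specific options are constructor arguments."""

    def __init__(self, target_accept=0.8, max_treedepth=10):
        self.target_accept = target_accept
        self.max_treedepth = max_treedepth

    def sample(self, model, *, draws, tune, chains):
        """Sampler-agnostic arguments arrive here, forwarded by pm.sample."""
        raise NotImplementedError("sketch only")


def specific_args(sampler_cls):
    """Options a user can discover by reading the class constructor."""
    params = inspect.signature(sampler_cls.__init__).parameters
    return [name for name in params if name != "self"]


def agnostic_args(sampler_cls):
    """Options shared with pm.sample (everything after `model` on `sample`)."""
    params = inspect.signature(sampler_cls.sample).parameters
    return [name for name in params if name not in ("self", "model")]


print(specific_args(ToyExternalSampler))  # ['target_accept', 'max_treedepth']
print(agnostic_args(ToyExternalSampler))  # ['draws', 'tune', 'chains']
```

A user configuring one run therefore has to look in two places: the class for the specific options and pm.sample for the shared ones.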
Wouldn't it make more sense to just offer pm.external.sample_nutpie()?
IIRC, this is why we went with pm.sample_smc instead of pm.sample(step=pm.SMC()), which used to exist. It became awkward to have one function serve both approaches.
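For comparison, here is a sketch of the function-per-sampler alternative, in the spirit of the suggested pm.external.sample_nutpie(). `sample_toy` and its option names are hypothetical, and it only echoes the configuration it would use instead of calling a real library. The design difference is that every argument, shared or sampler-specific, lives in one discoverable signature.

```python
import inspect


def sample_toy(model, *, draws=1000, tune=1000, chains=4, target_accept=0.8, max_treedepth=10):
    """Hypothetical pm.external.sample_* style function: one signature holds every option."""
    # A real implementation would compile `model` and call the external library here;
    # this sketch only returns the configuration it would use.
    return {
        "draws": draws,
        "tune": tune,
        "chains": chains,
        "target_accept": target_accept,
        "max_treedepth": max_treedepth,
    }


print(list(inspect.signature(sample_toy).parameters))
```

The trade-off is repetition: each sample_* function restates the shared arguments, but it can also drop ones like tune when they don't apply to that sampler.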
📚 Documentation preview 📚: https://pymc--7880.org.readthedocs.build/en/7880/