Memory issue
Moved from pymc-devs/pymc3#543
Connecting a single Stochastic variable to a large number of other Stochastic variables takes a lot of memory. For example:
import pymc

def create_model(i, a):
    # every b_i shares the same parent a
    b = pymc.Normal('b_%i' % i, mu=a, tau=1.)
    return locals()

a = pymc.Uniform('a', lower=0., upper=100., value=.1)
l = [create_model(i, a) for i in range(10000)]
model = pymc.Model(l)
whereas creating twice as many variables that are not connected to a common parent is fine:
def create_model(i):
    a = pymc.Uniform('a_%i' % i, lower=0., upper=100., value=.1)
    b = pymc.Normal('b_%i' % i, mu=a, tau=1.)
    return locals()

l = [create_model(i) for i in range(10000)]
model = pymc.Model(l)
Doing a little digging using the memory profiler, first for the "connected" model:
$ python -m memory_profiler
Line #    Mem usage    Increment   Line Contents
    10  179.062 MiB    0.000 MiB       l = [create_model(i, a) for i in range(1000)]

Line #    Mem usage    Increment   Line Contents
     3   82.871 MiB    0.000 MiB   @profile
     4                             def main():
     5   82.871 MiB    0.000 MiB       def create_model(i, a):
     6                                     b = pymc.Normal('b_%i' % i, mu=a, tau=1.)
     7                                     return locals()
     9   82.949 MiB    0.078 MiB       a = pymc.Uniform('a', lower=0., upper=100., value=.1)
    10  179.062 MiB   96.113 MiB       l = [create_model(i, a) for i in range(1000)]
    11  247.961 MiB   68.898 MiB       model = pymc.Model(l)

Line #    Mem usage    Increment   Line Contents
     5  178.930 MiB    0.000 MiB       def create_model(i, a):
     6  179.062 MiB    0.133 MiB           b = pymc.Normal('b_%i' % i, mu=a, tau=1.)
     7  179.062 MiB    0.000 MiB           return locals()
and for the "unconnected" model:
$ python -m memory_profiler
Line #    Mem usage    Increment   Line Contents
     3   82.832 MiB    0.000 MiB   @profile
     4                             def main():
     5   82.832 MiB    0.000 MiB       def create_model(i):
     6                                     a = pymc.Uniform('a_%i' % i, lower=0., upper=100., value=.1)
     7                                     b = pymc.Normal('b_%i' % i, mu=a, tau=1.)
     8                                     return locals()
    10  108.156 MiB   25.324 MiB       l = [create_model(i) for i in range(1000)]
    11  115.336 MiB    7.180 MiB       model = pymc.Model(l)

Line #    Mem usage    Increment   Line Contents
    10  108.156 MiB    0.000 MiB       l = [create_model(i) for i in range(1000)]

Line #    Mem usage    Increment   Line Contents
     5  108.129 MiB    0.000 MiB       def create_model(i):
     6  108.141 MiB    0.012 MiB           a = pymc.Uniform('a_%i' % i, lower=0., upper=100., value=.1)
     7  108.156 MiB    0.016 MiB           b = pymc.Normal('b_%i' % i, mu=a, tau=1.)
     8  108.156 MiB    0.000 MiB           return locals()
I have also confirmed that the connected model is not somehow creating new PyMC objects (at least as far as I can tell), and that the size of the individual variables in each model is identical, via sys.getsizeof(model.variables.pop())
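One caveat on that check: sys.getsizeof reports only the memory directly attributed to the object itself, not the objects it refers to, so identical results do not rule out a difference in what the variables reference. A minimal sketch of a recursive size estimate follows. The helper name deep_getsizeof is hypothetical, plain lists stand in for PyMC variables so it runs without PyMC, and it is only an approximation, since objects shared between several containers are counted once per call.

```python
import gc
import sys
from types import FunctionType, ModuleType

# Skip classes, modules and functions so the walk does not wander
# into the whole interpreter state.
SKIPPED_TYPES = (type, ModuleType, FunctionType)


def deep_getsizeof(obj):
    """Approximate size of obj plus everything reachable from it (hypothetical helper)."""
    seen_ids = set()
    total = 0
    stack = [obj]
    while stack:
        current = stack.pop()
        if isinstance(current, SKIPPED_TYPES) or id(current) in seen_ids:
            continue
        seen_ids.add(id(current))
        total += sys.getsizeof(current)
        # gc.get_referents returns the objects directly referenced by current
        stack.extend(gc.get_referents(current))
    return total


# Stand-ins for two variables: same shallow size, very different contents.
big_parent = list(range(1000))
unlinked = [None]
linked = [big_parent]

shallow_equal = sys.getsizeof(unlinked) == sys.getsizeof(linked)
deep_differs = deep_getsizeof(linked) > deep_getsizeof(unlinked)
print(shallow_equal, deep_differs)
```

Applied to model.variables, a comparison like this might show whether the connected variables hold references to larger structures, although it would not by itself say where those structures are allocated.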
So, this is still a mystery. Need to do deeper profiling, I suppose.
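One possible way to dig deeper, assuming Python 3 is available, is the standard-library tracemalloc module, which attributes allocations to the source lines that made them. The sketch below is not a run against PyMC: build_nodes is a hypothetical stand-in that creates plain dicts sharing one parent, so it only illustrates the snapshot-and-compare workflow.

```python
import tracemalloc


def build_nodes(n, shared_parent):
    # Hypothetical stand-in for the model-building list comprehension;
    # plain dicts replace PyMC objects so the sketch runs without PyMC.
    return [{"name": "b_%i" % i, "parent": shared_parent} for i in range(n)]


tracemalloc.start()
before = tracemalloc.take_snapshot()
nodes = build_nodes(1000, shared_parent={"name": "a"})
after = tracemalloc.take_snapshot()
tracemalloc.stop()

# Differences between the two snapshots, grouped by source line
# and sorted with the largest changes first.
top_stats = after.compare_to(before, "lineno")
for stat in top_stats[:5]:
    print(stat)
```

Replacing build_nodes with the real create_model calls and pymc.Model(l) should list the PyMC source lines responsible for the largest increases, which is finer-grained than the per-line totals memory_profiler reports for the calling script.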
Any information on the memory issue?
The following relatively simple (stochastic block) model uses a tremendous amount of memory even for a small number of samples (e.g. n=100).
Or am I doing something wrong in the model definition?
import numpy as np
import pymc as pm
# generate random block matrix
n = 50
A11 = np.random.rand(n, n) > 0.3
A12 = np.random.rand(n, n) > 0.9
A21 = np.random.rand(n, n) > 0.9
A22 = np.random.rand(n, n) > 0.3
A_obs = np.bmat([[A11, A12], [A21, A22]])
N = A_obs.shape[0]
K = 2
# define model
pi = pm.Dirichlet('pi', theta=0.5 * np.ones(K))
eta = pm.Container([[pm.Beta('b_{}{}'.format(i, j), alpha=1, beta=1) for i in range(K)] for j in range(K)])
q = pm.Container([pm.Categorical('q_{}'.format(i), p=pi) for i in range(N)])
A = pm.Container([[pm.Bernoulli('A_{}_{}'.format(i, j),
                                p=pm.Lambda('A_lambda_{}_{}'.format(i, j),
                                            lambda qi=q[i], qj=q[j], eta=eta: eta[qi][qj]),
                                value=A_obs[i, j], observed=True) for i in range(N)] for j in range(N)])
# sample
mcmc = pm.MCMC([A, q, pi, eta])
trace = mcmc.sample(200)
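For a sense of scale, the snippet above creates one Bernoulli and one Lambda node per matrix entry, so the number of nodes grows quadratically with N. A rough count, using a hypothetical helper count_nodes that only tallies the nodes explicitly constructed in the snippet (it does not measure memory and does not involve PyMC):

```python
def count_nodes(N, K):
    """Tally the nodes built by the model definition (hypothetical helper)."""
    counts = {
        "pi (Dirichlet)": 1,
        "eta (Beta)": K * K,
        "q (Categorical)": N,
        "A_lambda (Lambda)": N * N,
        "A (Bernoulli)": N * N,
    }
    counts["total"] = sum(counts.values())
    return counts


# n = 50 in the snippet gives N = 2 * n = 100 rows in A_obs, with K = 2 blocks.
counts = count_nodes(N=100, K=2)
for name, value in counts.items():
    print(name, value)
```

With N = 100 and K = 2 this gives 20105 nodes, and each Lambda has q[i], q[j] and the whole eta container as parents. Whether that fully explains the memory usage is not established here, but it suggests the graph is much larger than the short model definition makes it look.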