Memory leak when drawing many tfd.GaussianProcess samples?
When instantiating many 2-dimensional GPs with exponentiated quadratic kernels and sampling over several thousand points, I'm getting memory errors: ResourceExhaustedError: failed to allocate memory [Op:Mul]
I am able to produce a minimal working example in Google Colab: https://colab.research.google.com/drive/1yOzrWbyyia3zLirXQ6Z76scf20iryP4W#scrollTo=wNzD5JdxqgOX
Just ensure you have Runtime -> Change runtime type -> Hardware accelerator = GPU so that GPUs are used.
I provide the code here as well, for convenience:
import numpy as np
import tensorflow_probability as tfp
tfd = tfp.distributions
tfk = tfp.math.psd_kernels
from tqdm import tqdm
# This raises ResourceExhaustedError after 626 iterations
for i in tqdm(range(1000)):
    foo = tfd.GaussianProcess(
        kernel=tfk.ExponentiatedQuadratic(np.float64(1.), np.float64(1.)),
        index_points=np.random.randn(6_500, 2).astype(np.float64),
        observation_noise_variance=.05**2,
    ).sample(seed=i).numpy()
I thought I could avoid this error by reusing the same GaussianProcess object and only calling sample() inside the loop, but that also ended up raising a ResourceExhaustedError. This suggests the issue comes from running sample() many times rather than from instantiating the GP objects. Does this hint at a memory leak?
Hi, a couple of comments here:
Is the intent to draw 1000 samples from a GP parameterized by an ExponentiatedQuadratic(1., 1.) kernel?
If so, I would create the GP object outside the loop. The GP object gets recreated on each iteration, which means the covariance matrix has to be recomputed each time, adding to memory costs. Making one GP outside the loop and calling gp.sample(seed=i) should suffice.
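For concreteness, a minimal sketch of that suggestion, reusing the kernel parameters and index-point shape from the example above:

import numpy as np
import tensorflow_probability as tfp
tfd = tfp.distributions
tfk = tfp.math.psd_kernels

# Build the GP once, outside the loop; its covariance is tied to these fixed index points.
gp = tfd.GaussianProcess(
    kernel=tfk.ExponentiatedQuadratic(np.float64(1.), np.float64(1.)),
    index_points=np.random.randn(6_500, 2).astype(np.float64),
    observation_noise_variance=.05**2,
)

# Draw one sample per iteration, varying only the seed.
samples = [gp.sample(seed=i).numpy() for i in range(1000)]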
Note that you can also eliminate the loop entirely by calling gp.sample(1000, seed=23). You will get back a Tensor of shape [1000, 6500]; indexing into the first dimension gives you independent samples. This should drastically reduce memory use and also make things much faster, since the samples are generated in a vectorized fashion.
Hi @srvasude, thanks very much for the response. You might have missed that in my OP I said I also hit the error when reusing the same GaussianProcess object and only drawing samples in the loop.
However, passing 1000 to sample did the trick for me! I've updated the Google Colab MWE to demonstrate your solution. I have also kept the code block with the loop that triggers the ResourceExhaustedError.
I'll leave this issue open because I still think it would be worth identifying the source of the growing memory cost of calling sample multiple times. But if a TensorFlow team member disagrees, feel free to close.
@srvasude
"The GP object gets recreated each time in the loop, which means that the covariance matrix has to be recomputed each time which adds to memory costs. Making one GP outside the loop and calling gp.sample(seed=i) should suffice."
Just curious, why can't the previously computed GP objects be reclaimed by the garbage collector, since each one goes out of scope once it is overwritten by the next "foo" assignment? Could this be improved by updating the class destructor somehow? Thank you.