ConfigSpace icon indicating copy to clipboard operation
ConfigSpace copied to clipboard

Memory Leak

Open mlindauer opened this issue 4 years ago • 1 comments

By running the following code, the memory usage goes up like crazy:

import ConfigSpace.hyperparameters as CSH
import numpy as np
rnd = np.random.RandomState(19937)
a = CSH.UniformIntegerHyperparameter('a', lower=1, upper=2147483647, log=True)
print(a)
print(rnd)
for i in range(1,10000):
   a.get_neighbors(0.031249126501512327, rnd, number=8, std=0.05)

tested on kisexe44 with the current master branch (0.4.19).

mlindauer avatar Jul 16 '21 14:07 mlindauer

on my machine I need roughly 5-10 iterations of the for loop s.t. the program gets stuck and requires tremendous amounts of memory (several tens of GB)

mlindauer avatar Jul 16 '21 14:07 mlindauer

@mfeurer @eddiebergman Is that issue resolved? We got new reports from SMAC users that said there is a major memory leak in SMAC. If that issue was never resolved, I would guess that the configspace is the problem. This is not a minor problem and I would like to ask you to look into it urgently.

mlindauer avatar Dec 15 '22 09:12 mlindauer

Hi Marius,

I'll take a look after lunch :)

eddiebergman avatar Dec 15 '22 11:12 eddiebergman

Hi @mlindauer,

I'm actually a little lost in terms of fixing this, namely because get_neighbours is inconsistent across hyperparameters. Here's the simplest example I can make to illustrate this:

from ConfigSpace import Integer, Normal
import numpy as np
rs = np.random.RandomState()

a = Integer("a", bounds=(10, 20))  # Uniform
b = Integer("b", bounds=(10, 20), distribution=Normal(mu=15, sigma=2))   # Normal

# Using the unit cube `value` of `0.5`
a.get_neighbours(0.5, rs, transform=True) # Works as intended, uniform numbers between [10, 20]
b.get_neighbours(0.5, rs, transform=True)  # Raises error and fails

# Using true `value` 15
a.get_neighbours(15, rs, transform=True) # Hangs indefinitely
b.get_neighbors(15, rs, transform=True)  # Works as intended

Let along introducing log into here, they are inconsistent in how they should be called and they have no documentation to help with that.

I don't know what the intended behavior is (as it crashes...) and so fixing it requires knowing what it should be.

For the sake of fixing this particular problem, where we assume for a Uniform you need to know that you pass it in as a unit cube value and it's on a log scale I will give it this behavior:

a = CSH.UniformIntegerHyperparameter('a', lower=1, upper=2147483647, log=True)
values = a.get_neighbors(0.031249126501512327, rnd, number=8, std=0.05)


mu = 0.03124 * (upper - lower)
sigma = 0.05 * (upper - lower)  
values = ... # log-normal centered at `mu` with std.dev `sigma`  

@mfeurer I'm not sure what the intended behavior is, please let me know when you can.

Secondary point:

ConfigSpace needs to be looked at in detail because I can see this being easily misunderstood and unclear. It would also be good to remove as much Cython as possible because it's an active deterrent to maintenance and any language tools being useful.

eddiebergman avatar Dec 15 '22 16:12 eddiebergman

So with some more digging into SMAC and configspace, get_neighbors is essentially some private function which should never be called by a user and instead use something like get_one_exchange_neighborhood which SMAC uses. Configspace converts variables to it's internal vector representation and I guess this is where they are stored however they should be and so get_neighbors is called with valid arguments.

My guess as to why the code snippet in the original issue hangs is that there's not enough neighbors in the region of 0.05 std.dev. When on a log scale, the rejection sampling is happy to keep sampling near the mean, converting to an int, see it's already selected as a neighbor and keep rejecting forever. This matter would be amplified further if you start taking into account quantization, which further reduces the density of integers.

This is actually a little tricky to force any "niceness" onto, the current solution stores almost nothing in memory and generally works, aside from cases like these. The other solution I tried was to essentially treat the integers as discrete choices with a gaussian plopped on top. This means no issue of infinite rejection but blows up memory in this specific case as I stored all integers from 1 to 2147483647.

There is a tradeoff between the edge cases for the different solutions. I will try and come up with some memory efficient version of the "other solution" as I don't see a nice way to handle sparse density of integers and small std-deviations with rejection sampling.

eddiebergman avatar Dec 19 '22 10:12 eddiebergman

Thanks for looking deeper into it.

As some background information: In fact, the above code snippet was my try to explain a memory leak on the following configuration space: https://bitbucket.org/mlindauer/aclib2/src/master/target_algorithms/asp/clasp-3.1.4/params.pcs It was the minimal example, I was able to come up, to reproduce the memory leak.

mlindauer avatar Dec 19 '22 10:12 mlindauer

Hi @mlindauer,

The core issue is addressed in PR #282. It uncovered some other unrelated issue (large int ranges for Normal and Beta integer hyperparams) but I fix the memory leak issue with those as well but they still suffer from unreasonably long compute times (#283).

@mfeurer can you take a look once you're free and feel free to update the PR as required as I am not back fully until Monday :)

eddiebergman avatar Jan 04 '23 15:01 eddiebergman

Thank you very much!

@benjamc could you please check whether that solves our issues.

mlindauer avatar Jan 04 '23 15:01 mlindauer