h2o4gpu
CPU memory maxed out even when computing with GPU
- OS platform, distribution and version (e.g. Linux Ubuntu 16.04): Ubuntu 16.04.5 LTS (GNU/Linux 4.4.0-62-generic x86_64)
- Installed from (source or binary): Conda
- Version: 0.3.0.9999
- Python version (optional): 3.6.3
- CUDA/cuDNN version: CUDA 9.2
- GPU model (optional): NVIDIA Titan V
- CPU model: 12 CPU cores of Intel(R) Xeon(R) Silver 4116 CPU @ 2.10GHz
- RAM available: 64GB
This follows on from issue #175. Even after adding cuda.close() to release GPU memory, CPU (host) memory slowly fills up over a couple of hundred iterations of h2o4gpu.LogisticRegression() until it reaches the maximum. The code does not crash or print any error; it simply freezes. Tricks such as gc.collect() and del have not released the occupied memory.
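For anyone trying to reproduce this, here is a minimal sketch of how the host-memory growth could be tracked per iteration. It is Linux-only, since it reads `/proc/self/statm`. The helper names `rss_mib` and `log_memory_growth` are made up for illustration, and the demo uses a deliberately leaking stand-in rather than the real `fitness` function.

```python
import gc
import os


def rss_mib():
    """Return the current resident set size of this process in MiB (Linux only)."""
    with open("/proc/self/statm") as fh:
        resident_pages = int(fh.read().split()[1])  # second field: resident pages
    return resident_pages * os.sysconf("SC_PAGE_SIZE") / 2**20


def log_memory_growth(func, n_iter, report_every=10):
    """Call func() n_iter times and record the RSS after a gc pass each time."""
    readings = []
    for i in range(n_iter):
        func()
        gc.collect()  # rule out garbage that Python could still reclaim
        readings.append(rss_mib())
        if i % report_every == 0:
            print(f"iter {i}: RSS = {readings[-1]:.1f} MiB")
    return readings


if __name__ == "__main__":
    held = []  # stand-in for a leak; a real run would call fitness(chrom) instead
    readings = log_memory_growth(lambda: held.append(os.urandom(2**20)), n_iter=50)
    print(f"growth over run: {readings[-1] - readings[0]:.1f} MiB")
```

If the RSS keeps climbing after `gc.collect()`, the memory is probably held outside the Python heap (for example by native allocations), which would be consistent with `del` and `gc.collect()` having no effect.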
I am running a genetic algorithm (GA) on top of cross-validated logistic regression (LR), so it is crucial that I can run many iterations until the GA converges.
Please see the code fragments below:
from h2o4gpu import LogisticRegression
from numba import cuda
import numpy as np
import timeit

def create_population(n, n1):
    tmp = [np.random.choice(62, 16, replace=False) for i in range(n)]
    return np.array([[int(j in t) for j in range(62)] for t in tmp])

def fitness(chromosome):
    channels = np.where(chromosome)[0]
    X_new = [X[:,k*nf:(k+1)*nf] for k in channels]
    X_new = np.hstack(X_new)
    max_acc = 0
    for C in C_grid:
        acc = []
        svc = LogisticRegression(penalty='l1', verbose=0, n_gpus=1, C=C, warm_start=True)
        for k in np.unique(ps):
            pos = np.where(ps!=k)[0]
            svc.fit(X_new[pos], Y[pos])
            pos = np.where(ps==k)[0]
            tmp = (svc.predict(X_new[pos])==Y[pos]).mean()
            acc.append(tmp)
        max_acc = max(max_acc, sum(acc)/len(acc))
    cuda.close()
    return max_acc

def mutation(population):
    r = np.random.random(size=population.shape)
    mut_probs = np.full(population.shape, 0.03)
    m = r < mut_probs
    for i, j in zip(*np.where(m)):
        population[i][j] = 1-population[i][j]
    return population

def crossover(chrom1, chrom2):
    idx = np.random.randint(1, 61)
    child1 = np.hstack([chrom1[:idx],chrom2[idx:]])
    child2 = np.hstack([chrom2[:idx],chrom1[idx:]])
    return np.vstack([child1, child2])

npop = 20
population = create_population(npop, 16)
f = [fitness(chrom) for chrom in population]
for step in range(100):
    rank = np.argsort(f)[::-1]
    p = f/sum(f)
    c = [col[i] for i in np.where(population[rank[0]])[0]]
    print(step, len(c), c, max(f))
    parents = np.random.permutation(rank[:npop//2])
    new_pop = [crossover(population[parents[2*i]], population[parents[2*i+1]]) for i in range(npop//4)]
    new_pop = np.vstack(new_pop)
    new_pop = mutation(new_pop)
    population = np.vstack([population[rank[:npop//2]], new_pop])
    new_f = [fitness(chrom) for chrom in new_pop]
    f = [f[i] for i in rank[:npop//2]] + new_f
I am sorry that I cannot attach the entire code for confidentiality reasons.
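One possible workaround, not verified against h2o4gpu, is to run each fitness evaluation in a short-lived worker process, so that whatever host memory the call holds is returned to the OS when the worker exits. The sketch below uses only the standard `multiprocessing` module. The helper `run_isolated` is a hypothetical name, and the demo calls the built-in `pow` as a stand-in for `fitness`.

```python
import multiprocessing as mp


def run_isolated(func, *args, start_method="spawn"):
    """Run func(*args) in a fresh worker process and return its result.

    The worker is discarded after a single task, so memory it allocated
    (including native allocations) is freed when the process exits.
    """
    ctx = mp.get_context(start_method)
    with ctx.Pool(processes=1, maxtasksperchild=1) as pool:
        # apply() blocks until the worker has finished and re-raises its exceptions
        return pool.apply(func, args)


if __name__ == "__main__":
    # Stand-in call; a real run would use something like run_isolated(fitness, chrom)
    print(run_isolated(pow, 2, 10))  # prints 1024
```

A few assumptions to keep in mind: `func` and its arguments must be picklable, so `fitness` would have to be a module-level function; with the `"spawn"` start method the worker re-imports the main module, so the data that `fitness` reads (`X`, `Y`, `ps` and so on) has to be available at import time, and the GA loop should sit under an `if __name__ == "__main__":` guard. Starting a process per evaluation also adds overhead, so batching several chromosomes per worker may be a better trade-off.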