CPU memory maxed out even when computing with GPU

Open socathie opened this issue 6 years ago • 0 comments

  • OS platform, distribution and version (e.g. Linux Ubuntu 16.04): Ubuntu 16.04.5 LTS (GNU/Linux 4.4.0-62-generic x86_64)
  • Installed from (source or binary): Conda
  • Version: 0.3.0.9999
  • Python version (optional): 3.6.3
  • CUDA/cuDNN version: CUDA 9.2
  • GPU model (optional): NVIDIA Titan V
  • CPU model: 12 CPU cores of Intel(R) Xeon(R) Silver 4116 CPU @ 2.10GHz
  • RAM available: 64GB

Further to issue #175: even after adding cuda.close() to release GPU memory, my CPU memory slowly fills up to its maximum over a couple hundred iterations of h2o4gpu.LogisticRegression(). The code neither crashes nor prints any error; it just freezes. Tricks such as gc.collect() and del do not release the occupied memory.
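One way to put numbers on the growth described above is to log the process's peak resident memory after every iteration. The following is a minimal sketch and not part of the original script: `peak_rss_mb` and `track_memory` are hypothetical helper names, and it assumes a Unix system, since the standard-library `resource` module is not available on Windows. Because `ru_maxrss` reports a peak, the log can show growth but never a release.

```python
import gc
import resource
import sys


def peak_rss_mb():
    """Return this process's peak resident set size in MiB."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux and in bytes on macOS.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return peak / divisor


def track_memory(step, n_iter):
    """Call step() n_iter times and record peak RSS after each call."""
    history = []
    for _ in range(n_iter):
        step()
        gc.collect()  # rule out garbage that Python itself could still free
        history.append(peak_rss_mb())
    return history


if __name__ == "__main__":
    leaked = []  # stand-in for a leak; a real run would call fitness(chrom)

    def leaky_step():
        leaked.append(bytearray(10 * 1024 * 1024))  # keep 10 MiB alive per call

    for i, mb in enumerate(track_memory(leaky_step, 5)):
        print(i, round(mb, 1))
```

In the script below, `step` would be something like `lambda: fitness(population[0])`. A steadily rising log despite the `gc.collect()` call would suggest that the memory is held outside Python's garbage collector.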

I am running a genetic algorithm (GA) on top of cross-validated logistic regression (LR), so it is crucial that I can run many iterations until the GA converges.

Please see the code fragments below:

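from h2o4gpu import LogisticRegression
from numba import cuda
import numpy as np
import timeit

# X, Y, ps, nf, C_grid and col are defined in the part of the script that is not shown.

def create_population(n, n1):
    # Each chromosome is a 0/1 mask over 62 channels with exactly n1 channels set.
    tmp = [np.random.choice(62, n1, replace=False) for i in range(n)]
    return np.array([[int(j in t) for j in range(62)] for t in tmp])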

def fitness(chromosome):
    # Keep the nf feature columns that belong to each selected channel.
    channels = np.where(chromosome)[0]
    X_new = [X[:,k*nf:(k+1)*nf] for k in channels]
    X_new = np.hstack(X_new)

    max_acc = 0
    for C in C_grid:
        acc = []
        svc = LogisticRegression(penalty='l1', verbose=0, n_gpus=1, C=C, warm_start=True)
        # Cross-validation: hold out one group of ps at a time.
        for k in np.unique(ps):
            pos = np.where(ps!=k)[0]
            svc.fit(X_new[pos], Y[pos])
            pos = np.where(ps==k)[0]
            tmp = (svc.predict(X_new[pos])==Y[pos]).mean()
            acc.append(tmp)
        # Fitness is the best mean held-out accuracy over C_grid.
        max_acc = max(max_acc, sum(acc)/len(acc))
    # Added after issue #175 to release GPU memory.
    cuda.close()
    return max_acc

def mutation(population):
    # Flip each bit independently with probability 0.03 (modifies population in place).
    r = np.random.random(size=population.shape)
    mut_probs = np.full(population.shape, 0.03)
    m = r < mut_probs
    for i, j in zip(*np.where(m)):
        population[i][j] = 1-population[i][j]
    return population

def crossover(chrom1, chrom2):
    # Single-point crossover; randint's upper bound is exclusive, so idx is in 1..60.
    idx = np.random.randint(1, 61)
    child1 = np.hstack([chrom1[:idx],chrom2[idx:]])
    child2 = np.hstack([chrom2[:idx],chrom1[idx:]])
    return np.vstack([child1, child2])

npop = 20

population = create_population(npop, 16)
f = [fitness(chrom) for chrom in population]

for step in range(100):
    # Rank chromosomes from best to worst fitness.
    rank = np.argsort(f)[::-1]
    p = f/sum(f)  # not used below
    c = [col[i] for i in np.where(population[rank[0]])[0]]
    print(step, len(c), c, max(f))
    # The best half are paired at random to produce npop//2 children.
    parents = np.random.permutation(rank[:npop//2])
    new_pop = [crossover(population[parents[2*i]], population[parents[2*i+1]]) for i in range(npop//4)]
    new_pop = np.vstack(new_pop)
    new_pop = mutation(new_pop)
    # The best half survive unchanged; only the children are re-evaluated.
    population = np.vstack([population[rank[:npop//2]], new_pop])
    new_f = [fitness(chrom) for chrom in new_pop]
    f = [f[i] for i in rank[:npop//2]] + new_f

I am sorry that I cannot attach the entire code, for confidentiality reasons.
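Since the GA needs many fitness evaluations, one possible workaround is to run each evaluation in a short-lived child process, so that the operating system reclaims all of that process's memory when it exits. The sketch below shows only the isolation pattern, and whether it avoids the freeze with h2o4gpu is untested. `fitness_in_subprocess` and `WORKER_SOURCE` are hypothetical names, and the worker here is a placeholder that returns the fraction of selected channels; a real worker would have to import h2o4gpu, load the data and run the cross-validation itself.

```python
import json
import subprocess
import sys

# Hypothetical placeholder worker: reads a chromosome as JSON on stdin and
# writes a JSON result on stdout. It returns the fraction of selected
# channels instead of a real cross-validated accuracy.
WORKER_SOURCE = """
import json, sys
chromosome = json.load(sys.stdin)
json.dump({"fitness": sum(chromosome) / len(chromosome)}, sys.stdout)
"""


def fitness_in_subprocess(chromosome, worker_source=WORKER_SOURCE):
    """Evaluate one chromosome in a fresh Python process."""
    result = subprocess.run(
        [sys.executable, "-c", worker_source],
        input=json.dumps([int(g) for g in chromosome]),
        stdout=subprocess.PIPE,
        universal_newlines=True,  # exchange text rather than bytes
        check=True,               # raise if the worker exits with an error
    )
    return json.loads(result.stdout)["fitness"]


if __name__ == "__main__":
    print(fitness_in_subprocess([1, 0, 1, 0]))
```

The trade-off is that every evaluation pays for interpreter start-up and for reloading the data, so this only makes sense if a single fitness call is expensive compared with that overhead.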

socathie, Dec 19 '18 15:12