WorldModelsExperiments

MemoryError in vae_train.py

Open • Chazzz opened this issue 6 years ago • 5 comments

Running python vae_train.py raises a MemoryError on my system. I felt bad about this at first, but after running the numbers, it turns out vae_train.py needs to allocate ~123 GB of memory for this array!

```python
>>> import numpy as np
>>> M = 1000
>>> N = 10000
>>> data = np.zeros((M*N, 64, 64, 3), dtype=np.uint8)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
MemoryError
```
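The size follows directly from the array's shape. A quick check in plain Python, using the same M and N as the snippet, with the 64x64x3 frame shape and 1-byte uint8 values taken from the np.zeros call:

```python
# Shape of the array allocated in the snippet: M frames per episode,
# N episodes, 64x64 RGB frames stored as uint8 (1 byte per value).
M = 1000
N = 10000
HEIGHT, WIDTH, CHANNELS = 64, 64, 3
BYTES_PER_VALUE = 1  # np.uint8

frame_bytes = HEIGHT * WIDTH * CHANNELS * BYTES_PER_VALUE
total_bytes = M * N * frame_bytes

print(f"{frame_bytes:,} bytes per frame")
print(f"{total_bytes:,} bytes in total")
print(f"{total_bytes / 1e9:.1f} GB (decimal) / {total_bytes / 2**30:.1f} GiB (binary)")
```

That is 12,288 bytes per frame and about 123 GB (about 114 GiB) in total, far more RAM than a typical workstation has, so the array cannot be allocated up front.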

Chazzz, Jan 21 '19 20:01

Hmm, this looks like #19. I'm trying the solution suggested there. Thanks for crunching the numbers; I had a measly 16 GB when it happened to me.

xiaoschannel, Jan 21 '19 22:01

@zuoanqh It's not tremendously surprising that memory limitations show up in both experiments. Loading the data more dynamically, streaming it from disk instead of holding the whole dataset in memory, would probably fix both issues.
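A minimal sketch of what more dynamic loading could look like: a generator that reads one episode file at a time and yields fixed-size batches, so memory is bounded by roughly one episode plus one batch rather than the whole dataset. It assumes each episode is saved as its own .npz file with its frames under the key "obs"; both that layout and the function name iter_batches are assumptions for illustration, not the repository's actual code.

```python
import os
import numpy as np

def iter_batches(data_dir, batch_size, seed=None):
    """Yield float32 batches of frames, streaming one episode file at a time.

    Hypothetical layout: every *.npz file in data_dir holds one episode with
    its frames under the key "obs" (uint8, shape (num_frames, H, W, C)).
    Episode order and frame order within each episode are shuffled; a final
    partial batch is dropped.
    """
    rng = np.random.default_rng(seed)
    names = sorted(f for f in os.listdir(data_dir) if f.endswith(".npz"))
    leftover = None  # frames carried over into the next batch
    for i in rng.permutation(len(names)):
        with np.load(os.path.join(data_dir, names[i])) as episode:
            frames = episode["obs"]  # only this episode is in memory
        frames = frames[rng.permutation(len(frames))]
        if leftover is not None:
            frames = np.concatenate([leftover, frames])
        n_full = len(frames) // batch_size
        for b in range(n_full):
            batch = frames[b * batch_size:(b + 1) * batch_size]
            yield batch.astype(np.float32) / 255.0  # scale to [0, 1]
        leftover = frames[n_full * batch_size:]
```

The trade-off is coarser shuffling: a batch only mixes frames from one or two neighbouring episodes, whereas the in-memory version can shuffle across the whole dataset.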

Chazzz, Jan 22 '19 00:01

@zuoanqh Not sure how far you got on this, but in my fork's atari/vae_train.py I have loading that doesn't hold the dataset in memory, taking 1.25 hours in total (8 mins per epoch), not including training. I convert the episodes into uncompressed (10x speedup), individual images (100x), which are then loaded in parallel (10x) before being fed into TensorFlow. Being black and white (Atari only) gives another 3x improvement, which doesn't carry over to Doom and CarRacing. The only faster alternative I can think of is to convert to BMP and let TensorFlow manage the entire batching process using parallel prefetching.
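The convert-then-load-in-parallel idea can be sketched with NumPy and a thread pool. This is only an approximation of the approach described, not the code in the fork: it assumes episodes are .npz files with frames under "obs", writes each frame to its own uncompressed .npy file rather than an image format, and the helper names explode_episode and iter_frame_batches are hypothetical.

```python
import os
from concurrent.futures import ThreadPoolExecutor
import numpy as np

def explode_episode(npz_path, out_dir):
    """One-off conversion: write each frame of an episode to its own file.

    Assumes (hypothetically) that frames are stored under the key "obs".
    np.save writes the raw .npy format, which is uncompressed, so reading a
    frame back needs no decompression.
    """
    os.makedirs(out_dir, exist_ok=True)
    stem = os.path.splitext(os.path.basename(npz_path))[0]
    with np.load(npz_path) as episode:
        frames = episode["obs"]
    paths = []
    for i, frame in enumerate(frames):
        path = os.path.join(out_dir, f"{stem}_{i:05d}.npy")
        np.save(path, frame)
        paths.append(path)
    return paths

def iter_frame_batches(frame_paths, batch_size, seed=None, workers=8):
    """Yield shuffled float32 batches, reading each batch's files concurrently.

    Because every frame is a separate file, a batch can mix frames from any
    episode, and only one batch is held in memory at a time. The final
    partial batch is dropped.
    """
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(frame_paths))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for start in range(0, len(order) - batch_size + 1, batch_size):
            batch_paths = [frame_paths[j] for j in order[start:start + batch_size]]
            frames = list(pool.map(np.load, batch_paths))  # parallel file reads
            yield np.stack(frames).astype(np.float32) / 255.0
```

Threads are a reasonable choice here on the assumption that the work is dominated by file reads, which release the GIL; the BMP route mentioned above would instead hand the same batching and prefetching to TensorFlow's own input pipeline.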

Note that 10M uncompressed frames take up about 80 GB as single-channel images and 240 GB as three-channel images, and the conversion takes several hours. VAE training (not including loading) takes about 5 hours on my system.
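Storage for the exploded dataset depends on frame resolution, the file format's header and the filesystem's block size, so figures like these vary between setups. A rough estimator, assuming 64x64 uint8 frames, a ~128-byte per-file header and 4 KiB blocks (all assumptions; the setup behind the figures quoted here may differ):

```python
import math

def per_frame_storage_bytes(num_frames, height=64, width=64, channels=1,
                            header_bytes=128, block_bytes=4096):
    """Rough on-disk size when every frame is stored as its own file.

    header_bytes approximates a per-file format header (assumed ~128 bytes,
    as for a small .npy file) and block_bytes models filesystem allocation:
    each file occupies a whole number of blocks. Both are assumptions.
    """
    file_bytes = header_bytes + height * width * channels
    allocated = math.ceil(file_bytes / block_bytes) * block_bytes
    return num_frames * allocated

def raw_pixel_bytes(num_frames, height=64, width=64, channels=1):
    """Pixel payload alone, one byte per value (uint8)."""
    return num_frames * height * width * channels

for channels in (1, 3):
    raw = raw_pixel_bytes(10_000_000, channels=channels)
    disk = per_frame_storage_bytes(10_000_000, channels=channels)
    print(f"{channels} channel(s): {raw / 1e9:.0f} GB of pixels, ~{disk / 1e9:.0f} GB on disk")
```

Under these assumptions, block rounding roughly doubles the single-channel footprint (each 4,224-byte file occupies two 4 KiB blocks), which is why many tiny files can cost far more disk than the raw pixels suggest.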

Chazzz, Jan 28 '19 08:01

@Chazzz My experiment requires transitions rather than frames, so upgrading it without doubling disk/memory usage is taking a bit more time. I got it to work with about 1k episodes, though...
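One way to use transitions without storing every frame twice, offered as a sketch rather than what this experiment actually does: keep the frames once in a flat array and represent each transition as a pair of indices (t, t + 1) that never crosses an episode boundary. The function name transition_indices and the flat-array layout are assumptions for illustration.

```python
import numpy as np

def transition_indices(episode_lengths):
    """Return a (num_transitions, 2) array of (t, t + 1) frame indices.

    Frames from all episodes are assumed to be concatenated into one flat
    array in episode order. Pairs never span two episodes, and the frames
    themselves are not copied, so storage does not double.
    """
    lengths = [int(n) for n in episode_lengths]
    starts = np.cumsum([0] + lengths[:-1])
    pairs = [np.empty((0, 2), dtype=np.int64)]
    for start, length in zip(starts, lengths):
        t = np.arange(start, start + length - 1, dtype=np.int64)
        pairs.append(np.stack([t, t + 1], axis=1))
    return np.concatenate(pairs)

# Two episodes of 4 and 3 frames; integers stand in for 64x64 images.
frames = np.arange(7)
idx = transition_indices([4, 3])
current, following = frames[idx[:, 0]], frames[idx[:, 1]]
```

Gathering frames[idx[:, 0]] and frames[idx[:, 1]] per batch copies only the frames in that batch, so the extra cost is one pair of int64 indices per transition rather than a second copy of the dataset.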

xiaoschannel, Jan 28 '19 08:01

@zuoanqh Yikes, that's a lot of channels. Then again, you don't really need 10k episodes unless you're creating a baseline.

Chazzz, Jan 28 '19 17:01