
Memory leak

Open MagnusAagaard opened this issue 5 years ago • 13 comments

Hi, my memory usage keeps increasing when I'm training. I suspect it can be frame_cache, but I do set it to false when initiating the VideoFrameGenerator. After x mini-batches it crashes with no error, but in the terminal it prints "Killed". I'm not able to run even a single epoch before it crashes, because the memory explodes. Any ideas?

MagnusAagaard avatar Mar 12 '20 12:03 MagnusAagaard

Hi,

Thanks for your report.

I never had that problem myself. We use the generator in several projects and it never happens.

Can you please give more details so I can try to reproduce?

  • your operating system (Windows, Linux distribution and version...)
  • Python version (python --version)
  • the error output with trace
  • Version of the package (e.g. 1.0.13)
  • how many files and labels you've got (give the output of the generator when you create it)

Are you using "model.fit_generator" or do you use a custom loop ?

Regards,

metal3d avatar Mar 13 '20 05:03 metal3d

Operating system

    Distributor ID: Ubuntu
    Description:    Ubuntu 18.04.4 LTS
    Release:        18.04
    Codename:       bionic

Python version

    Python 3.6.9

Error output with trace

    Traceback (most recent call last):
      File "/usr/lib/python3.6/runpy.py", line 193, in _run_module_as_main
        "__main__", mod_spec)
      File "/usr/lib/python3.6/runpy.py", line 85, in _run_code
        exec(code, run_globals)
      File "/usr/lib/python3.6/trace.py", line 735, in <module>
        main()
      File "/usr/lib/python3.6/trace.py", line 723, in main
        t.runctx(code, globs, globs)
      File "/usr/lib/python3.6/trace.py", line 462, in runctx
        exec(cmd, globals, locals)
      File "train.py", line 104, in <module>
        callbacks=callbacks,
      File "/home/jjspeciale/.local/lib/python3.6/site-packages/keras/legacy/interfaces.py", line 91, in wrapper
        return func(*args, **kwargs)
      File "/home/jjspeciale/.local/lib/python3.6/site-packages/keras/engine/training.py", line 1732, in fit_generator
        initial_epoch=initial_epoch)
      File "/home/jjspeciale/.local/lib/python3.6/site-packages/keras/engine/training_generator.py", line 185, in fit_generator
        generator_output = next(output_generator)
      File "/home/jjspeciale/.local/lib/python3.6/site-packages/keras/utils/data_utils.py", line 625, in get
        six.reraise(*sys.exc_info())
      File "/home/jjspeciale/.local/lib/python3.6/site-packages/six.py", line 703, in reraise
        raise value
      File "/home/jjspeciale/.local/lib/python3.6/site-packages/keras/utils/data_utils.py", line 610, in get
        inputs = future.get(timeout=30)
      File "/usr/lib/python3.6/multiprocessing/pool.py", line 644, in get
        raise self._value
      File "/usr/lib/python3.6/multiprocessing/pool.py", line 119, in worker
        result = (True, func(*args, **kwds))
      File "/home/jjspeciale/.local/lib/python3.6/site-packages/keras/utils/data_utils.py", line 406, in get_index
        return _SHARED_SEQUENCES[uid][i]
      File "/home/jjspeciale/.local/lib/python3.6/site-packages/keras_video/generator.py", line 364, in __getitem__
        return np.array(images), np.array(labels)
    MemoryError: Unable to allocate 230. MiB for an array with shape (16, 25, 224, 224, 3) and data type float32
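[Editor's note: as a sanity check on the reported error, my own arithmetic, not part of the original report, the failed allocation size matches the batch shape in the traceback exactly.]

```python
import numpy as np

# One mini-batch as reported in the MemoryError:
# shape (16, 25, 224, 224, 3), dtype float32 (4 bytes per value).
shape = (16, 25, 224, 224, 3)
batch_bytes = np.prod(shape) * np.dtype(np.float32).itemsize

print(batch_bytes / 2**20)  # ≈ 229.7 MiB, matching "Unable to allocate 230. MiB"
```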

Package version

    keras-video-generators (1.0.13)

Files and labels

I am using the NTU RGB-D dataset, which is quite big. It leaves me with:

    Total data: 60 classes for 32280 files for train
    Total data: 60 classes for 8040 files for validation

I am using model.fit_generator as follows:

    model.fit_generator(
        train,
        validation_data=valid,
        verbose=1,
        epochs=EPOCHS,
        callbacks=callbacks,
    )

This is how htop looks just before it crashes. It shows that memory and swap usage keep growing until no more memory can be allocated. [htop screenshot]

MagnusAagaard avatar Mar 16 '20 10:03 MagnusAagaard

OK: 32280 files * 25 frames = 807,000 images in memory. If you use frame cache, you will fill the memory with 807,000 * 224 * 224 = 40,492,032,000 pixels, which is a HUGE size :) . For that dataset, you would need a very large amount of free memory.
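[Editor's note: to put a concrete figure on this estimate, my own back-of-the-envelope calculation, not from the thread: with 3 RGB channels stored as float32, caching every frame would need roughly 450 GiB.]

```python
# Back-of-the-envelope memory estimate for caching every frame
# of the training split: 32280 clips, 25 frames each, 224x224 RGB, float32.
clips = 32280
frames_per_clip = 25
height = width = 224
channels = 3
bytes_per_value = 4  # float32

total_pixels = clips * frames_per_clip * height * width
total_bytes = total_pixels * channels * bytes_per_value

print(total_pixels)         # 40492032000 pixels, matching the figure above
print(total_bytes / 2**30)  # ≈ 452.5 GiB, far beyond typical RAM
```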

If I'm not wrong (my analysis may be off), that should break during the first epoch. If it doesn't, then there is a real memory leak.

Note that the memory grows during the first epoch. That's normal.

metal3d avatar Mar 16 '20 13:03 metal3d

That's why I suspected it had something to do with the frame_cache, but as I stated, I set the flag to False and the problem still persists.

I've been looking through issues with Keras itself, and people seem to have similar issues with the fit_generator function, so this might not be related to the VideoFrameGenerator at all.

MagnusAagaard avatar Mar 16 '20 14:03 MagnusAagaard

Maybe the generator has a leak problem as well. I will keep this issue open and run some tests with a fake dataset. It's a problem, but an interesting behavior to study.

metal3d avatar Mar 16 '20 15:03 metal3d

Same issue here on Debian Buster with Python 3.8. It crashes before finishing the first epoch.

egrassl avatar Jul 18 '20 18:07 egrassl

I was able to solve it by changing the following lines in generator.py:

    # add to cache
    self.__frame_cache[video] = frames

to

    # add to cache
    if self.use_frame_cache:
        self.__frame_cache[video] = frames

Turns out it was a memory leak. I looked for places where use_frame_cache was being checked and couldn't find any, so I just added a check to the line that writes to the cache, and it worked well for me.

I've trained my network for 26 epochs now and memory use is stable!
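[Editor's note: the pattern behind this fix can be illustrated with a minimal stand-in class. This is a simplified sketch of the guard, not the library's actual code; the single-underscore `_frame_cache` name is mine, chosen to avoid Python's name mangling in a short example.]

```python
class FrameLoader:
    """Simplified stand-in for the generator's frame-loading path."""

    def __init__(self, use_frame_cache=False):
        self.use_frame_cache = use_frame_cache
        self._frame_cache = {}

    def get_frames(self, video):
        if video in self._frame_cache:
            return self._frame_cache[video]
        frames = self._decode(video)  # stand-in for the real frame extraction
        # The bug: this insertion ran unconditionally, so frames accumulated
        # for every video even with use_frame_cache=False. The fix guards it:
        if self.use_frame_cache:
            self._frame_cache[video] = frames
        return frames

    def _decode(self, video):
        return [f"{video}-frame-{i}" for i in range(3)]  # placeholder frames


loader = FrameLoader(use_frame_cache=False)
loader.get_frames("clip.avi")
print(len(loader._frame_cache))  # 0: nothing retained, memory stays flat
```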

egrassl avatar Jul 18 '20 23:07 egrassl

@egrassl if you can propose a pull-request, that would be great

metal3d avatar Jul 22 '20 07:07 metal3d

I will make a pull-request when I get home today. Thank you!

egrassl avatar Jul 22 '20 14:07 egrassl

If I have use_frame_cache set to False, should I expect to be using a lot of RAM? My dataset consists of around 12000 clips that are 30 frames long with dimensions (192, 108, 3). Currently, I can't finish one epoch without exceeding my 25 GB RAM limit. I'm not sure whether my dataset is simply too large to fit into 25 GB or whether there's a problem with use_frame_cache.
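[Editor's note: for scale, my own estimate using the numbers in this comment, not something from the thread: even this smaller dataset would not fit in 25 GB if every frame were cached as float32.]

```python
# Estimate for 12000 clips of 30 frames at 192x108x3, stored as float32.
clips = 12000
frames = 30
h, w, c = 192, 108, 3
bytes_per_value = 4  # float32

total_bytes = clips * frames * h * w * c * bytes_per_value
print(total_bytes / 2**30)  # ≈ 83.4 GiB, well over a 25 GB limit
```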

BANANAPEEL202 avatar Jul 31 '20 22:07 BANANAPEEL202

@BANANAPEEL202 please look at the #31 pull request.

The use_frame_cache parameter wasn't being checked, but that is fixed now, so it should work properly once you update the package.

egrassl avatar Aug 03 '20 20:08 egrassl

https://fantashit.com/linearly-increasing-memory-with-use-multiprocessing-and-keras-sequence/#comment-254237

atotev avatar Mar 20 '21 19:03 atotev

@egrassl thanks, i had the same issue and that solved it

LeoAraujoEE avatar Mar 23 '21 16:03 LeoAraujoEE