keras-video-generators
Memory leak
Hi, my memory usage keeps increasing while I'm training. I suspect it could be frame_cache, but I do set it to False when initializing the VideoFrameGenerator. After a number of mini-batches it crashes with no error message; the terminal just prints "Killed". I'm not able to run even a single epoch before it crashes, because the memory explodes. Any ideas?
Hi,
Thanks for your report.
I never had that problem myself. We use the generator in several projects and that never happens.
Can you please give more details to let me try ?
- your operating system (Windows, Linux distribution and version...)
- Python version (`python --version`)
- the error output with trace
- version of the package (e.g. 1.0.13)
- how many files and labels you've got (give the output of the generator when you create it)
Are you using "model.fit_generator" or do you use a custom loop ?
Regards,
Operating system: Ubuntu 18.04.4 LTS (Release 18.04, codename bionic)
Python version: Python 3.6.9
Error output with trace:

```
Traceback (most recent call last):
  File "/usr/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/usr/lib/python3.6/trace.py", line 735, in
```
Package version: keras-video-generators 1.0.13
Files and labels: I am using the NTU RGB-D dataset, which is quite big. It leaves me with:
Total data: 60 classes for 32280 files for train
Total data: 60 classes for 8040 files for validation
I am using model.fit_generator as follows:

```python
model.fit_generator(
    train,
    validation_data=valid,
    verbose=1,
    epochs=EPOCHS,
    callbacks=callbacks,
)
```
This is how htop looks just before it crashes. It shows that the memory and swap usage are growing until it cannot allocate any more memory.

OK, 32280 * 25 frames = 807,000 images in memory. If you use frame cache, you will fill the memory with 807,000 * 224 * 224 = 40,492,032,000 pixels, which is a HUGE size :) . For that dataset, you would need a very large amount of free memory.
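As a sanity check, the arithmetic above can be reproduced in a few lines (the channel count and dtype are assumptions on my part, not stated in the thread):

```python
# Rough estimate of frame-cache memory for the training split described above.
n_videos = 32280
frames_per_video = 25      # nb_frames used by the generator
height, width = 224, 224   # target frame size
channels = 3               # RGB (assumption)
bytes_per_value = 4        # float32 after normalization (assumption)

n_frames = n_videos * frames_per_video
pixels = n_frames * height * width
cache_bytes = pixels * channels * bytes_per_value

print(f"{n_frames:,} frames, {pixels:,} pixels")
print(f"~{cache_bytes / 1e9:.0f} GB if every frame were cached")
```

At float32 that works out to hundreds of gigabytes, so caching this dataset in RAM is clearly not viable.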
If I'm not wrong (my analysis may be off), that should break during the first epoch. If it doesn't, then there is a real memory leak.
Note that the memory grows during the first epoch; that's normal.
That's why I suspected it had something to do with frame_cache, but as I stated, I set the flag to False and the problem still persists.
I've been looking through issues with Keras itself, and people seem to have similar issues with the fit_generator function, so this might not be related to the VideoFrameGenerator at all.
Maybe the generator has a leak problem as well. I will keep this issue open and run some tests with a fake dataset. This is a problem, but an interesting behavior to study.
Same issue here on Debian Buster with Python 3.8. It crashes before finishing the first epoch.
I was able to solve it by changing the following lines in generator.py:
```python
# add to cache
self.__frame_cache[video] = frames
```

to

```python
# add to cache
if self.use_frame_cache:
    self.__frame_cache[video] = frames
```
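To illustrate the pattern behind the fix, here is a minimal, self-contained sketch of an opt-in frame cache. The class and method names are illustrative only, not the library's actual internals:

```python
class FrameLoader:
    """Toy illustration of an opt-in frame cache."""

    def __init__(self, use_frame_cache: bool = False):
        self.use_frame_cache = use_frame_cache
        self._frame_cache = {}

    def get_frames(self, video: str):
        # Serve from the cache only when caching is enabled.
        if self.use_frame_cache and video in self._frame_cache:
            return self._frame_cache[video]
        frames = self._decode(video)  # stand-in for real video decoding
        # The bug in the thread: writing here unconditionally grows memory
        # even with use_frame_cache=False. The fix is this guard.
        if self.use_frame_cache:
            self._frame_cache[video] = frames
        return frames

    def _decode(self, video: str):
        # Placeholder for decoding; returns fake "frames".
        return [f"{video}-frame-{i}" for i in range(3)]
```

With `use_frame_cache=False` the cache dictionary stays empty, so memory no longer accumulates across batches.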
It turns out it was a memory leak. I looked for the places where use_frame_cache was being checked and couldn't find any, so I inserted a check on the line that adds content to the cache, and it works well for me.
I've trained my network for 26 epochs now and memory use is stable!
@egrassl if you can propose a pull-request, that would be great
I will make a pull-request when I get home today. Thank you!
If I have use_frame_cache set to False, should I expect to be using a lot of RAM? My dataset consists of around 12,000 clips that are 30 frames long with dimensions (192, 108, 3). Currently, I can't finish one epoch without exceeding my 25 GB RAM limit. I'm not sure whether my dataset is just too large to fit into 25 GB or whether there's a problem with use_frame_cache.
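For scale, a back-of-the-envelope estimate of that dataset fully decoded (the per-pixel dtypes below are assumptions, not something stated in the thread):

```python
# Approximate decoded size of 12,000 clips of 30 frames at 192x108x3.
clips, frames, h, w, c = 12000, 30, 192, 108, 3
values = clips * frames * h * w * c  # one value per pixel per channel

print(f"~{values / 1e9:.1f} GB as uint8")        # 1 byte per value
print(f"~{values * 4 / 1e9:.1f} GB as float32")  # 4 bytes per value
```

Even as uint8 the dataset is close to the stated 25 GB limit, and as float32 it is well past it, so the run can only fit if frames are loaded batch by batch rather than cached.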
@BANANAPEEL202 please look at the #31 pull request.
The use_frame_cache parameter wasn't being checked, but that has already been fixed, so it should work properly if you update your package.
https://fantashit.com/linearly-increasing-memory-with-use-multiprocessing-and-keras-sequence/#comment-254237
@egrassl thanks, I had the same issue and that solved it.