drl_grasping
Mesh/texture (material) memory leak
Removing a model does not free all of its memory; there is a memory leak related to meshes/textures (material). This limits domain randomisation and causes training to eventually fail. The current workaround is to limit the frequency of adding/removing models in order to postpone the eventual crash due to running out of memory.
- Encountered using ogre2
  - Meshes with textures
    - Noticed with obj+mtl
  - Primitives with textures
    - Noticed with metallic PBR pipeline
  - Meshes with textures
- Related upstream issue:
  - https://github.com/ignitionrobotics/ign-rendering/issues/39
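The workaround of limiting how often models are added/removed can be sketched as a simple throttle around the domain-randomisation step. This is a minimal sketch with hypothetical helper callbacks, not the project's actual API:

```python
# Sketch: postpone the leak-driven crash by randomising models only
# every N episodes instead of every episode. The spawn/remove callbacks
# are hypothetical placeholders for the actual simulator calls.

class ThrottledModelRandomiser:
    def __init__(self, period: int = 10):
        self.period = period  # episodes between model swaps
        self._episode = 0

    def maybe_randomise(self, spawn_fn, remove_fn) -> bool:
        """Swap models only every `period` episodes; return True if swapped."""
        swap = self._episode % self.period == 0
        if swap:
            remove_fn()  # remove old models (each removal leaks some memory)
            spawn_fn()   # spawn a new random set of models
        self._episode += 1
        return swap
```

With `period=10`, models are swapped on episodes 0, 10, 20, ..., so the leak accumulates roughly ten times slower than with per-episode randomisation.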
- [x] Investigate further (`ign-rendering`)
- [ ] Try to find a solution (if time allows it)
~~The memory leak is caused by the GUI, therefore it does not negatively influence the headless training process.~~
GUI bug reported in https://github.com/ignitionrobotics/ign-gui/issues/208.
Never mind, my last headless training run ran out of memory, so the full-scale memory leak also occurs for camera sensors. I updated the upstream issue.
I was not able to solve this issue on my own in the time I dedicated to it. I might look at it again later (but not before the hand-in).
Workaround: Limited the object count to 80 for training and 20 for testing in order to make training feasible, and reduced the replay buffer size to fit inside RAM. Previously, all ~1000 objects were used.
One noticeable disadvantage that persists is that the application might run out of VRAM and be unable to initialise the feature extractor, actor and critics (NNs) if objects were spawned in the environment before initialising the algorithm. Therefore, the `learning_starts` hyperparameter must be set to something very small (preferably 0). This also means that optimisation with Optuna might prune trials just because a trial ran out of memory (since the environment is not destroyed between trials). This occurs only at the beginning of a trial, but there is nothing I can do about it.
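To see why the replay buffer had to be shrunk, its RAM footprint can be estimated as buffer size times bytes per transition. The sizes below (camera resolution, action dimension) are illustrative assumptions, not the project's actual configuration:

```python
# Back-of-the-envelope RAM estimate for a replay buffer holding
# image observations. All sizes are illustrative assumptions.

def replay_buffer_bytes(buffer_size: int,
                        obs_shape=(128, 128, 3),  # assumed camera resolution
                        action_dim: int = 4,      # assumed action dimension
                        dtype_bytes: int = 1) -> int:
    """Approximate bytes: obs + next_obs stored per transition,
    plus float32 action, reward, and done flag."""
    obs_bytes = dtype_bytes
    for d in obs_shape:
        obs_bytes *= d
    per_transition = 2 * obs_bytes + 4 * (action_dim + 2)
    return buffer_size * per_transition

# A 100k-transition buffer of uint8 128x128x3 observations already
# needs roughly 9.2 GiB, so shrinking the buffer directly cuts RAM use.
gib = replay_buffer_bytes(100_000) / 2**30
```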
It seems that `fortress` requires much less memory to perform the same training, when compared to `dome`.
- [ ] Investigate further, and see if "unlimited" number of models can be used during training.