castle-engine
High memory usage when loading a large number of animated sprites
As noted in Discord...
Loading 4 texture atlases, each image = 3840 x 4000 (25x24 frames @ 160x160).
A total of 680 animations of 1 to 8 frames each (the vast majority are 3-frame animations).
Total frames used = 2064, 336 = empty (unreferenced).
Idle memory usage (nothing loaded) = 40M
Test memory usage (680 sprites animating on screen) = 3114M
Windows App Frame rate - 2.65, Render Only = 25 (Mac 2.65 / 36)
Testcase - https://github.com/peardox/MultiSprite
Some useful variables...
gScale := 0.3375; // MainGameUnit.pas - 299 - initial sprite size, the default scale is to fit all sprites on a 1920x1080 fullscreen display
gLimit := 680; // MainGameUnit.pas - 302 - How many sprites to create - 680 is "all of them", lower this number to check mem usage for less sprites
I've left a few simple key bindings in, as they're useful for testing:
1 - Force all sprites to original size
[UpArrow] - Increase size of all sprites by 10%
[DownArrow] - Decrease size of all sprites by 10%
[ESC] - Quit
S - Screengrab (to data/screengrab.png)
Is this fixed by now?
Nope, it's on a to-do list. My latest "lots of things at once" test runs out of memory even better :) https://github.com/peardox/FibonacciSphere2 - got an OOM with 4k objects on a 32 GB PC
I started playing with this, profiling memory using massif ( https://github.com/castle-engine/castle-engine/wiki/Profiling-Using-Valgrind ).
Did one optimization in https://github.com/castle-engine/castle-engine/commit/de92db8f047fdb535c362fc884688ea0cf082dcb , but it's small (70 MB in my tests -- on Linux x86_64).
Tested that compiling with CASTLE_SLIM_NODES defined also helps, by 500 MB.
These gains are still too small. Will work to investigate / optimize more.
I'm 100% sure that the main problem is not in textures. Textures are correctly cached, only 4 are loaded (to RAM and then to GPU). The 4 large textures in this testcase (3840x4000) take about 230 MB in memory and on GPU. This was confirmed by TextureMemoryProfiler in CGE and by a test with textures resized to 2x2: RAM usage drops as it should, by ~230 MB.
This could be decreased significantly by using GPU texture compression ( https://castle-engine.io/creating_data_auto_generated_textures.php ). Anyway it doesn't matter for this testcase...
... because we eat too much memory even when all the 4 textures are replaced with dummy 2x2 images. So I'll continue researching with dummy 2x2 images, as they reproduce the problem too.
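The ~230 MB figure for the four atlases is consistent with a back-of-the-envelope estimate. The sketch below is plain arithmetic, not engine code, and it assumes 4 bytes per pixel (8-bit RGBA), no mipmaps and no GPU compression; none of these are stated in the thread.

```python
# Rough estimate of uncompressed memory for the four sprite atlases.
# Assumptions (not from the thread): 4 bytes per pixel (RGBA, 8 bits per
# channel), no mipmaps, no GPU texture compression.
WIDTH, HEIGHT = 3840, 4000
BYTES_PER_PIXEL = 4
ATLAS_COUNT = 4

bytes_per_atlas = WIDTH * HEIGHT * BYTES_PER_PIXEL  # 61_440_000
total_bytes = bytes_per_atlas * ATLAS_COUNT         # 245_760_000
total_mib = total_bytes / (1024 * 1024)

print(f"per atlas:   {bytes_per_atlas / (1024 * 1024):.1f} MiB")
print(f"all atlases: {total_mib:.1f} MiB")
```

Under those assumptions the total lands in the 230 to 240 MiB range, which is far below the ~3 GB observed, so the textures cannot account for the bulk of the usage.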
This is good news: it means the culprit is in data structures that we manage ourselves, and the structures for sprite sheets should be trivial.
It should be noted that with https://github.com/peardox/FibonacciSphere2 (same concept using 3D) I actually managed to use all 28 GB of system memory (on the Russian cloud PC) going from 2k -> 4k objects. I'll try this on the new laptop as well (RTX 3060 GPU) in a mo...
The culprit is that X3D nodes and their fields, in their current implementation, just take way too much memory.
And this testcase iterates over all animations, and for each animation clones the entire X3D graph (that contains all the animations). So you get ~n^2 memory usage, and you have models with many animations (so you indeed make n large).
FSpriteSheet.AnimationCount 168
FSpriteSheet.AnimationCount 232
FSpriteSheet.AnimationCount 200
FSpriteSheet.AnimationCount 80
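To get a feel for the ~n^2 growth, the AnimationCount values above can be plugged into a small calculation. This is only an illustrative sketch: it assumes one sprite is created per animation and that each clone keeps every animation of its own sheet, which is how the testcase is described here, and it counts animations rather than bytes.

```python
# Animation counts reported for the four sprite sheets.
animation_counts = [168, 232, 200, 80]

# One sprite per animation (assumption taken from the description above).
sprites = sum(animation_counts)  # 680

# Each sprite clones the whole graph of its sheet, so a sheet with n
# animations ends up holding n * n animation copies in total.
animations_held = sum(n * n for n in animation_counts)  # 128448

# If every clone kept only the single animation it plays.
animations_needed = sprites

print(f"sprites:           {sprites}")
print(f"animations held:   {animations_held}")
print(f"animations needed: {animations_needed}")
print(f"overhead factor:   {animations_held / animations_needed:.0f}x")
```

The gap between the two counts is why removing unneeded animations after cloning helps this particular app, and why cheaper nodes and fields help every app.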
It is possible to prevent such large consumption in this particular app (remove unnecessary animations after cloning), but this is not the answer to the general problem in CGE. The testcase shows a real, simple use-case of sprite sheets (with maaaaany animations) on a map. This should be optimized better in CGE.
And it's not the first time I observed that our X3D fields memory usage is too high. I have a plan for how to optimize them a lot, though it's some work.
I'm going to do it :), it will take a few steps to really finish the optimization.
Things done now:
- more efficient "step" animation (used by sprite sheets), so that interpolation data eats 2x less memory
- simplified X3D nodes inheritance using TNodeFunctionality, removed interfaces (also makes Delphi compatibility easier)
More will follow.
Note: you have a memory leak in this application, you never free Stage created by
Stage := TCastleScene.Create(nil);
Simply changing owner to Application is enough.
This is not the culprit of this issue -- I mean you're not leaking memory at runtime (forgetting to free something that should be freed earlier than application end), and the memory usage at runtime is still unacceptably large.
And one more important optimization to sprite sheet nodes has been pushed.
The rest will be optimizing the X3D fields.
Just did some tests of old vs new on Windows:
New code : Memory = 1950M, Load = 3.750s
Old code : Memory = 2885M, Load = 5.450s
i.e. it's loading about 45% faster as an added benefit
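For reference, the relative changes implied by those numbers can be worked out as below. This is just arithmetic on the figures quoted above; the "45% faster" reading corresponds to the ratio of old to new load time.

```python
# Figures from the old vs new comparison on Windows.
old_mem_mb, new_mem_mb = 2885, 1950
old_load_s, new_load_s = 5.450, 3.750

# Share of memory saved by the new code.
mem_saved_pct = (old_mem_mb - new_mem_mb) / old_mem_mb * 100

# Speedup expressed as a ratio of load times (1.45 means "45% faster").
load_speedup = old_load_s / new_load_s

# The same change expressed as a reduction in load time.
load_time_saved_pct = (old_load_s - new_load_s) / old_load_s * 100

print(f"memory saved:    {mem_saved_pct:.1f}%")
print(f"load speedup:    {load_speedup:.2f}x")
print(f"load time saved: {load_time_saved_pct:.1f}%")
```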
Gonna try this on FibonacciSphere2, as that takes 12+ minutes to load with a lot of objects (4k).
Hmm, loading 3D sees a small drop in memory usage, but doubling N results in 4x for scene.load. I imagine this is linked to the same stuff as the sprites, though...
Not a fix yet, but at least I can start employing optimizations thanks to auto-generated nodes code: https://github.com/castle-engine/castle-engine/commit/1c4319d64d4b4899d2db034e1627ac1763b95d3c