Facing an OOM using tiled rendering with Qt
I am facing an OOM using tiled rendering with Qt. Finally the app is killed by the crazy lowmemorykiller when used memory is up to 1.1~1.2 GB. So I stuck to use the commit a15448b5422d0f81cccf8947bdca127cfb454296, where the issue doesn't exist. No more investigation yet, but using plan rendering the issue is not.
@Karry , I guess it could be a cache issue as no longer control. Running with valgrind doesn't help me.
It seems the memory footprint of the app is bigger after this commit. Offline Tiled cache is freed as I can see by switching the renderer mode to plan. For the story, from months I am in debugging phase facing with many issues and bugs with the latest Android sdk that should support C++20. It is a misery ;-).
It seems the memory footprint of the app is bigger after this commit.
It is kind of expected. Just keep in mind that HiDPI support was rewritten later: https://github.com/Framstag/libosmscout/pull/1622 , but in practice effect is the same - rendered tiles are twice as big (1.5^2) to get smooth result. So, visual cache occupies twice of memory. By default, it should be just around less than 100 MiB for each MapWidget component (on regular mobile screen), but all osmscout caches may be huge... For that reason, application may release them explicitly, when system memory is under pressure....
And it leads me to the question - do Android have some API to notify user applications that memory is under the pressure and applications should release its caches?
do Android have some API to notify user applications that memory is under the pressure and applications should release its caches?
Yes it has (API Memory Advice). But how I could control the flush or hint a purge of cached data ?
Issue occurs using one widget, the main screen. I tried reducing the offline cache size to 50MB, but to no avail. The memory allocation is elsewhere.
I could step commit by commit to try to identify the one, where allocation grow. But more than one major change can impact the behavior: c++20 features , map features etc. Currently version 1.13.5 works well (a15448b5422d0f81cccf8947bdca127cfb454296) with patch for standard C++17. Testing 5b0456373318db3b310c39f54a0db6afaf5af08f -> OOM Testing 07e27ba8d45bd6f0043b374e9abb5e6f633708ea -> OOM
On Android the "lowmemorykiller" is very aggressive !
In Sailfish app, I created service that check memory usage periodically (because api for memory notifications is broken) and when memory is under pressure, call DBThread::FlushCaches. From DBThread is the signal propagated to all places that may release the memory...
Beside that, I call malloc_trim to return memory from C++ allocator back to the kernel... It may impact performance, but it is handy, when you really need to release memory back to the system.
Thanks @Karry , you put me back on the right solution. Using Android memory adviser to trig a flush could close the issue on Android, but not others. Your choice fixes the issue on all platforms. This morning, after a good coffee, I launched a routing simulation with two apps in parallel made with commits a15448b5422d0f81cccf8947bdca127cfb454296 and 5b0456373318db3b310c39f54a0db6afaf5af08f. The result is clear:
I take 120MB of additional memory allocation with the latest commit, which explains the OOM. Now still to implement your solution to flush ...
Continuing investigations, I isolated the PR responsive for the growing of allocation: PR #1622. It seems some "back buffers", cannot be managed, are involved and increase quickly the used memory. By plugging a memory manager which try to flush on OS events (Android Memory Adviser), doesn't fix the issue because allocation increases too fast.
As example, running the app for 10 minutes on a distance of 5 km, it will be killed because allocations increases fast to exceed 1GB.
So I reverted this PR, and issue doesn't longer exist: memory allocation remain stable around 350MB. During my last trip, I traveled 300 km without any problems. (see https://github.com/janbar/libosmscout/tree/ffd19d55e13b33ce2a5d4a36624846e78770d504).
I noticed too using QImage instead QPixmap, the cpu usage increases to 5%. So I reverted all commits of the PR.
I noticed too using QImage instead QPixmap, the cpu usage increases to 5%. So I reverted all commits of the PR.
Wow... It really should not happened. When you look to Qt's QOpenGLTextureCache code (I checked old Qt 5 and the latest), there is additional conversion when using QPixmap, but when QImage with proper pixel format and size is used, it is send to graphic memory without any conversion. So, using QImage should be faster, and I was able to confirm it with profiler on Xperia 10 II and Linux desktop... Can you dig deeper please?
With memory usage, we can make the upscale configurable, the image tile size is trade-off between smooth result on HiDPI displays and memory footage...
@Karry , sorry I make mistake. Using Qimage instead QPixmap improves memory usage, and that doesn't impact the cpu usage. CPU usage increases when applying all commits of the PR #1622. This could be due to an increased number of page in/out.
I tested again including the commit "Use QImage" (https://github.com/janbar/libosmscout/commit/9c316bc8c5ea496fb618389ad1ccbc6f8219bc6d).
What are your screen dimensions (reported by Qt) and HiDPI upscale? You may also try to lower OSMScoutQt::onlineTileCacheSize and OSMScoutQt::offlineTileCacheSize during library setup, by OSMScoutQtBuilder::WithTileCacheSizes. You just need to keep cache size big enough to cover whole screen...
The tile caches settings are already configured. On device mobile the offline tile cache size is 100MB. Reducing it has no effect on the OOM.
The screen config reported by Qt is the following: 04-15 18:58:50.966 7699 7720 W osmin : ##### pixel ratio = 3.000000 04-15 18:58:50.966 7699 7720 W osmin : ##### size = 360 x 768
The commit https://github.com/janbar/libosmscout/commit/9b584878de6bc6eb93c4d740680613dd7fd4e259 works as expected without issue. From the PR #1622 there is a bad guy that pump the memory.
@Karry , I found the culprit.
Tell us :-)
The culprit is the up-scaling of tiles.
In commit 07e27ba8d45bd6f0043b374e9abb5e6f633708ea
double finalDpi = mapDpi * 2 * this->screenPixelRatio;
Fixed/Updated by next commit d721e811da67839160e23c7af34e16d642ef4c64
double finalDpi = mapDpi * this->screenPixelRatio;
...
tileDimension = qNextPowerOfTwo(tileDimension - 1);
finalDpi = (double(tileDimension) / double(OSMTile::osmTileOriginalWidth())) * OSMTile::tileDPI();
But removing it, doesn't fix the initial issue about blurred tiles. My original commit 9b584878de6bc6eb93c4d740680613dd7fd4e259 fixed it by an other way without doubling the memory usage. I guess a solution is to mix the two PR: removing up-scaling and keeping my fix.
I will make some tests trying to mix the 2 ways without up-scale the tiles.
Also I suspect a bug in the Qt backend, some times the issue doesn't occur and memory keep under control, and some times it increases continuously like a leak or something not purged.
Or a bug in caches flushing ... is a dead-lock can occur calling flushCaches() ?
The bench for 3 versions:
Rollback PR #1622: fixes the OOM 12701 jlb 20 0 4199284 357016 144764 S 2,0 1,1 1:36.60 ./osmin0
PR #1622: 17049 jlb 20 0 4404136 516148 147992 S 5,6 1,6 1:37.64 ./osmin1
PR #1622 + Remove upscaling + 9b584878de6bc6eb93c4d740680613dd7fd4e259 17215 jlb 20 0 4273060 360692 148156 S 5,6 1,1 1:36.79 ./osmin2
I let you know after a bench with the mobile device ...
Well, it is fixed now with the PR #1647 .
The root cause is the dpi is factorized 2 times by the screen pixel ratio on HiDPI. Thus causing an uncontrolled increase in the memory used by the image cache.
i.e on modern mobile device the pixel ratio is 3.0 even 4.0. So 3 x (mapDpi = 3 x base dpi) = 9 x the base dpi ==> OOM
Debugging on Xperia 10: the Qt screen is 360 x 768 with a pixel ratio 3.0 , devDPI is 153 => mapDPI = 153 By factoring the dpi with the pixel ratio, the final dpi become 459, which is an HUGE dpi. The rendered tile sizes 240 MB, which is fragmented by 15 pieces of 16 MB to be cached. Therefore the memory allocation has to increase of 480 MB (tile + fragments), then down to 240 MB. Thus that explains the behavior, i.e even trying to force flushing the app needs to allocate 480 MB in 1 pass.
Now fixing that by removing the multiplier, and applying only the up-scale x2.0 to manage the zoom, the rendered tile sizes 64 MB. During fragmentation to fill the cache (16 pieces of 4 MB), the app needs to allocate 128 MB, which is sustainable.
Some bad news. Trying with an older device with arch armv7 and Android 10. The app stills to be killed after few moves and offline tiles computing.
At the moment, the only way for me is to maintain a forked branch of the lib without the PR #1622. See the branch https://github.com/janbar/libosmscout/tree/osmin-1.13 .
Now the root cause has been identified, I close this and will continue in the PR #1647.
Hi @janbar , sorry for long delays. I know that it may be frustrating. But the family and work...
Looking to the code with calculator. With screenPixelRatio=3 and mapDpi=153, finalDpi is then 459, tileDimension is 1224. It is the right tile size for screen with pixel ratio 3, when you do not want to upscale tiles. Do you agree?
To cover whole screen, there should be required just ~10 tiles, so minimum memory usage of cached tiles is ~42 MiB. When apply qNextPowerOfTwo, tile size is increased to 2048, together with RGBA8888 pixel format, memory requirement is ~160 MiB (just for pixel data).
I would say that your https://github.com/Framstag/libosmscout/pull/1647/ is going to the wrong direction... Changes in TiledRenderingHelper are little bit chaotic to me. Instead I would suggest:
- make
screenPixelRatioconfigurable, using current screen pixel ratio as default. That way, you may choose between blurred result and memory consumption - make usage of
qNextPowerOfTwoconfigurable, that way, you may choose between rendering speed and memory consumption. The memory increase between tile size 1224 and 2048 is almost 3 times. - can we use just RGB pixel format instead of
QImage::Format_RGBA8888_Premultiplied?
...usage of RBG888 should be without conversion in QOpenGLTextureCache::bindTexture : https://github.com/Framstag/libosmscout/pull/1648
I didnt realized that we need alpha channel for tiles - when offline tiles are combined with the online one. So, 24bit pixel format (rgb888) may be used just in case when online tiles are disabled.
Another option howto reduce memory would be releasing image data after submit to gpu memory. But I am worry that it is not doable easily with default qt rendering pipeline.
And ultimate solution would be to use opengl renderer, but it is far from complete.
@Karry, I captured tiles bold in the mobile when activating the tiled renderer: The Qt screen size is 360 x 768, and pixel ratio is 3.0 for this Xperia 10. So right the screen size 1080x2304.
The captured tile sizes 4096x6144. Each cached fragment size 2048x2048. The blue line is the area that I see on the screen. The red line is the area matching with the screen resolution (1080x2304).
Do these sizes match what you are planning ?
The following is an original cached fragment (without scaling for github).
... it should be what you planned. The areas with and without up-scaling match. So in my opinion the best compromise is to no longer up-scale, and so no longer try to fix the blurring. Like that it will continue to work without worries.
Yes, it is visible that overhead is huge. But I believe that it is correct - in sense of correctness :-) Tile size for your dpi and 3x upscaling should be 1224, after scaling it to nearest power of two it became 2048. When using tiled renderer, actual area is increased by one tile size horizontally and vertically, to minimize issues with broken labels on tile borders... So, when rendering two tiles at once, you get canvas with 4096x6144 dimension.
Difference between red and blue area is caused by upscaling to next power of two.
I realize that keep sane memory footprint is complex task, with full of compromises between quality, memory consumption and performance. There is no silver bullet and every library user (author of the application) should have the choice what compromise to choose. On mobile, we may prefer smaller memory footprint, but on desktop with 16 GiB of RAM we may prefer quality... For that reason, I would like to have configuration options to control these tradeoffs. Do you agree?
I prepared the the first one, to control scaling to power of two: https://github.com/Framstag/libosmscout/pull/1649
I suggest to implement the flexibility Kerry mentioned in code, too. Strategy pattern comes to my mind. Some interface with a number of implementation that implement rhe various needs or constraints. Default implementation should be on the safe side. People then Can assign and possibly dynamically change the strategy they demand.
Here is configuration option for pixel ratio: https://github.com/Framstag/libosmscout/pull/1650 It may follow the screen value, or may be fixed...
Thanks @Karry , I will try it soon. Also I made the new PR #1651 which avoid copying QImage as possible. This avoids allocating double or triple the memory when caching a tile, and cpu usage is even lower.