libosmscout icon indicating copy to clipboard operation
libosmscout copied to clipboard

Facing an OOM using tiled rendering with Qt

Open janbar opened this issue 8 months ago • 32 comments

I am facing an OOM using tiled rendering with Qt. Finally the app is killed by the crazy lowmemorykiller when used memory is up to 1.1~1.2 GB. So I stuck to use the commit a15448b5422d0f81cccf8947bdca127cfb454296, where the issue doesn't exist. No more investigation yet, but using plan rendering the issue is not.

@Karry , I guess it could be a cache issue as no longer control. Running with valgrind doesn't help me.

janbar avatar Apr 04 '25 10:04 janbar

It seems the memory footprint of the app is bigger after this commit. Offline Tiled cache is freed as I can see by switching the renderer mode to plan. For the story, from months I am in debugging phase facing with many issues and bugs with the latest Android sdk that should support C++20. It is a misery ;-).

janbar avatar Apr 04 '25 16:04 janbar

It seems the memory footprint of the app is bigger after this commit.

It is kind of expected. Just keep in mind that HiDPI support was rewritten later: https://github.com/Framstag/libosmscout/pull/1622 , but in practice effect is the same - rendered tiles are twice as big (1.5^2) to get smooth result. So, visual cache occupies twice of memory. By default, it should be just around less than 100 MiB for each MapWidget component (on regular mobile screen), but all osmscout caches may be huge... For that reason, application may release them explicitly, when system memory is under pressure....

And it leads me to the question - do Android have some API to notify user applications that memory is under the pressure and applications should release its caches?

Karry avatar Apr 04 '25 21:04 Karry

do Android have some API to notify user applications that memory is under the pressure and applications should release its caches?

Yes it has (API Memory Advice). But how I could control the flush or hint a purge of cached data ?

Issue occurs using one widget, the main screen. I tried reducing the offline cache size to 50MB, but to no avail. The memory allocation is elsewhere.

I could step commit by commit to try to identify the one, where allocation grow. But more than one major change can impact the behavior: c++20 features , map features etc. Currently version 1.13.5 works well (a15448b5422d0f81cccf8947bdca127cfb454296) with patch for standard C++17. Testing 5b0456373318db3b310c39f54a0db6afaf5af08f -> OOM Testing 07e27ba8d45bd6f0043b374e9abb5e6f633708ea -> OOM

On Android the "lowmemorykiller" is very aggressive !

janbar avatar Apr 04 '25 22:04 janbar

In Sailfish app, I created service that check memory usage periodically (because api for memory notifications is broken) and when memory is under pressure, call DBThread::FlushCaches. From DBThread is the signal propagated to all places that may release the memory...

Karry avatar Apr 05 '25 06:04 Karry

Beside that, I call malloc_trim to return memory from C++ allocator back to the kernel... It may impact performance, but it is handy, when you really need to release memory back to the system.

Karry avatar Apr 05 '25 06:04 Karry

Thanks @Karry , you put me back on the right solution. Using Android memory adviser to trig a flush could close the issue on Android, but not others. Your choice fixes the issue on all platforms. This morning, after a good coffee, I launched a routing simulation with two apps in parallel made with commits a15448b5422d0f81cccf8947bdca127cfb454296 and 5b0456373318db3b310c39f54a0db6afaf5af08f. The result is clear:

Image

I take 120MB of additional memory allocation with the latest commit, which explains the OOM. Now still to implement your solution to flush ...

janbar avatar Apr 05 '25 09:04 janbar

Continuing investigations, I isolated the PR responsive for the growing of allocation: PR #1622. It seems some "back buffers", cannot be managed, are involved and increase quickly the used memory. By plugging a memory manager which try to flush on OS events (Android Memory Adviser), doesn't fix the issue because allocation increases too fast.

As example, running the app for 10 minutes on a distance of 5 km, it will be killed because allocations increases fast to exceed 1GB.

So I reverted this PR, and issue doesn't longer exist: memory allocation remain stable around 350MB. During my last trip, I traveled 300 km without any problems. (see https://github.com/janbar/libosmscout/tree/ffd19d55e13b33ce2a5d4a36624846e78770d504).

I noticed too using QImage instead QPixmap, the cpu usage increases to 5%. So I reverted all commits of the PR.

janbar avatar Apr 12 '25 16:04 janbar

I noticed too using QImage instead QPixmap, the cpu usage increases to 5%. So I reverted all commits of the PR.

Wow... It really should not happened. When you look to Qt's QOpenGLTextureCache code (I checked old Qt 5 and the latest), there is additional conversion when using QPixmap, but when QImage with proper pixel format and size is used, it is send to graphic memory without any conversion. So, using QImage should be faster, and I was able to confirm it with profiler on Xperia 10 II and Linux desktop... Can you dig deeper please?

With memory usage, we can make the upscale configurable, the image tile size is trade-off between smooth result on HiDPI displays and memory footage...

Karry avatar Apr 14 '25 07:04 Karry

@Karry , sorry I make mistake. Using Qimage instead QPixmap improves memory usage, and that doesn't impact the cpu usage. CPU usage increases when applying all commits of the PR #1622. This could be due to an increased number of page in/out.

I tested again including the commit "Use QImage" (https://github.com/janbar/libosmscout/commit/9c316bc8c5ea496fb618389ad1ccbc6f8219bc6d).

janbar avatar Apr 14 '25 20:04 janbar

What are your screen dimensions (reported by Qt) and HiDPI upscale? You may also try to lower OSMScoutQt::onlineTileCacheSize and OSMScoutQt::offlineTileCacheSize during library setup, by OSMScoutQtBuilder::WithTileCacheSizes. You just need to keep cache size big enough to cover whole screen...

Karry avatar Apr 14 '25 22:04 Karry

The tile caches settings are already configured. On device mobile the offline tile cache size is 100MB. Reducing it has no effect on the OOM.

The screen config reported by Qt is the following: 04-15 18:58:50.966 7699 7720 W osmin : ##### pixel ratio = 3.000000 04-15 18:58:50.966 7699 7720 W osmin : ##### size = 360 x 768

The commit https://github.com/janbar/libosmscout/commit/9b584878de6bc6eb93c4d740680613dd7fd4e259 works as expected without issue. From the PR #1622 there is a bad guy that pump the memory.

janbar avatar Apr 15 '25 17:04 janbar

@Karry , I found the culprit.

janbar avatar Apr 16 '25 17:04 janbar

Tell us :-)

Framstag avatar Apr 16 '25 18:04 Framstag

The culprit is the up-scaling of tiles.

In commit 07e27ba8d45bd6f0043b374e9abb5e6f633708ea double finalDpi = mapDpi * 2 * this->screenPixelRatio;

Fixed/Updated by next commit d721e811da67839160e23c7af34e16d642ef4c64

double finalDpi = mapDpi * this->screenPixelRatio;
...
tileDimension = qNextPowerOfTwo(tileDimension - 1);
finalDpi = (double(tileDimension) / double(OSMTile::osmTileOriginalWidth())) * OSMTile::tileDPI();

But removing it, doesn't fix the initial issue about blurred tiles. My original commit 9b584878de6bc6eb93c4d740680613dd7fd4e259 fixed it by an other way without doubling the memory usage. I guess a solution is to mix the two PR: removing up-scaling and keeping my fix.

I will make some tests trying to mix the 2 ways without up-scale the tiles.

janbar avatar Apr 16 '25 19:04 janbar

Also I suspect a bug in the Qt backend, some times the issue doesn't occur and memory keep under control, and some times it increases continuously like a leak or something not purged.

Or a bug in caches flushing ... is a dead-lock can occur calling flushCaches() ?

janbar avatar Apr 16 '25 20:04 janbar

The bench for 3 versions:

Rollback PR #1622: fixes the OOM 12701 jlb 20 0 4199284 357016 144764 S 2,0 1,1 1:36.60 ./osmin0

PR #1622: 17049 jlb 20 0 4404136 516148 147992 S 5,6 1,6 1:37.64 ./osmin1

PR #1622 + Remove upscaling + 9b584878de6bc6eb93c4d740680613dd7fd4e259 17215 jlb 20 0 4273060 360692 148156 S 5,6 1,1 1:36.79 ./osmin2

I let you know after a bench with the mobile device ...

janbar avatar Apr 17 '25 11:04 janbar

Well, it is fixed now with the PR #1647 .

The root cause is the dpi is factorized 2 times by the screen pixel ratio on HiDPI. Thus causing an uncontrolled increase in the memory used by the image cache.

i.e on modern mobile device the pixel ratio is 3.0 even 4.0. So 3 x (mapDpi = 3 x base dpi) = 9 x the base dpi ==> OOM

Debugging on Xperia 10: the Qt screen is 360 x 768 with a pixel ratio 3.0 , devDPI is 153 => mapDPI = 153 By factoring the dpi with the pixel ratio, the final dpi become 459, which is an HUGE dpi. The rendered tile sizes 240 MB, which is fragmented by 15 pieces of 16 MB to be cached. Therefore the memory allocation has to increase of 480 MB (tile + fragments), then down to 240 MB. Thus that explains the behavior, i.e even trying to force flushing the app needs to allocate 480 MB in 1 pass.

Now fixing that by removing the multiplier, and applying only the up-scale x2.0 to manage the zoom, the rendered tile sizes 64 MB. During fragmentation to fill the cache (16 pieces of 4 MB), the app needs to allocate 128 MB, which is sustainable.

janbar avatar Apr 17 '25 22:04 janbar

Some bad news. Trying with an older device with arch armv7 and Android 10. The app stills to be killed after few moves and offline tiles computing.

At the moment, the only way for me is to maintain a forked branch of the lib without the PR #1622. See the branch https://github.com/janbar/libosmscout/tree/osmin-1.13 .

janbar avatar Apr 20 '25 08:04 janbar

Now the root cause has been identified, I close this and will continue in the PR #1647.

janbar avatar Apr 21 '25 21:04 janbar

Hi @janbar , sorry for long delays. I know that it may be frustrating. But the family and work...

Looking to the code with calculator. With screenPixelRatio=3 and mapDpi=153, finalDpi is then 459, tileDimension is 1224. It is the right tile size for screen with pixel ratio 3, when you do not want to upscale tiles. Do you agree?

To cover whole screen, there should be required just ~10 tiles, so minimum memory usage of cached tiles is ~42 MiB. When apply qNextPowerOfTwo, tile size is increased to 2048, together with RGBA8888 pixel format, memory requirement is ~160 MiB (just for pixel data).

I would say that your https://github.com/Framstag/libosmscout/pull/1647/ is going to the wrong direction... Changes in TiledRenderingHelper are little bit chaotic to me. Instead I would suggest:

  • make screenPixelRatio configurable, using current screen pixel ratio as default. That way, you may choose between blurred result and memory consumption
  • make usage of qNextPowerOfTwo configurable, that way, you may choose between rendering speed and memory consumption. The memory increase between tile size 1224 and 2048 is almost 3 times.
  • can we use just RGB pixel format instead of QImage::Format_RGBA8888_Premultiplied ?

Karry avatar Apr 21 '25 22:04 Karry

...usage of RBG888 should be without conversion in QOpenGLTextureCache::bindTexture : https://github.com/Framstag/libosmscout/pull/1648

Karry avatar Apr 21 '25 22:04 Karry

I didnt realized that we need alpha channel for tiles - when offline tiles are combined with the online one. So, 24bit pixel format (rgb888) may be used just in case when online tiles are disabled.

Karry avatar Apr 22 '25 06:04 Karry

Another option howto reduce memory would be releasing image data after submit to gpu memory. But I am worry that it is not doable easily with default qt rendering pipeline.

And ultimate solution would be to use opengl renderer, but it is far from complete.

Karry avatar Apr 22 '25 07:04 Karry

@Karry, I captured tiles bold in the mobile when activating the tiled renderer: The Qt screen size is 360 x 768, and pixel ratio is 3.0 for this Xperia 10. So right the screen size 1080x2304.

The captured tile sizes 4096x6144. Each cached fragment size 2048x2048. The blue line is the area that I see on the screen. The red line is the area matching with the screen resolution (1080x2304).

Image

Do these sizes match what you are planning ?

janbar avatar Apr 22 '25 10:04 janbar

The following is an original cached fragment (without scaling for github). Image

janbar avatar Apr 22 '25 10:04 janbar

... it should be what you planned. The areas with and without up-scaling match. So in my opinion the best compromise is to no longer up-scale, and so no longer try to fix the blurring. Like that it will continue to work without worries.

janbar avatar Apr 22 '25 11:04 janbar

Yes, it is visible that overhead is huge. But I believe that it is correct - in sense of correctness :-) Tile size for your dpi and 3x upscaling should be 1224, after scaling it to nearest power of two it became 2048. When using tiled renderer, actual area is increased by one tile size horizontally and vertically, to minimize issues with broken labels on tile borders... So, when rendering two tiles at once, you get canvas with 4096x6144 dimension.

Difference between red and blue area is caused by upscaling to next power of two.

I realize that keep sane memory footprint is complex task, with full of compromises between quality, memory consumption and performance. There is no silver bullet and every library user (author of the application) should have the choice what compromise to choose. On mobile, we may prefer smaller memory footprint, but on desktop with 16 GiB of RAM we may prefer quality... For that reason, I would like to have configuration options to control these tradeoffs. Do you agree?

I prepared the the first one, to control scaling to power of two: https://github.com/Framstag/libosmscout/pull/1649

Karry avatar Apr 22 '25 22:04 Karry

I suggest to implement the flexibility Kerry mentioned in code, too. Strategy pattern comes to my mind. Some interface with a number of implementation that implement rhe various needs or constraints. Default implementation should be on the safe side. People then Can assign and possibly dynamically change the strategy they demand.

Framstag avatar Apr 23 '25 06:04 Framstag

Here is configuration option for pixel ratio: https://github.com/Framstag/libosmscout/pull/1650 It may follow the screen value, or may be fixed...

Karry avatar Apr 25 '25 07:04 Karry

Thanks @Karry , I will try it soon. Also I made the new PR #1651 which avoid copying QImage as possible. This avoids allocating double or triple the memory when caching a tile, and cpu usage is even lower.

janbar avatar Apr 25 '25 18:04 janbar