New sample 'ThreadedResourcePrep' showcasing background preparation of meshes.

Open ohlidalp opened this issue 3 months ago • 11 comments

I wanted to see the ResourceBackgroundQueue::prepare() functionality in action before I use it anywhere else, so I created a proof-of-concept app in the form of an OGRE sample. I was particularly interested in the common case: loading a mesh, complete with linked material(s) and texture(s).

Opening as draft because of things left to do:

  • [x] Add thumbnail
  • [ ] Add slider (maybe also Apply button) to set number of worker threads
  • [x] Implement threaded prep of textures, too (not doable out of the box as linked materials aren't known until mesh load)
  • [x] Implement threaded prep of skeletons, too (same reason)
  • [ ] Squash the commits together

ohlidalp avatar Sep 25 '25 00:09 ohlidalp

and what kind of improvement could you measure?

paroj avatar Sep 25 '25 13:09 paroj

On my laptop with a Ryzen 7 (7435HS, 3.10GHz) and an NVIDIA RTX 4070, in a Debug build, reloading all meshes at once:

  • Sync prep: 20 reloads = avg loading time 1.062sec
  • Threaded prep: 20 reloads = avg loading time 1.037sec

I think the gain is so minor because textures and skeletons are still prepared synchronously when load()-ing the mesh resource.
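For reference, an averaged measurement like the one above can be taken with a small timing helper. This is only a minimal sketch using the C++ standard library, not the sample's actual code: `averageReloadSeconds` and the `reload` callback are hypothetical names, and `reload` stands in for whatever unloads and reloads all meshes.

```cpp
#include <chrono>
#include <functional>

// Runs `reload` `runs` times and returns the mean wall-clock duration in seconds.
// `reload` is a hypothetical stand-in for the code that reloads all meshes.
double averageReloadSeconds(const std::function<void()>& reload, int runs)
{
    using Clock = std::chrono::steady_clock;
    if (runs <= 0)
        return 0.0; // nothing measured

    double totalSeconds = 0.0;
    for (int i = 0; i < runs; ++i)
    {
        const Clock::time_point start = Clock::now();
        reload();
        const std::chrono::duration<double> elapsed = Clock::now() - start;
        totalSeconds += elapsed.count();
    }
    return totalSeconds / runs;
}
```

Calling it with `runs = 20` mirrors the "20 reloads" figure; `steady_clock` is used because it is not affected by system clock adjustments.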

ohlidalp avatar Sep 25 '25 22:09 ohlidalp

I've confirmed my previous assumption. When I also prepare textures and skeletons in the background, the stat becomes:

  • 20 reloads = avg loading time 0.233sec

Major win, but it required me to hack OgreCore by adding DataStreamPtr Mesh::copyPreparedMeshFileData() and to reimplement part of MeshSerializer inside the sample.

ohlidalp avatar Sep 26 '25 03:09 ohlidalp

yeah.. maybe you should just bg load all materials in a resource group instead of that..

paroj avatar Sep 26 '25 16:09 paroj

That would be pretty disappointing though. I'm looking for a supported (non-deprecated) solution for Rigs of Rods, which has literally hundreds of user-made mods distributed in ZIP archives, most having multiple variants (plus multiple skins) inside a single ZIP. Brute-forcing prep for all resources just isn't a reasonable solution. Plus, using this approach in a sample would sort of send out a message that threaded prep is technically here but not useful for the common case. Frankly, my motivation to create this sample was to either find a workaround or point a finger at this issue.

I'm open to suggestions on how to tackle this. I noticed there is a clone(bool copy, HwBufferMan* newMan) method in both VertexData and IndexData, which made me wonder if Mesh::loadImpl() could be run with a dummy HwBufferMan belonging to a dummy rendersystem that just creates dummy buffers in CPU memory, to be cloned to the actual rendersystem later. This would most likely show even better results in the benchmark. Another option would be to restore THREAD_SUPPORT 2. I'm a fan of https://preshing.com/20111118/locks-arent-slow-lock-contention-is/ and skimming the OGRE source left me with an impression of heavy-handed locking (meaning the developer thought "locks are expensive per se, let's hold each one as long as possible"). So maybe the arguments in https://github.com/OGRECave/ogre/issues/454 aren't entirely valid.
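To make the CPU-memory buffer idea concrete, here is a minimal sketch using only the C++ standard library. It does not use any OGRE API: `StagingBuffer`, `DeviceBuffer`, `buildStagingBuffer`, `uploadToDevice` and `loadViaStaging` are all hypothetical names, and the "parsing" is faked by generating bytes. The point is only the split: everything expensive happens on a worker into plain memory, and the foreground is left with a copy.

```cpp
#include <cstddef>
#include <cstdint>
#include <future>
#include <vector>

// Hypothetical CPU-side buffer, standing in for one created by a dummy
// (CPU-memory) buffer manager.
struct StagingBuffer
{
    std::vector<std::uint8_t> bytes;
};

// Hypothetical "real" buffer; in an actual renderer this would be GPU memory
// that may only be created on the render thread.
struct DeviceBuffer
{
    std::vector<std::uint8_t> bytes;
};

// Background part: decode into CPU memory only (faked here with a byte pattern).
StagingBuffer buildStagingBuffer(std::size_t size)
{
    StagingBuffer staging;
    staging.bytes.resize(size);
    for (std::size_t i = 0; i < size; ++i)
        staging.bytes[i] = static_cast<std::uint8_t>(i & 0xFF);
    return staging;
}

// Foreground part: the only remaining work is a plain copy (the "upload").
DeviceBuffer uploadToDevice(const StagingBuffer& staging)
{
    DeviceBuffer device;
    device.bytes = staging.bytes;
    return device;
}

// Builds the staging buffer on a worker thread, then uploads on the caller.
DeviceBuffer loadViaStaging(std::size_t size)
{
    std::future<StagingBuffer> pending =
        std::async(std::launch::async, buildStagingBuffer, size);
    // The calling thread could keep rendering here until the future is ready.
    return uploadToDevice(pending.get());
}
```

Whether the final copy is cheap enough in practice depends on the rendersystem; this sketch makes no claim about that.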

EDIT: I'll also explore the possibility of using a custom loader that loads the mesh the same way but parses it on the go. It should save me the copying of the whole stream which Mesh::copyPreparedMeshFileData() currently does.

ohlidalp avatar Sep 26 '25 21:09 ohlidalp

  • Aren't you loading the mods in RoR into separate resource groups anyway?
  • I don't think loading the mesh and its material in parallel is that beneficial. The bottleneck should be texture loading (especially coming from PNG/JPG). You can just load the mesh normally and then load its material in bg.
  • The "dummy" HwBufMgr is called DefaultHardwareBufferManager and is used inside the LOD system like that. There is also Mesh::setHardwareBufferManager to force its use.

paroj avatar Sep 27 '25 15:09 paroj

  • Yes, RoR loads every mod to a separate RG, so brute prepping all content would be bearable. I will explore the ManualLoader option first, though.
  • I was under the impression that loading a mesh also synchronously loads the textures, but I was probably wrong. This should do the trick.
  • Thanks for the info, I couldn't make it out from the source.

ohlidalp avatar Sep 27 '25 22:09 ohlidalp

I looked at ManualResourceLoader and I can't figure out how to use it to just prepare meshes. Resource::prepare() calls it and then sets LOADSTATE_PREPARED, which is a lie in the case of Mesh because mFreshFromDisk cannot be filled from the outside.

@paroj can you advise?

ohlidalp avatar Oct 07 '25 15:10 ohlidalp

you cannot just prepare meshes. The ManualResourceLoader API is made to mainly handle load.

paroj avatar Oct 07 '25 16:10 paroj

I added a "Threaded early discovery" mode. To enable it, tick all the checkboxes at the top left (to be cleaned up). This means the mesh is prepared via ResourceBackgroundQueue, then a custom task is queued to WorkQueue which:

  1. loads the mesh file again (I reverted the Mesh::copyPreparedMeshFileData() hack, so I need to load the data myself)
  2. discovers skeletons+textures using custom code
  3. and then queues all of them via ResourceBackgroundQueue.
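The shape of these three steps can be sketched with the C++ standard library alone. This is not the sample's code and uses no OGRE calls: `MeshDependencies`, `discoverDependencies`, `prepareResource` and `prepareMeshWithDependencies` are hypothetical names, the discovery step is a stub that invents file names instead of parsing a mesh file, and `std::async` stands in for the real queues.

```cpp
#include <future>
#include <string>
#include <vector>

// Hypothetical result of scanning a mesh file for linked resources.
struct MeshDependencies
{
    std::vector<std::string> skeletons;
    std::vector<std::string> textures;
};

// Steps 1 and 2 (stubbed): a real version would read and parse the mesh file;
// this one just derives made-up names from the mesh name.
MeshDependencies discoverDependencies(const std::string& meshName)
{
    MeshDependencies deps;
    deps.skeletons.push_back(meshName + ".skeleton");
    deps.textures.push_back(meshName + "_diffuse.png");
    deps.textures.push_back(meshName + "_normal.png");
    return deps;
}

// Hypothetical background preparation of one resource; returns its name when done.
std::string prepareResource(const std::string& name)
{
    return name;
}

// Runs discovery on a worker, then queues every dependency (step 3) and waits
// for all of them. Returns the names in the order they were queued.
std::vector<std::string> prepareMeshWithDependencies(const std::string& meshName)
{
    std::future<MeshDependencies> discovery =
        std::async(std::launch::async, discoverDependencies, meshName);
    const MeshDependencies deps = discovery.get();

    std::vector<std::future<std::string>> pending;
    for (const std::string& skeleton : deps.skeletons)
        pending.push_back(std::async(std::launch::async, prepareResource, skeleton));
    for (const std::string& texture : deps.textures)
        pending.push_back(std::async(std::launch::async, prepareResource, texture));

    std::vector<std::string> prepared;
    for (std::future<std::string>& item : pending)
        prepared.push_back(item.get());
    return prepared;
}
```

Unlike the sample, this sketch blocks the caller while waiting; a real version would poll or use completion callbacks so the render loop keeps running.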

I expected to see almost no FPS spike with this approach, as everything is both fetched to RAM and analyzed in the background; only the final mesh loading (which constructs the actual hardware buffers) is done in the foreground. However, the results are basically identical to just loading the mesh in the foreground first and then queuing textures+skeletons via ResourceBackgroundQueue.

ohlidalp avatar Oct 08 '25 22:10 ohlidalp

maybe this is better suited as a VTest, as there is nothing visual here; it rather tests the workflow. VTests are here: https://github.com/OGRECave/ogre/blob/0f01724bb8f25cf8813047253d43867450f5553a/Tests/VisualTests/PlayPen/src/PlayPenTests.cpp#L302

and are executed as part of our CI.

They generate this overview page: https://ogrecave.github.io/ogre/vtests1.12/TestResults_GL.html

however, we do not validate the images but only fail if the test crashes

paroj avatar Oct 09 '25 16:10 paroj