rend3 What to do about running out of resources.

Everything is going great, and then, suddenly, there's a panic in Rend3 due to hitting the limit on the number of vertices in a bind group or filling GPU memory. This needs to be handled more gracefully.

Suggestions:

Error returns for the "add_... " functions. After arcanization, those should be synchronous and callable from any thread, rather than just queuing instructions for the main thread. So, they can return an error if resources are not available. These are the key items, the ones that can use large amounts of memory:

pub fn add_mesh(self: &Arc<Self>, mesh: Mesh) -> Result<MeshHandle, Error>

pub fn add_texture_2d(self: &Arc<Self>, texture: Texture) -> Result<Texture2DHandle, Error>

Some kind of metric to tell users when resources are low. Or at least how much is in use. Worst case, applications allocate until they run out, then try to keep the metric safely below its peak value.
Some way around the bind group size limitation.

Now that Sharpview can display more than one Second Life region, if the user goes to a crowded area, the bind group limit is hit, even though the GPU is only about half full.
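
To make the first two suggestions concrete, here is a minimal sketch of a fallible, synchronous add call plus a usage metric. It is not rend3 code: `MockRenderer`, `AddError`, the simplified `MeshHandle` and `fraction_used` are hypothetical names invented for illustration, and a single vertex count stands in for whatever resource is actually limited.

```rust
/// Hypothetical handle type; a stand-in, not rend3's MeshHandle.
#[derive(Debug, PartialEq)]
struct MeshHandle(usize);

/// Hypothetical error returned instead of panicking.
#[derive(Debug, PartialEq)]
enum AddError {
    OutOfVertices { requested: usize, available: usize },
}

/// Hypothetical renderer with a fixed vertex capacity.
struct MockRenderer {
    capacity: usize,
    used: usize,
    next_handle: usize,
}

impl MockRenderer {
    fn new(capacity: usize) -> Self {
        MockRenderer { capacity, used: 0, next_handle: 0 }
    }

    /// Synchronous add: reports exhaustion to the caller as an Err.
    fn add_mesh(&mut self, vertex_count: usize) -> Result<MeshHandle, AddError> {
        // `used` never exceeds `capacity`, so this cannot underflow.
        let available = self.capacity - self.used;
        if vertex_count > available {
            return Err(AddError::OutOfVertices { requested: vertex_count, available });
        }
        self.used += vertex_count;
        let handle = MeshHandle(self.next_handle);
        self.next_handle += 1;
        Ok(handle)
    }

    /// Usage metric: fraction of capacity currently in use (0.0 to 1.0).
    fn fraction_used(&self) -> f64 {
        if self.capacity == 0 {
            return 1.0;
        }
        self.used as f64 / self.capacity as f64
    }
}
```

With something of this shape, the application can match on the `Err` and fall back (skip the object, or retry with fewer vertices), and can poll the metric to stay clear of the limit.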

Dec 04 '23 05:12 John-Nagle

Tried new Rend3, rev "e1cfe1b". Modified code to detect error returns from mesh and texture creation. It just panics for now. Seems to be basically working.

Many log entries of the form:

05:52:15 [ERROR] AllocationErrorScope dropped without calling `end`

which is a message not seen before.

As expected, caught an allocation error with:

=========> Panic Rend3 error: ExceededMaximumBufferSize { max_buffer_size: 2147483648 } at file libscene/src/render/rendutils.rs, line 106 in thread Asset fetch #8.
Backtrace:
 libcommon::common::commonutils::catch_panic::{{closure}}
             at /home/john/projects/sl/SL-test-viewer/libcommon/src/common/commonutils.rs:215:25
 libscene::render::rendutils::convert_rend3_error
             at /home/john/projects/sl/SL-test-viewer/libscene/src/render/rendutils.rs:106:43
 libscene::render::rendutils::convert_rend3_handle_result
             at /home/john/projects/sl/SL-test-viewer/libscene/src/render/rendutils.rs:114:23
 libscene::render::renderregistry::RenderFaceMeshMapping::new
             at /home/john/projects/sl/SL-test-viewer/libscene/src/render/renderregistry.rs:122:35
 libscene::render::renderregistry::RenderFaceMeshGroup::build_render_face_mesh_group

indicating that I hit the bind limit while creating a mesh. As expected.

It would be helpful if the Rend3 errors were Send. They can't be converted into "anyhow" errors, or be copied or cloned, which is inconvenient. It's that "inner" entry that's a WGPU error that causes the trouble, because it's not Send.
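
For context, "anyhow" requires the wrapped error to be Send + Sync + 'static. A possible interim workaround on the application side is to flatten the error into its message before it crosses a thread boundary. The sketch below is only an illustration: `NonSendError` is a made-up stand-in for an error with a non-Send inner field (it is not the Rend3 or WGPU type), and `PortableError` and `flatten_error` are hypothetical names.

```rust
use std::error::Error;
use std::fmt;
use std::rc::Rc;

/// Hypothetical error that is not Send because it holds an Rc.
#[derive(Debug)]
struct NonSendError {
    inner: Rc<String>,
}

impl fmt::Display for NonSendError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}", self.inner)
    }
}

impl Error for NonSendError {}

/// Send + Sync replacement that keeps only the rendered message.
#[derive(Debug, Clone, PartialEq)]
struct PortableError(String);

impl fmt::Display for PortableError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}", self.0)
    }
}

impl Error for PortableError {}

/// Flatten any error into a PortableError by formatting it.
/// The structured fields of the original error are lost.
fn flatten_error<E: Error>(e: &E) -> PortableError {
    PortableError(e.to_string())
}

/// Compile-time check that a type can cross threads.
fn assert_send_sync<T: Send + Sync + 'static>() {}
```

The cost is that callers can no longer match on the error variant, so this only helps for logging and reporting; errors that are Send in the library itself would still be preferable.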

Dec 11 '23 06:12 John-Nagle

Subscript out of range error:

06:24:13 [ERROR] Asset request  (AssetRequestTimestamped { request: AssetRequest { content: Mesh(MeshRequest { uuid: 7bdbe80d-6d20-65a2-56fe-9d9d1994e78a }), capability: "http://asset-cdn.glb.agni.lindenlab.com" }, timestamp: Instant { tv_sec: 1378365, tv_nsec: 660984636 } }, 1999998) failed: Mesh load, trouble spot: Region (1808,1199) <67.697105,117.189835,3553.6982>

Caused by:
    ExceededMaximumBufferSize { max_buffer_size: 2147483648 }
06:24:13 [ERROR] Asset request  (AssetRequestTimestamped { request: AssetRequest { content: Mesh(MeshRequest { uuid: 4a881d54-4b74-63e1-c46c-98f8d4265bc8 }), capability: "http://asset-cdn.glb.agni.lindenlab.com" }, timestamp: Instant { tv_sec: 1378363, tv_nsec: 706289265 } }, 999999) failed: Mesh load, trouble spot: Region (1808,1199) <114.52486,129.0136,35.77748>

Caused by:
    ExceededMaximumBufferSize { max_buffer_size: 2147483648 }
06:24:13 [ERROR] =========> Panic index out of bounds: the len is 30438 but the index is 30438 at file /home/john/.cargo/git/checkouts/rend3-e03f89403de3386a/e1cfe1b/rend3/src/managers/mesh.rs, line 213 in thread main.
Backtrace:
 libcommon::common::commonutils::catch_panic::{{closure}}
             at /home/john/projects/sl/SL-test-viewer/libcommon/src/common/commonutils.rs:215:25
 rend3::managers::mesh::MeshManager::remove
             at /home/john/.cargo/git/checkouts/rend3-e03f89403de3386a/e1cfe1b/rend3/src/managers/mesh.rs:213:36
 rend3::renderer::eval::evaluate_instructions
             at /home/john/.cargo/git/checkouts/rend3-e03f89403de3386a/e1cfe1b/rend3/src/renderer/eval.rs:134:21
 rend3::renderer::Renderer::evaluate_instructions
             at /home/john/.cargo/git/checkouts/rend3-e03f89403de3386a/e1cfe1b/rend3/src/renderer/mod.rs:451:9
 <sharpview::AppUi as rend3_framework::App>::handle_event
             at /home/john/projects/sl/SL-test-viewer/sharpview/src/main.rs:555:39
 rend3_framework::async_start::{{closure}}::{{closure}}
             at /home/john/.cargo/git/checkouts/rend3-e03f89403de3386a/e1cfe1b/rend3-framework/src/lib.rs:335:9
 winit::platform_impl::platform::sticky_exit_callback
             at /home/john/.cargo/registry/src/index.crates.io-6f17d22bba15001f/winit-0.28.7/src/platform_impl/linux/mod.rs:884:9
 winit::platform_impl::platform::x11::EventLoop<T>::run_return::single_iteration
             at /home/john/.cargo/registry/src/index.crates.io-6f17d22bba15001f/winit-0.28.7/src/platform_impl/linux/x11/mod.rs:375:21
 winit::platform_impl::platform::x11::EventLoop<T>::run_return
             at /home/john/.cargo/registry/src/index.crates.io-6f17d22bba15001f/winit-0.28.7/src/platform_impl/linux/x11/mod.rs:483:27
 winit::platform_impl::platform::x11::EventLoop<T>::run
             at /home/john/.cargo/registry/src/index.crates.io-6f17d22bba15001f/winit-0.28.7/src/platform_impl/linux/x11/mod.rs:498:25
06:24:13 [WARN] From (462848,306944), message: MsgObjectUpdateCompressed
06:24:13 [WARN] Error applying compressed update: Object #546594690 in [(462848,306944)] at <-0.220425,0.000397,-0.856916>

So, here, Sharpview kept loading meshes until it hit the limit, then treated that as a non-fatal error and kept going. After a few more requests, there was a subscript out of range in the mesh manager.

Dec 11 '23 06:12 John-Nagle

The issue for AllocationErrorScope dropped without calling end has been fixed on trunk.

Will look into the subscript issue.

Interesting, didn't realize wgpu errors weren't Send. That's an issue in wgpu, but I can fix it here too.

Dec 11 '23 07:12 cwfitzgerald

Sounds good. The subscript error seems to indicate that repeatedly banging against the limit breaks something.

Now that I can detect out-of-memory errors, I have to do something about them beyond ignoring them. This will take some work. I can switch meshes to a lower level of detail, which was planned anyway and is partly implemented. Thanks for the quick response.

Dec 11 '23 07:12 John-Nagle

Question: if Rend3/WGPU hits ExceededMaximumBufferSize, are there internal allocations which will cause crashes? I can get vertex consumption down via the level of detail system, but it may take a second or so for a scan to decide what to remove. Rend3 is still drawing during that period.

I have my own vertex count, and my intent is to reduce my vertex count to around 90% of the level that triggered ExceededMaximumBufferSize. I don't want to be hitting the limit constantly. But I have to hit it a few times to discover it.
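
A rough sketch of that bookkeeping, under the assumption that the application counts its own vertices and treats the count at the moment of failure as the observed limit. `VertexBudget` and all of its methods are hypothetical names, and nothing here calls into Rend3.

```rust
/// Tracks the application's own vertex count and learns a budget
/// from allocation failures. Purely illustrative.
struct VertexBudget {
    /// Vertices the application believes are currently loaded.
    in_use: u64,
    /// Lowest in-use count at which an allocation has failed, if any.
    observed_limit: Option<u64>,
    /// Percentage of the observed limit to stay under, e.g. 90.
    headroom_percent: u64,
}

impl VertexBudget {
    fn new(headroom_percent: u64) -> Self {
        VertexBudget { in_use: 0, observed_limit: None, headroom_percent }
    }

    /// Call after a mesh was added successfully.
    fn record_success(&mut self, vertices: u64) {
        self.in_use += vertices;
    }

    /// Call after a mesh was removed or swapped for a lower level of detail.
    fn record_release(&mut self, vertices: u64) {
        self.in_use = self.in_use.saturating_sub(vertices);
    }

    /// Call when an add fails; remembers the lowest failing level seen.
    fn record_failure(&mut self) {
        let level = self.in_use;
        self.observed_limit = Some(match self.observed_limit {
            Some(previous) => previous.min(level),
            None => level,
        });
    }

    /// Target vertex count, known only after the limit has been hit once.
    fn target(&self) -> Option<u64> {
        self.observed_limit.map(|limit| limit * self.headroom_percent / 100)
    }

    /// How many vertices the level of detail scan should try to shed.
    fn excess(&self) -> u64 {
        match self.target() {
            Some(target) => self.in_use.saturating_sub(target),
            None => 0,
        }
    }
}
```

Keeping the minimum of the observed failure levels makes the budget conservative: a later failure at a lower count tightens the target, and nothing loosens it again in this sketch.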

Dec 11 '23 18:12 John-Nagle

> are there internal allocations which will cause crashes?

There shouldn't be. If you encounter any, I'd consider it a bug.

> Sounds good. The subscript error seems to indicate that repeatedly banging against the limit breaks something.

I figured it out, should be a simple fix, will apply it once I'm done with work.

Dec 12 '23 00:12 cwfitzgerald

Sounds good. I've sketched out a design for my code for running right up to the limit and then backing off. Non-trivial but can be done.

Dec 12 '23 00:12 John-Nagle

Alright, https://github.com/BVE-Reborn/rend3/pull/539 should have fixed that issue.

Dec 12 '23 02:12 cwfitzgerald
