What to do about running out of resources.

Open John-Nagle opened this issue 1 year ago • 8 comments

Everything is going great, and then, suddenly, there's a panic in Rend3 due to hitting the limit on the number of vertices in a bind group or filling GPU memory. This needs to be handled more gracefully.

Suggestions:

  • Error returns for the "add_..." functions. After arcanization, those should be synchronous and callable from any thread, rather than just queuing instructions for the main thread, so they can return an error if resources are not available. These are the key ones, because they can use large amounts of memory:

pub fn add_mesh(self: &Arc<Self>, mesh: Mesh) -> Result<MeshHandle, Error>

pub fn add_texture_2d(self: &Arc<Self>, texture: Texture) -> Result<Texture2DHandle, Error>
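To illustrate what fallible "add" calls would allow on the caller's side, here is a minimal sketch. Everything in it is a hypothetical stand-in (`FakeRenderer`, `AddError`, the byte-size argument, `add_with_fallback`); none of it is rend3's actual API. It only shows the shape of "try the full mesh, and on an error fall back to a smaller one".

```rust
use std::fmt;

// Hypothetical stand-ins for illustration only; these are NOT rend3 types.
#[derive(Debug, Clone, PartialEq)]
enum AddError {
    ExceededMaximumBufferSize { max_buffer_size: u64 },
}

impl fmt::Display for AddError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            AddError::ExceededMaximumBufferSize { max_buffer_size } => {
                write!(f, "exceeded maximum buffer size of {} bytes", max_buffer_size)
            }
        }
    }
}

impl std::error::Error for AddError {}

#[derive(Debug, Clone, Copy, PartialEq)]
struct MeshHandle(usize);

/// Toy renderer that tracks only how many bytes have been handed out.
struct FakeRenderer {
    max_buffer_size: u64,
    used: u64,
    next_handle: usize,
}

impl FakeRenderer {
    fn new(max_buffer_size: u64) -> Self {
        FakeRenderer { max_buffer_size, used: 0, next_handle: 0 }
    }

    /// Returns an error instead of panicking when the mesh does not fit.
    fn add_mesh(&mut self, size_bytes: u64) -> Result<MeshHandle, AddError> {
        let new_used = self
            .used
            .checked_add(size_bytes)
            .filter(|&n| n <= self.max_buffer_size)
            .ok_or(AddError::ExceededMaximumBufferSize {
                max_buffer_size: self.max_buffer_size,
            })?;
        self.used = new_used;
        let handle = MeshHandle(self.next_handle);
        self.next_handle += 1;
        Ok(handle)
    }
}

/// Try the full-detail mesh first; if that fails, try a reduced one.
fn add_with_fallback(
    renderer: &mut FakeRenderer,
    full_size: u64,
    reduced_size: u64,
) -> Result<MeshHandle, AddError> {
    renderer
        .add_mesh(full_size)
        .or_else(|_| renderer.add_mesh(reduced_size))
}
```

A failed add leaves the toy renderer's bookkeeping untouched, so the caller is free to retry with something smaller or give up and report the error.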

Now that Sharpview can display more than one Second Life region, if the user goes to a crowded area, the bind group limit is hit, even though the GPU is only about half full.

John-Nagle avatar Dec 04 '23 05:12 John-Nagle

Tried new Rend3, rev "e1cfe1b". Modified code to detect error returns from mesh, texture creation. Just panics for now. Seems to be basically working.

Many log entries of the form:

05:52:15 [ERROR] AllocationErrorScope dropped without calling end

which is a message not seen before.

As expected, caught an allocation error with:

=========> Panic Rend3 error: ExceededMaximumBufferSize { max_buffer_size: 2147483648 } at file libscene/src/render/rendutils.rs, line 106 in thread Asset fetch #8.
Backtrace:
 libcommon::common::commonutils::catch_panic::{{closure}}
             at /home/john/projects/sl/SL-test-viewer/libcommon/src/common/commonutils.rs:215:25
 libscene::render::rendutils::convert_rend3_error
             at /home/john/projects/sl/SL-test-viewer/libscene/src/render/rendutils.rs:106:43
 libscene::render::rendutils::convert_rend3_handle_result
             at /home/john/projects/sl/SL-test-viewer/libscene/src/render/rendutils.rs:114:23
 libscene::render::renderregistry::RenderFaceMeshMapping::new
             at /home/john/projects/sl/SL-test-viewer/libscene/src/render/renderregistry.rs:122:35
 libscene::render::renderregistry::RenderFaceMeshGroup::build_render_face_mesh_group

indicating that I hit the bind limit while creating a mesh. As expected.

It would be helpful if the Rend3 errors were Send. They can't be converted into "anyhow" errors, or be copied or cloned, which is inconvenient. It's that "inner" entry that's a WGPU error that causes the trouble, because it's not Send.
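Until the error type itself is Send, one possible workaround is to convert it to an owned, thread-safe error at the point where it is caught. The sketch below uses only the standard library. `NonSendError` is a hypothetical stand-in, with an `Rc` playing the role of the non-Send "inner" field, and `SendableError` is an invented wrapper that keeps only the formatted message. Because the wrapper is `Send + Sync + 'static`, it would also satisfy the bounds that "anyhow" places on errors it converts.

```rust
use std::error::Error;
use std::fmt;
use std::rc::Rc;

/// Hypothetical error whose `inner` field is not `Send`.
/// The `Rc` stands in for the wrapped graphics-API error.
#[derive(Debug)]
struct NonSendError {
    inner: Rc<String>,
}

impl fmt::Display for NonSendError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "renderer error: {}", self.inner)
    }
}

impl Error for NonSendError {}

/// Owned snapshot of an error message; `Send`, `Sync`, and `Clone`.
#[derive(Debug, Clone, PartialEq)]
struct SendableError {
    message: String,
}

impl fmt::Display for SendableError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str(&self.message)
    }
}

impl Error for SendableError {}

impl From<NonSendError> for SendableError {
    fn from(e: NonSendError) -> Self {
        // Only the formatted text survives; the non-Send payload is dropped.
        SendableError { message: e.to_string() }
    }
}

/// Compile-time check that a type can cross thread boundaries.
fn assert_send_sync<T: Send + Sync + 'static>() {}

/// Stand-in for a renderer call that fails with the non-Send error.
fn fallible() -> Result<(), NonSendError> {
    Err(NonSendError {
        inner: Rc::new("ExceededMaximumBufferSize".to_string()),
    })
}

/// Worker-thread code converts at the boundary and returns a boxed,
/// thread-safe error.
fn load_on_worker() -> Result<(), Box<dyn Error + Send + Sync>> {
    fallible().map_err(SendableError::from)?;
    Ok(())
}
```

The cost of this approach is that the structured fields of the original error are lost; if the caller needs to match on the variant, the wrapper would have to copy those fields out as plain data instead.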

John-Nagle avatar Dec 11 '23 06:12 John-Nagle

Subscript out of range error:

06:24:13 [ERROR] Asset request  (AssetRequestTimestamped { request: AssetRequest { content: Mesh(MeshRequest { uuid: 7bdbe80d-6d20-65a2-56fe-9d9d1994e78a }), capability: "http://asset-cdn.glb.agni.lindenlab.com" }, timestamp: Instant { tv_sec: 1378365, tv_nsec: 660984636 } }, 1999998) failed: Mesh load, trouble spot: Region (1808,1199) <67.697105,117.189835,3553.6982>

Caused by:
    ExceededMaximumBufferSize { max_buffer_size: 2147483648 }
06:24:13 [ERROR] Asset request  (AssetRequestTimestamped { request: AssetRequest { content: Mesh(MeshRequest { uuid: 4a881d54-4b74-63e1-c46c-98f8d4265bc8 }), capability: "http://asset-cdn.glb.agni.lindenlab.com" }, timestamp: Instant { tv_sec: 1378363, tv_nsec: 706289265 } }, 999999) failed: Mesh load, trouble spot: Region (1808,1199) <114.52486,129.0136,35.77748>

Caused by:
    ExceededMaximumBufferSize { max_buffer_size: 2147483648 }
06:24:13 [ERROR] =========> Panic index out of bounds: the len is 30438 but the index is 30438 at file /home/john/.cargo/git/checkouts/rend3-e03f89403de3386a/e1cfe1b/rend3/src/managers/mesh.rs, line 213 in thread main.
Backtrace:
 libcommon::common::commonutils::catch_panic::{{closure}}
             at /home/john/projects/sl/SL-test-viewer/libcommon/src/common/commonutils.rs:215:25
 rend3::managers::mesh::MeshManager::remove
             at /home/john/.cargo/git/checkouts/rend3-e03f89403de3386a/e1cfe1b/rend3/src/managers/mesh.rs:213:36
 rend3::renderer::eval::evaluate_instructions
             at /home/john/.cargo/git/checkouts/rend3-e03f89403de3386a/e1cfe1b/rend3/src/renderer/eval.rs:134:21
 rend3::renderer::Renderer::evaluate_instructions
             at /home/john/.cargo/git/checkouts/rend3-e03f89403de3386a/e1cfe1b/rend3/src/renderer/mod.rs:451:9
 <sharpview::AppUi as rend3_framework::App>::handle_event
             at /home/john/projects/sl/SL-test-viewer/sharpview/src/main.rs:555:39
 rend3_framework::async_start::{{closure}}::{{closure}}
             at /home/john/.cargo/git/checkouts/rend3-e03f89403de3386a/e1cfe1b/rend3-framework/src/lib.rs:335:9
 winit::platform_impl::platform::sticky_exit_callback
             at /home/john/.cargo/registry/src/index.crates.io-6f17d22bba15001f/winit-0.28.7/src/platform_impl/linux/mod.rs:884:9
 winit::platform_impl::platform::x11::EventLoop<T>::run_return::single_iteration
             at /home/john/.cargo/registry/src/index.crates.io-6f17d22bba15001f/winit-0.28.7/src/platform_impl/linux/x11/mod.rs:375:21
 winit::platform_impl::platform::x11::EventLoop<T>::run_return
             at /home/john/.cargo/registry/src/index.crates.io-6f17d22bba15001f/winit-0.28.7/src/platform_impl/linux/x11/mod.rs:483:27
 winit::platform_impl::platform::x11::EventLoop<T>::run
             at /home/john/.cargo/registry/src/index.crates.io-6f17d22bba15001f/winit-0.28.7/src/platform_impl/linux/x11/mod.rs:498:25
06:24:13 [WARN] From (462848,306944), message: MsgObjectUpdateCompressed
06:24:13 [WARN] Error applying compressed update: Object #546594690 in [(462848,306944)] at <-0.220425,0.000397,-0.856916>

So, here, Sharpview kept loading meshes until it hit the limit, then treated that as a non-fatal error and kept going. After a few more requests, there was a subscript out of range in the mesh manager.

John-Nagle avatar Dec 11 '23 06:12 John-Nagle

The issue for AllocationErrorScope dropped without calling end has been fixed on trunk.

Will look into the subscript issue.

Interesting, didn't realize wgpu errors weren't Send. That's an issue in wgpu, but can fix it here too.

cwfitzgerald avatar Dec 11 '23 07:12 cwfitzgerald

Sounds good. The subscript error seems to indicate that repeatedly banging against the limit breaks something.

Now that I can detect out-of-memory errors, I have to do something about them beyond ignoring them. This will take some work. I can switch meshes to a lower level of detail, which was planned anyway and is partly implemented. Thanks for the quick response.

John-Nagle avatar Dec 11 '23 07:12 John-Nagle

Question: if Rend3/WGPU reaches the ExceededMaximumBufferSize, are there internal allocations which will cause crashes? I can get vertex consumption down via the level of detail system, but it may take a second or so for a scan to decide what to remove. Rend3 is still drawing during that period.

I have my own vertex count, and my intent is to reduce my vertex count to around 90% of the level that triggered ExceededMaximumBufferSize. I don't want to be hitting the limit constantly. But I have to hit it a few times to discover it.
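The back-off idea can be sketched as a small bookkeeping type. This is only an illustration under assumptions: the `VertexBudget` name and its methods are invented here, the "level that triggered the error" is taken to be the vertices in use plus the size of the failed request, and the lowest failure level seen so far is treated as the limit.

```rust
/// Application-side vertex accounting that learns a ceiling from failures.
/// Hypothetical sketch; not part of rend3 or Sharpview.
#[derive(Debug, Default)]
struct VertexBudget {
    /// Vertices the application believes are currently on the GPU.
    in_use: u64,
    /// Lowest level at which an allocation failure was observed, if any.
    observed_limit: Option<u64>,
}

impl VertexBudget {
    /// Stay at or below this percentage of the observed limit.
    const TARGET_PERCENT: u64 = 90;

    fn record_added(&mut self, vertices: u64) {
        self.in_use = self.in_use.saturating_add(vertices);
    }

    fn record_removed(&mut self, vertices: u64) {
        self.in_use = self.in_use.saturating_sub(vertices);
    }

    /// Call when an add fails; `attempted` is the size of the failed mesh.
    fn record_failure(&mut self, attempted: u64) {
        let level = self.in_use.saturating_add(attempted);
        self.observed_limit = Some(match self.observed_limit {
            Some(previous) => previous.min(level),
            None => level,
        });
    }

    /// Target ceiling, or `None` while no failure has been seen yet.
    fn target(&self) -> Option<u64> {
        self.observed_limit
            .map(|limit| (limit as u128 * Self::TARGET_PERCENT as u128 / 100) as u64)
    }

    /// Vertices the level-of-detail scan should free to reach the target.
    fn vertices_to_shed(&self) -> u64 {
        match self.target() {
            Some(target) => self.in_use.saturating_sub(target),
            None => 0,
        }
    }

    /// Whether a new mesh of this size fits under the target ceiling.
    fn should_attempt(&self, vertices: u64) -> bool {
        match self.target() {
            Some(target) => self.in_use.saturating_add(vertices) <= target,
            None => true,
        }
    }
}
```

Before any failure has been seen, `should_attempt` always returns true, which matches the need to hit the limit at least once to discover it. After that, new meshes are refused until the level-of-detail pass has freed at least `vertices_to_shed()` vertices.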

John-Nagle avatar Dec 11 '23 18:12 John-Nagle

> are there internal allocations which will cause crashes?

There shouldn't be. If you encounter any, I'd consider it a bug.

> Sounds good. The subscript error seems to indicate that repeatedly banging against the limit breaks something.

I figured it out, should be a simple fix, will apply it once I'm done with work.

cwfitzgerald avatar Dec 12 '23 00:12 cwfitzgerald

Sounds good. I've sketched out a design for my code for running right up to the limit and then backing off. Non-trivial but can be done.

John-Nagle avatar Dec 12 '23 00:12 John-Nagle

Alright, https://github.com/BVE-Reborn/rend3/pull/539 should have fixed that issue.

cwfitzgerald avatar Dec 12 '23 02:12 cwfitzgerald