[wgpu-core] Inline RayQuery Support
Checklist
- [ ] Run `cargo clippy`.
- [ ] Run `RUSTFLAGS=--cfg=web_sys_unstable_apis cargo clippy --target wasm32-unknown-unknown` if applicable.
- [ ] Add change to CHANGELOG.md. See simple instructions inside file.
Future Work
- [ ] Acceleration Structure Compaction
- [ ] Ray tracing pipelines
- [ ] Ray tracing resource management
Connections
- #1040
- Depends on: #3507 (vulkan hal)
- Matrix Room Thread
Description
Ray tracing support for wgpu and wgpu-core. This PR implements support for acceleration structures and ray queries.
It provides a create function each for bottom-level and top-level acceleration structures, and a single build function covering both. Top-level acceleration structures can be used in bind groups. Shader support is already implemented in naga.
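For readers who want a feel for the API, here is a condensed sketch paraphrased from the branch's ray-cube example. The `wgpu::ray_tracing` module path, the extension traits, and the exact descriptor fields follow that example loosely and may not match the final API; it assumes a `device`, an `encoder`, a `vertex_count`, and a `vertex_buffer` of packed `f32x3` positions already exist.

```rust
use wgpu::ray_tracing::{self as rt, CommandEncoderRayTracing, DeviceRayTracing};

// Describe the triangle geometry up front so wgpu can size the BLAS.
let size_desc = rt::BlasTriangleGeometrySizeDescriptor {
    vertex_format: wgpu::VertexFormat::Float32x3,
    vertex_count,
    index_format: None,
    index_count: None,
    flags: rt::AccelerationStructureGeometryFlags::OPAQUE,
};

// One create function each for bottom- and top-level structures.
let blas = device.create_blas(
    &rt::CreateBlasDescriptor {
        label: None,
        flags: rt::AccelerationStructureFlags::PREFER_FAST_TRACE,
        update_mode: rt::AccelerationStructureUpdateMode::Build,
    },
    rt::BlasGeometrySizeDescriptors::Triangles { descriptors: vec![size_desc.clone()] },
);
let tlas = device.create_tlas(&rt::CreateTlasDescriptor {
    label: None,
    max_instances: 1,
    flags: rt::AccelerationStructureFlags::PREFER_FAST_TRACE,
    update_mode: rt::AccelerationStructureUpdateMode::Build,
});

// Place one BLAS instance in the TLAS (row-major 3x4 transform).
let mut tlas_package = rt::TlasPackage::new(tlas, 1);
*tlas_package.get_mut_single(0).unwrap() = Some(rt::TlasInstance::new(
    &blas,
    [1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0],
    0,    // custom index
    0xff, // mask
));

// One build function covers both levels; afterwards the TLAS can be bound
// in a bind group and traversed with ray queries in the shader.
encoder.build_acceleration_structures(
    std::iter::once(&rt::BlasBuildEntry {
        blas: &blas,
        geometry: rt::BlasGeometries::TriangleGeometries(vec![rt::BlasTriangleGeometry {
            size: &size_desc,
            vertex_buffer: &vertex_buffer,
            first_vertex: 0,
            vertex_stride: 12, // 3 * size_of::<f32>()
            index_buffer: None,
            index_buffer_offset: None,
            transform_buffer: None,
            transform_buffer_offset: None,
        }]),
    }),
    std::iter::once(&tlas_package),
);
```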
TODO
- [x] more validation
- [ ] as_hal/into_hal
- [x] tracing
- [ ] more examples
- [ ] more tests
- [ ] documentation
- [ ] specification
- [ ] procedural geometry
- [x] the safe and validated build function
Testing
Tested with the ray-cube example application and a test. Needs more tests and examples.
@daniel-keitel I rebased this branch here: https://github.com/JMS55/wgpu/tree/rt-wgpu-jms
Hi, I came here from Bevy while trying to do ray tracing. I'd like to understand the problem domain a bit more; I hope to help out, though I doubt I have the depth of knowledge for it yet. As a start, is there somewhere I can begin reading, or some source files to look at? And as a side question, what is the current state of this work?
@gabrielkryss hi, I'm the Bevy developer working on RT-based global illumination https://github.com/JMS55/bevy/tree/solari. Here's what needs to be done before Bevy can use RT:
- First step would be getting #4089 merged
- Second step would be getting this PR merged. There's a memory leak for TLASs though, and this PR can't be merged anyways until #3626 is merged, which will change a whole lot of the code base.
- Third step is setting up Bevy to use binding arrays and make the whole scene accessible via a single bind group that can be indexed non-uniformly (see the sketch after this list). This is needed so that when a ray hits a point in the scene, you can do things like look up mesh and material data for that point. I have a (somewhat bad) prototype of this already written as part of bevy_solari, but we'd want a better setup long term. Doing this work will also lead to large performance improvements for Bevy's existing renderer. If you're interested in helping with this, please join the Bevy discord and talk to us in the #rendering-dev channel.
- Fourth step, which is of course super complicated, is developing the actual global illumination techniques and shader code. I've been working on this as part of bevy_solari. If you're interested in helping with it, again please reach out in the #rendering-dev channel.
Helping out with either developing the WGPU RT APIs or Bevy's code would go a long way towards getting RT usable for Bevy. Feel free to ask whatever additional questions you may have.
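As a loose illustration of what the third step involves: giving a bind group layout entry a `count` turns it into a binding array that shaders can index. The feature names in the comment and the 1024 cap below are illustrative, not Bevy's actual setup.

```rust
use std::num::NonZeroU32;

// Hypothetical sketch: one storage-buffer binding array holding per-mesh
// data for the whole scene. Requires wgpu features such as
// Features::BUFFER_BINDING_ARRAY plus the non-uniform-indexing features.
let entry = wgpu::BindGroupLayoutEntry {
    binding: 0,
    visibility: wgpu::ShaderStages::COMPUTE,
    ty: wgpu::BindingType::Buffer {
        ty: wgpu::BufferBindingType::Storage { read_only: true },
        has_dynamic_offset: false,
        min_binding_size: None,
    },
    // A Some(..) count makes this a binding array; 1024 is an arbitrary cap.
    count: NonZeroU32::new(1024),
};
```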
My understanding is that this MR still has issues to work out and is not ready for a review, is that correct?
Yeah. There's a memory leak related to the TLAS unfortunately :/
@daniel-keitel @Vecvec I know you wanted to make some changes to the API, but perhaps you can rebase this branch and request reviews for what you have so far? Imo it would be better to merge as-is and iterate later than to put off any form of RT support for longer. That way people can start using this without custom forks :)
@daniel-keitel @Vecvec I tried rebasing this PR on wgpu-trunk, but ran into some issues with serde. Maybe one of you knows how to fix them. EDIT: Fixed
Sorry for the slow response. In reply to your first comment: you raise some good points, and I agree that an API redesign should not be attempted in this PR. However, I don't think this is quite ready yet. I've been working on an example (a rework of water for RT) and have noticed a few annoying bugs (e.g. building a TLAS containing a BLAS that has been dropped causes a crash) and some convenience functions that could be added (like an untransformed TlasInstance), which makes me think this PR may need a bit of polish before being ready (though this is just my opinion, and I don't know what should block a PR).
@JMS55, @daniel-keitel While working on a new example, I ran into an issue where a build causes a Vulkan device-lost in a later call (but only if the struct is large; commenting out struct members fixes the problem). I'm trying to figure out the cause (the validation layers show no errors); the only thing I can find is that the vertex buffer seems to go above 2 MB. If anyone wants to help me debug (I'm not particularly knowledgeable about Vulkan), below are the two Vulkan API dumps; the failure is on my project (Vecvec:wgpu/ray-tracing, run wgpu-examples with water_rtx). wgpu_water_rtx_fail.txt wgpu_water_rtx_success.txt
Also ran into an issue:
```
Tlas[Id(0,1,vk)] does not exist
```
It's fixed if I don't drop the TlasPackage, though. Bind groups need to ensure the TLAS they bind is kept alive.
```text
 0: std::panicking::begin_panic_handler
       at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library\std\src\panicking.rs:645
 1: core::panicking::panic_fmt
       at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library\core\src\panicking.rs:72
 2: wgpu_core::storage::Storage<wgpu_core::resource::Tlas<wgpu_hal::vulkan::Api> >::get<wgpu_core::resource::Tlas<wgpu_hal::vulkan::Api> >
       at C:\Users\Jasmine\.cargo\git\checkouts\wgpu-186abbb3d0ea8675\d94cbeb\wgpu-core\src\storage.rs:110
 3: wgpu_core::track::stateless::StatelessTracker<wgpu_core::resource::Tlas<wgpu_hal::vulkan::Api> >::add_single<wgpu_core::resource::Tlas<wgpu_hal::vulkan::Api> >
       at C:\Users\Jasmine\.cargo\git\checkouts\wgpu-186abbb3d0ea8675\d94cbeb\wgpu-core\src\track\stateless.rs:185
 4: wgpu_core::command::compute::impl$6::command_encoder_run_compute_pass_impl::closure$2<wgpu_hal::vulkan::Api>
       at C:\Users\Jasmine\.cargo\git\checkouts\wgpu-186abbb3d0ea8675\d94cbeb\wgpu-core\src\command\compute.rs:543
 5: core::iter::adapters::map::map_fold::closure$0<alloc::sync::Arc<wgpu_core::resource::Tlas<wgpu_hal::vulkan::Api>,alloc::alloc::Global>,wgpu_core::ray_tracing::TlasAction,tuple$<>,wgpu_core::command::compute::impl$6::command_encoder_run_compute_pass_impl::
       at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112\library\core\src\iter\adapters\map.rs:84
 6: core::iter::traits::iterator::Iterator::fold<alloc::vec::into_iter::IntoIter<alloc::sync::Arc<wgpu_core::resource::Tlas<wgpu_hal::vulkan::Api>,alloc::alloc::Global>,alloc::alloc::Global>,tuple$<>,core::iter::adapters::map::map_fold::closure_env$0<alloc::s
       at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112\library\core\src\iter\traits\iterator.rs:2640
 7: core::iter::adapters::map::impl$2::fold<wgpu_core::ray_tracing::TlasAction,alloc::vec::into_iter::IntoIter<alloc::sync::Arc<wgpu_core::resource::Tlas<wgpu_hal::vulkan::Api>,alloc::alloc::Global>,alloc::alloc::Global>,wgpu_core::command::compute::impl$6::c
       at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112\library\core\src\iter\adapters\map.rs:124
 8: core::iter::traits::iterator::Iterator::for_each<core::iter::adapters::map::Map<alloc::vec::into_iter::IntoIter<alloc::sync::Arc<wgpu_core::resource::Tlas<wgpu_hal::vulkan::Api>,alloc::alloc::Global>,alloc::alloc::Global>,wgpu_core::command::compute::impl
       at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112\library\core\src\iter\traits\iterator.rs:858
 9: alloc::vec::Vec<wgpu_core::ray_tracing::TlasAction,alloc::alloc::Global>::extend_trusted<wgpu_core::ray_tracing::TlasAction,alloc::alloc::Global,core::iter::adapters::map::Map<alloc::vec::into_iter::IntoIter<alloc::sync::Arc<wgpu_core::resource::Tlas<wgpu
       at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112\library\alloc\src\vec\mod.rs:2885
10: alloc::vec::spec_extend::impl$1::spec_extend<wgpu_core::ray_tracing::TlasAction,core::iter::adapters::map::Map<alloc::vec::into_iter::IntoIter<alloc::sync::Arc<wgpu_core::resource::Tlas<wgpu_hal::vulkan::Api>,alloc::alloc::Global>,alloc::alloc::Global>,wg
       at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112\library\alloc\src\vec\spec_extend.rs:26
11: alloc::vec::impl$18::extend<wgpu_core::ray_tracing::TlasAction,alloc::alloc::Global,core::iter::adapters::map::Map<alloc::vec::into_iter::IntoIter<alloc::sync::Arc<wgpu_core::resource::Tlas<wgpu_hal::vulkan::Api>,alloc::alloc::Global>,alloc::alloc::Global
       at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112\library\alloc\src\vec\mod.rs:2827
12: wgpu_core::global::Global::command_encoder_run_compute_pass_impl<wgpu_hal::vulkan::Api>
       at C:\Users\Jasmine\.cargo\git\checkouts\wgpu-186abbb3d0ea8675\d94cbeb\wgpu-core\src\command\compute.rs:550
13: wgpu_core::global::Global::command_encoder_run_compute_pass<wgpu_hal::vulkan::Api>
       at C:\Users\Jasmine\.cargo\git\checkouts\wgpu-186abbb3d0ea8675\d94cbeb\wgpu-core\src\command\compute.rs:348
14: wgpu::backend::wgpu_core::impl$7::command_encoder_end_compute_pass
       at C:\Users\Jasmine\.cargo\git\checkouts\wgpu-186abbb3d0ea8675\d94cbeb\wgpu\src\backend\wgpu_core.rs:1810
15: wgpu::context::impl$5::command_encoder_end_compute_pass<wgpu::backend::wgpu_core::ContextWgpuCore>
       at C:\Users\Jasmine\.cargo\git\checkouts\wgpu-186abbb3d0ea8675\d94cbeb\wgpu\src\context.rs:2795
16: wgpu::impl$68::drop
       at C:\Users\Jasmine\.cargo\git\checkouts\wgpu-186abbb3d0ea8675\d94cbeb\wgpu\src\lib.rs:4327
17: core::ptr::drop_in_place<wgpu::ComputePass>
       at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112\library\core\src\ptr\mod.rs:498
```
I'm not sure keeping the TLAS alive is necessary for this issue. I think that line (and the corresponding one in render.rs at line 1516) can be swapped out for

```rust
tracker.tlas_s.insert_single(tlas.as_info().id(), tlas.clone());
```

For some reason the other variable is called blas even though it is a tlas (it probably needs to be renamed). The only issue I see is that the id may be freed when the user-facing TLAS is freed, but from the code I think that only happens when the wgpu-core resource is dropped.
Looking into the Vulkan issue: Nsight runs it unless you disable device address C++ support, and with that disabled, NVIDIA Nsight Graphics warns:

```
Buffer device address was obtained while bound to memory without
VK_MEMORY_ALLOCATE_DEVICE_ADDRESS_BIT. If necessary, the Vulkan launch
option "Force Capture/Replay Device Memory" can avoid this error.
Otherwise, C++ capture may not work as expected.
```

However, all memory is allocated with the device-address flag. This is contradictory, so something else must be going on.
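For context, this is roughly what allocating memory with that flag looks like through ash, the Vulkan bindings wgpu-hal uses; a minimal sketch rather than wgpu's actual allocation path, with `device`, `size`, and `memory_type_index` assumed to come from the caller:

```rust
use ash::vk;

/// Minimal sketch: allocate device memory with the device-address bit that
/// the Nsight warning claims is missing. Not wgpu's real allocator code.
unsafe fn alloc_with_device_address(
    device: &ash::Device,
    size: vk::DeviceSize,
    memory_type_index: u32,
) -> ash::prelude::VkResult<vk::DeviceMemory> {
    // VK_MEMORY_ALLOCATE_DEVICE_ADDRESS_BIT lives in a pNext extension struct.
    let mut flags_info =
        vk::MemoryAllocateFlagsInfo::builder().flags(vk::MemoryAllocateFlags::DEVICE_ADDRESS);
    let alloc_info = vk::MemoryAllocateInfo::builder()
        .allocation_size(size)
        .memory_type_index(memory_type_index)
        .push_next(&mut flags_info);
    device.allocate_memory(&alloc_info, None)
}
```

If every allocation really does go through a path like this, the warning would seem to be a capture/replay artifact rather than a genuinely missing flag, which matches the contradiction noted above.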
VK_KHR_ray_tracing_position_fetch would be great to add support for: https://www.khronos.org/blog/introducing-vulkan-ray-tracing-position-fetch-extension
@JMS55 Yes, though that might need some extra work. I will design an API suggestion and post it on the ray-tracing issue (as it may need a follow-up PR).
@Vecvec I talked to the wgpu devs btw. If we want this to be merged, what we need is a bunch more tests. Both for happy paths, and to ensure things can't be misused. Devs will be more willing to review and merge this if they can be confident it's working correctly via tests.
I've fixed the merge conflicts here: https://github.com/daniel-keitel/wgpu/pull/3. The tests fail now though, with something about resource-tracking stack overruns.
I've been working on fixing use-after-free crashes (and merged trunk while doing so), and I've encountered a similar problem. I'm happy to know it's not just something wrong with my changes. As you suggest, this seems unrelated to this PR.
@Vecvec my failures are now just Vulkan validation:
```text
[2024-04-14T23:57:23Z ERROR wgpu_hal::vulkan::instance]
VALIDATION [SYNC-HAZARD-WRITE-AFTER-WRITE (0x5c0ec5d6)]
Validation Error: [ SYNC-HAZARD-WRITE-AFTER-WRITE ]
Object 0:
    handle = 0x1fc1e3250e0,
    type = VK_OBJECT_TYPE_QUEUE;
| MessageID = 0x5c0ec5d6
| vkQueueSubmit(): Hazard WRITE_AFTER_WRITE for entry 1,
VkCommandBuffer 0x1fc2a97b060[],
Submitted access info (
    submitted_usage: SYNC_COPY_TRANSFER_WRITE,
    command: vkCmdCopyBuffer,
    seq_no: 3,
    reset_no: 1
).
Access info (
    prior_usage: SYNC_COPY_TRANSFER_WRITE,
    write_barriers: SYNC_VERTEX_SHADER_ACCELERATION_STRUCTURE_READ
        | SYNC_FRAGMENT_SHADER_ACCELERATION_STRUCTURE_READ
        | SYNC_COMPUTE_SHADER_ACCELERATION_STRUCTURE_READ
        | SYNC_ACCELERATION_STRUCTURE_BUILD_ACCELERATION_STRUCTURE_READ
        | SYNC_ACCELERATION_STRUCTURE_BUILD_ACCELERATION_STRUCTURE_WRITE,
    queue: VkQueue 0x1fc1e3250e0[],
    submit: 0,
    batch: 0,
    batch_tag: 11,
    command: vkCmdCopyBuffer,
    command_buffer: VkCommandBuffer 0x1fc2a95e730[],
    seq_no: 5,
    reset_no: 1
).
[2024-04-14T23:57:23Z ERROR wgpu_hal::vulkan::instance]
    objects: (type: QUEUE, hndl: 0x1fc1e3250e0, name: ?)
```
That's great! What did you change? Also, I think that validation error is similar to #5373: same submitted usage, though different prior usage.
Edit: actually it has different write barriers and the same prior usage; I read the wrong thing. Could this actually be the same issue?
@Vecvec all my changes are pushed up, this was my last commit: https://github.com/daniel-keitel/wgpu/pull/3/commits/67710d7237821774780c2d579fa6264700cde107
We might be missing a barrier somewhere, not sure
I've found the issue: it turns out we need to call transition buffers on the TLAS instance too (it seems like a preexisting issue in ray-tracing).
@Vecvec glad you solved it! Perhaps you and @atlv24 want to open a new PR with your combined changes? If we add a bunch of tests and smooth out the bugs, we could try and get this merged.
I've opened atlv24/wgpu#1 for the write after write fix.
I've been working on BLAS compaction, and it now works! Should I split it up for merging (a separate hal PR on wgpu plus an extra PR into this PR), or submit it as a single PR? The result (on my system; this probably varies across graphics cards) is 3840 bytes -> 1152 bytes for the ray-cube compute example.
Probably best to make separate PRs. This PR is already huge, and isn't going to be reviewed until it's cleaned up and has more tests. I don't have the skill to finish it though. The best thing you could do would be to clean up and finish this PR and work with the wgpu devs to get it reviewed, and then do another PR later for BLAS compaction and other improvements. Entirely up to you though, no pressure on how you want to spend your time :)
On another note, I've been updating my validation layers, and I noticed a new validation error saying that my graphics card does not support f32x4 as an acceleration structure vertex format. So I've had a look at the format features, which is something that needs validation in wgpu-core. The formats with a high enough support percentage to allow mainstream ray tracing (>27%) are f32x2 and f32x3 (I've removed all the f16 formats, as Rust does not have an f16 type, and nor does wgpu). I'm not sure f32x2 is worth it, as we would still lose some people, and I'm not sure 2D ray tracing is very useful. What we could do is require f32x3, documenting that it is required for now but that future features may allow other vertex formats. Limiting the formats we use could increase testing coverage too.
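To make that concrete, here is a hypothetical sketch of the wgpu-core check being suggested; the `CreateBlasError` type and the function are invented for illustration, and only `wgpu::VertexFormat::Float32x3` would pass until a future feature relaxes it:

```rust
use wgpu::VertexFormat;

/// Hypothetical error type for illustration; not an existing wgpu-core type.
#[derive(Debug)]
enum CreateBlasError {
    UnsupportedVertexFormat(VertexFormat),
}

/// Sketch of the proposed validation: reject every vertex format except
/// Float32x3 for BLAS triangle geometry, since it is the only format with
/// broad enough hardware support for mainstream ray tracing.
fn validate_blas_vertex_format(format: VertexFormat) -> Result<(), CreateBlasError> {
    match format {
        VertexFormat::Float32x3 => Ok(()),
        other => Err(CreateBlasError::UnsupportedVertexFormat(other)),
    }
}
```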