
[wgpu-core] Inline RayQuery Support

Open daniel-keitel opened this issue 1 year ago • 26 comments

Checklist

  • [ ] Run cargo clippy.
  • [ ] Run RUSTFLAGS=--cfg=web_sys_unstable_apis cargo clippy --target wasm32-unknown-unknown if applicable.
  • [ ] Add change to CHANGELOG.md. See simple instructions inside file.

Future Work

  • [ ] Acceleration Structure Compaction
  • [ ] Ray tracing pipelines
  • [ ] Ray tracing resource management

Connections: #1040. Depends on: #3507 (vulkan hal). Matrix Room Thread.

Description: Ray tracing support for wgpu and wgpu-core. This PR implements support for acceleration structures and ray queries.

It provides a create function each for bottom and top level acceleration structures, and a build function for the two. Top level acceleration structures can be used in bind groups. Shader support is already implemented in naga.


TODO

  • [x] more validation
  • [ ] as_hal/into_hal
  • [x] tracing
  • [ ] more examples
  • [ ] more tests
  • [ ] documentation
  • [ ] specification
  • [ ] procedural geometry
  • [x] the safe and validated build function

Testing: tested with an example application (ray-cube) and a test. Needs more tests and examples.

daniel-keitel avatar Apr 01 '23 12:04 daniel-keitel

@daniel-keitel I rebased this branch here: https://github.com/JMS55/wgpu/tree/rt-wgpu-jms

JMS55 avatar Aug 26 '23 03:08 JMS55

Hi, I came from Bevy, trying to do ray tracing, and I ended up here. I'd like to understand the problem domain a bit more. I hope to help out, though I doubt I have the depth of knowledge for that yet. As a start, is there a place I can begin reading, or some source files to look at? And as a side question, what is the current state of this work?

gabrielkryss avatar Sep 04 '23 01:09 gabrielkryss

@gabrielkryss hi, I'm the Bevy developer working on RT-based global illumination https://github.com/JMS55/bevy/tree/solari. Here's what needs to be done before Bevy can use RT:

  • First step would be getting #4089 merged
  • Second step would be getting this PR merged. There's a memory leak for TLASs though, and this PR can't be merged anyways until #3626 is merged, which will change a whole lot of the code base.
  • Third step is setting up Bevy to use binding arrays and make the whole scene accessible via a single bind group that can be indexed non-uniformly. This is needed so that when a ray hits a point in the scene, you can do things like look up mesh and material data for that point. I have a (somewhat bad) prototype of this already written as part of bevy_solari, but we'd want a better setup long term. Doing this work will also lead to large performance improvements for Bevy's existing renderer. If you're interested in helping with this, please join the Bevy discord and talk to us in the #rendering-dev channel.
  • Fourth step, which is of course super complicated, is developing the actual global illumination techniques and shader code. I've been working on this as part of bevy_solari. If you're interested in helping with it, again please reach out in the #rendering-dev channel.

Helping out with either the WGPU RT APIs or Bevy's code would go a long way towards getting RT usable for Bevy. Feel free to ask whatever additional questions you may have.

JMS55 avatar Sep 04 '23 18:09 JMS55

My understanding is that this MR still has issues to work out and is not ready for a review. Is that correct?

cwfitzgerald avatar Sep 14 '23 21:09 cwfitzgerald

Yeah. There's a memory leak related to the TLAS unfortunately :/

JMS55 avatar Sep 15 '23 00:09 JMS55

@daniel-keitel @Vecvec I know you wanted to make some changes to the API, but perhaps you can rebase this branch and request reviews for what you have so far? IMO it would be better to merge as-is and iterate later than to put off any form of RT support for longer. That way people can start using this without custom forks :)

JMS55 avatar Jan 27 '24 19:01 JMS55

@daniel-keitel @Vecvec I tried rebasing this PR on wgpu-trunk, but ran into some issues with serde. Maybe one of you knows how to fix them. EDIT: Fixed

JMS55 avatar Feb 03 '24 22:02 JMS55

Sorry for the slow response. In reply to your first comment: I think you raise some good points, and I agree that an API redesign should not be attempted in this PR. However, I do not think this is quite ready yet. I've been working on an example (a rework of water for ray tracing) and I've noticed a few annoying bugs (e.g. building a TLAS containing a BLAS that has been dropped causes a crash) and some convenience functions that could be added (like an untransformed TlasInstance). These make me think this PR may need a bit of polish time before being ready (although this is just my opinion, and I don't know what should block a PR).

Vecvec avatar Feb 05 '24 04:02 Vecvec

@JMS55, @daniel-keitel While working on a new example, I ran into an issue where a build causes a Vulkan device loss in a later call, but only if the struct is large; commenting out struct members fixes the problem. I'm trying to figure out the cause. The validation layers are not showing any errors; the only thing I can find is that the vertex buffer seems to go above 2 MB. If anyone wants to help me debug (I'm not particularly knowledgeable about Vulkan), below are the two Vulkan API dumps. The failure is on my project (Vecvec:wgpu/ray-tracing, run wgpu-examples with water_rtx). wgpu_water_rtx_fail.txt wgpu_water_rtx_success.txt

Vecvec avatar Feb 11 '24 03:02 Vecvec

Also ran into an issue: Tlas[Id(0,1,vk)] does not exist

It's fixed if I don't drop the TlasPackage, though. Bind groups need to ensure the TLAS they bind is kept alive.

   0: std::panicking::begin_panic_handler
             at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library\std\src\panicking.rs:645
   1: core::panicking::panic_fmt
             at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library\core\src\panicking.rs:72
   2: wgpu_core::storage::Storage<wgpu_core::resource::Tlas<wgpu_hal::vulkan::Api> >::get<wgpu_core::resource::Tlas<wgpu_hal::vulkan::Api> >
             at C:\Users\Jasmine\.cargo\git\checkouts\wgpu-186abbb3d0ea8675\d94cbeb\wgpu-core\src\storage.rs:110
   3: wgpu_core::track::stateless::StatelessTracker<wgpu_core::resource::Tlas<wgpu_hal::vulkan::Api> >::add_single<wgpu_core::resource::Tlas<wgpu_hal::vulkan::Api> >
             at C:\Users\Jasmine\.cargo\git\checkouts\wgpu-186abbb3d0ea8675\d94cbeb\wgpu-core\src\track\stateless.rs:185
   4: wgpu_core::command::compute::impl$6::command_encoder_run_compute_pass_impl::closure$2<wgpu_hal::vulkan::Api>
             at C:\Users\Jasmine\.cargo\git\checkouts\wgpu-186abbb3d0ea8675\d94cbeb\wgpu-core\src\command\compute.rs:543
   5: core::iter::adapters::map::map_fold::closure$0<alloc::sync::Arc<wgpu_core::resource::Tlas<wgpu_hal::vulkan::Api>,alloc::alloc::Global>,wgpu_core::ray_tracing::TlasAction,tuple$<>,wgpu_core::command::compute::impl$6::command_encoder_run_compute_pass_impl::
             at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112\library\core\src\iter\adapters\map.rs:84
   6: core::iter::traits::iterator::Iterator::fold<alloc::vec::into_iter::IntoIter<alloc::sync::Arc<wgpu_core::resource::Tlas<wgpu_hal::vulkan::Api>,alloc::alloc::Global>,alloc::alloc::Global>,tuple$<>,core::iter::adapters::map::map_fold::closure_env$0<alloc::s
             at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112\library\core\src\iter\traits\iterator.rs:2640
   7: core::iter::adapters::map::impl$2::fold<wgpu_core::ray_tracing::TlasAction,alloc::vec::into_iter::IntoIter<alloc::sync::Arc<wgpu_core::resource::Tlas<wgpu_hal::vulkan::Api>,alloc::alloc::Global>,alloc::alloc::Global>,wgpu_core::command::compute::impl$6::c
             at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112\library\core\src\iter\adapters\map.rs:124
   8: core::iter::traits::iterator::Iterator::for_each<core::iter::adapters::map::Map<alloc::vec::into_iter::IntoIter<alloc::sync::Arc<wgpu_core::resource::Tlas<wgpu_hal::vulkan::Api>,alloc::alloc::Global>,alloc::alloc::Global>,wgpu_core::command::compute::impl
             at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112\library\core\src\iter\traits\iterator.rs:858
   9: alloc::vec::Vec<wgpu_core::ray_tracing::TlasAction,alloc::alloc::Global>::extend_trusted<wgpu_core::ray_tracing::TlasAction,alloc::alloc::Global,core::iter::adapters::map::Map<alloc::vec::into_iter::IntoIter<alloc::sync::Arc<wgpu_core::resource::Tlas<wgpu
             at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112\library\alloc\src\vec\mod.rs:2885
  10: alloc::vec::spec_extend::impl$1::spec_extend<wgpu_core::ray_tracing::TlasAction,core::iter::adapters::map::Map<alloc::vec::into_iter::IntoIter<alloc::sync::Arc<wgpu_core::resource::Tlas<wgpu_hal::vulkan::Api>,alloc::alloc::Global>,alloc::alloc::Global>,wg
             at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112\library\alloc\src\vec\spec_extend.rs:26
  11: alloc::vec::impl$18::extend<wgpu_core::ray_tracing::TlasAction,alloc::alloc::Global,core::iter::adapters::map::Map<alloc::vec::into_iter::IntoIter<alloc::sync::Arc<wgpu_core::resource::Tlas<wgpu_hal::vulkan::Api>,alloc::alloc::Global>,alloc::alloc::Global
             at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112\library\alloc\src\vec\mod.rs:2827
  12: wgpu_core::global::Global::command_encoder_run_compute_pass_impl<wgpu_hal::vulkan::Api>
             at C:\Users\Jasmine\.cargo\git\checkouts\wgpu-186abbb3d0ea8675\d94cbeb\wgpu-core\src\command\compute.rs:550
  13: wgpu_core::global::Global::command_encoder_run_compute_pass<wgpu_hal::vulkan::Api>
             at C:\Users\Jasmine\.cargo\git\checkouts\wgpu-186abbb3d0ea8675\d94cbeb\wgpu-core\src\command\compute.rs:348
  14: wgpu::backend::wgpu_core::impl$7::command_encoder_end_compute_pass
             at C:\Users\Jasmine\.cargo\git\checkouts\wgpu-186abbb3d0ea8675\d94cbeb\wgpu\src\backend\wgpu_core.rs:1810
  15: wgpu::context::impl$5::command_encoder_end_compute_pass<wgpu::backend::wgpu_core::ContextWgpuCore>
             at C:\Users\Jasmine\.cargo\git\checkouts\wgpu-186abbb3d0ea8675\d94cbeb\wgpu\src\context.rs:2795
  16: wgpu::impl$68::drop
             at C:\Users\Jasmine\.cargo\git\checkouts\wgpu-186abbb3d0ea8675\d94cbeb\wgpu\src\lib.rs:4327
  17: core::ptr::drop_in_place<wgpu::ComputePass>
             at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112\library\core\src\ptr\mod.rs:498

JMS55 avatar Feb 13 '24 04:02 JMS55

I'm not sure keeping the TLAS alive is necessary for this issue. I think that line (and the corresponding one in render.rs at line 1516) can be swapped out for

tracker.tlas_s.insert_single(tlas.as_info().id(), tlas.clone());

For some reason, though, the other one is named blas but is actually a tlas (it probably needs to be renamed). The only issue I see is that the id may be freed when the user-facing TLAS is freed, but from the code I think that only happens when the wgpu-core resource is dropped.

Vecvec avatar Feb 16 '24 20:02 Vecvec

Looking into the Vulkan issue: Nsight runs it unless you disable device address C++ support, and if that is disabled it warns

ID,,Origin,Source,Message 25,,Target,NVIDIA Nsight Graphics,"Buffer device address was obtained while bound to memory without VK_MEMORY_ALLOCATE_DEVICE_ADDRESS_BIT. If necessary, the Vulkan launch option ""Force Capture/Replay Device Memory"" can avoid this error. Otherwise, C++ capture may not work as expected."

However, all memory is allocated with the device address flag. This is contradictory, so something else must be going on.

Vecvec avatar Feb 19 '24 05:02 Vecvec

VK_KHR_ray_tracing_position_fetch would be great to add support for: https://www.khronos.org/blog/introducing-vulkan-ray-tracing-position-fetch-extension

JMS55 avatar Feb 24 '24 07:02 JMS55

@JMS55 Yes, though that might need some extra work. I will design an API suggestion and post it on the ray-tracing issue (as it may need a follow-up PR).

Vecvec avatar Feb 25 '24 01:02 Vecvec

@Vecvec I talked to the wgpu devs, by the way. If we want this to be merged, what we need is a bunch more tests, both for happy paths and to ensure things can't be misused. The devs will be more willing to review and merge this if tests give them confidence it's working correctly.

JMS55 avatar Feb 25 '24 02:02 JMS55

I've fixed merge conflicts here, but the tests fail now: https://github.com/daniel-keitel/wgpu/pull/3

atlv24 avatar Apr 14 '24 23:04 atlv24

Tests fail now though, something about resource tracking stack overruns

I've been working on fixing use after free crashes (and merged trunk while doing so), and I've encountered a similar problem. I'm happy to know that it is not just something wrong with the changes I've made. This seems, as you suggest, not related to this PR.

Vecvec avatar Apr 15 '24 18:04 Vecvec

@Vecvec my fails are now just vulkan validation:

[2024-04-14T23:57:23Z ERROR wgpu_hal::vulkan::instance]
    VALIDATION [SYNC-HAZARD-WRITE-AFTER-WRITE (0x5c0ec5d6)]
    Validation Error: [ SYNC-HAZARD-WRITE-AFTER-WRITE ]
    Object 0:
        handle = 0x1fc1e3250e0,
        type = VK_OBJECT_TYPE_QUEUE; 
    | MessageID = 0x5c0ec5d6
    | vkQueueSubmit():  Hazard WRITE_AFTER_WRITE for entry 1,
    VkCommandBuffer 0x1fc2a97b060[],
    Submitted access info (
        submitted_usage: SYNC_COPY_TRANSFER_WRITE,
        command: vkCmdCopyBuffer,
        seq_no: 3,
        reset_no: 1
    ).
    Access info (
        prior_usage: SYNC_COPY_TRANSFER_WRITE,
        write_barriers: SYNC_VERTEX_SHADER_ACCELERATION_STRUCTURE_READ
            | SYNC_FRAGMENT_SHADER_ACCELERATION_STRUCTURE_READ
            | SYNC_COMPUTE_SHADER_ACCELERATION_STRUCTURE_READ
            | SYNC_ACCELERATION_STRUCTURE_BUILD_ACCELERATION_STRUCTURE_READ
            | SYNC_ACCELERATION_STRUCTURE_BUILD_ACCELERATION_STRUCTURE_WRITE,
        queue: VkQueue 0x1fc1e3250e0[],
        submit: 0,
        batch: 0,
        batch_tag: 11,
        command: vkCmdCopyBuffer,
        command_buffer: VkCommandBuffer 0x1fc2a95e730[],
        seq_no: 5,
        reset_no: 1
    ).

[2024-04-14T23:57:23Z ERROR wgpu_hal::vulkan::instance]
    objects: (type: QUEUE, hndl: 0x1fc1e3250e0, name: ?)
    

atlv24 avatar Apr 15 '24 18:04 atlv24

That's great! What did you change? Also, I think that validation error is similar to #5373: same submitted usage, though different prior usage.

Edit: actually it has different write barriers and the same prior usage; I read the wrong thing. Could this actually be the same issue?

Vecvec avatar Apr 15 '24 20:04 Vecvec

@Vecvec all my changes are pushed up, this was my last commit: https://github.com/daniel-keitel/wgpu/pull/3/commits/67710d7237821774780c2d579fa6264700cde107

We might be missing a barrier somewhere, not sure

atlv24 avatar Apr 15 '24 23:04 atlv24

I've found the issue: it turns out we need to call transition buffers on the TLAS instance too (this seems to be a preexisting issue in ray-tracing).

Vecvec avatar Apr 21 '24 20:04 Vecvec

@Vecvec glad you solved it! Perhaps you and @atlv24 want to open a new PR with your combined changes? If we add a bunch of tests and smooth out the bugs, we could try and get this merged.

JMS55 avatar Apr 24 '24 23:04 JMS55

I've opened atlv24/wgpu#1 for the write after write fix.

Vecvec avatar Apr 29 '24 07:04 Vecvec

I've been working on BLAS compaction, and it now works! Should I split it up for merging (a separate hal PR on wgpu plus an extra PR into this PR), or submit it as a single PR? The result (on my system; this probably varies across graphics cards) is 3840 bytes -> 1152 bytes for the ray-cube compute example.

Vecvec avatar May 26 '24 04:05 Vecvec

Probably best to make separate PRs. This PR is already huge, and isn't going to be reviewed until it's cleaned up and has more tests. I don't have the skill to finish it, though. The best thing you could do would be to clean up and finish this PR and work with the wgpu devs to get it reviewed, then do another PR later for BLAS compaction and other improvements. Entirely up to you though, no pressure on how you want to spend your time :)

JMS55 avatar May 26 '24 04:05 JMS55

On another note, I've been updating my validation layer, and I noticed a new validation error saying that my graphics card does not support f32x4 as an acceleration structure vertex format. So I've had a look at the format features, which is something that needs validation in wgpu-core. The formats with a high enough support percentage to allow mainstream ray tracing (>27%) are f32x2 and f32x3 (I've excluded all f16 formats, as Rust does not have fp16, and nor does wgpu). I'm not sure f32x2 is worth it, since we would still lose some users, and I'm not sure 2D ray tracing is very useful. What we could do is require f32x3, noting that future features may allow other vertex formats. Limiting the formats we accept could also increase test coverage.

Vecvec avatar May 26 '24 05:05 Vecvec