Ray Tracing Support

Open 5nefarious opened this issue 4 years ago • 27 comments

Khronos has recently released the final specification for ray tracing on Vulkan. At this point, DX12, Vulkan, and Metal all seem to have some form of acceleration for ray tracing. Are there any plans to eventually consolidate and expose these APIs in wgpu and wgpu-rs?

I foresee real-time rendering making increasing use of ray tracing in the future, so this may be an essential feature to have. However, I imagine it may be very difficult to support this on the DX11 and OpenGL backends through software emulation.

5nefarious avatar Nov 23 '20 16:11 5nefarious

See also:

  • https://github.com/gfx-rs/wgpu-rs/issues/247 (dupe in wgpu-rs)
  • https://github.com/gfx-rs/gfx/issues/2418 (graphics abstraction we use)
  • https://github.com/gpuweb/gpuweb/issues/535 (Web upstream feature request)

It's too early to consider this an essential feature on all backends, but rolling it out at least where the backends do have support for it seems very useful today.

kvark avatar Nov 23 '20 16:11 kvark

Are there any updates on this?

Sebbl0508 avatar Jan 21 '22 20:01 Sebbl0508

Not currently. There's interest from upstream WebGPU, but that would be well after v1.

We could make it a native extension and I think it would be great, but the apis are quite large (including shader side transformations), and given very few of us actually have hardware to support this, the chance of it happening any time soon without a champion is low.

cwfitzgerald avatar Jan 21 '22 20:01 cwfitzgerald

Hi, when is this coming out?

OriginLive avatar Aug 28 '22 00:08 OriginLive

There's no one actively working on it to my knowledge.

cwfitzgerald avatar Aug 28 '22 01:08 cwfitzgerald

It would be nice to at least get a native extension or some bindings. That would be a nice start.

OriginLive avatar Aug 28 '22 02:08 OriginLive

I'm curious about getting basic ray-tracing support working. I think that a first draft wouldn't be too hard, with some limitations. I think we'd initially want to:

  • Focus on Vulkan
  • Ignore the full ray tracing pipeline (https://registry.khronos.org/vulkan/specs/1.3-extensions/man/html/VK_KHR_ray_tracing_pipeline.html) and just focus on ray queries (https://registry.khronos.org/vulkan/specs/1.3-extensions/man/html/VK_KHR_ray_query.html)
  • Ignore support in naga for now, just support ray tracing in SPIR-V via SPIRV_SHADER_PASSTHROUGH.

The remaining part is then creating an abstraction for acceleration structures (https://registry.khronos.org/vulkan/specs/1.3-extensions/man/html/VK_KHR_acceleration_structure.html).

You probably want to focus on getting a simple acceleration structure abstraction for triangle mesh and instance ASes that are built on the GPU. Anything else can come later.

I can't promise anything, but I'll try and look into this when I'm back with my home setup.

expenses avatar Aug 29 '22 10:08 expenses

My progress so far is at https://github.com/gfx-rs/wgpu/compare/master...expenses:wgpu:hal-acceleration-structures. I plan on merging in wgpu-hal support first, probably with a 'hello world' ray-traced triangle example like https://github.com/SaschaWillems/Vulkan#basic-ray-tracing.

expenses avatar Sep 12 '22 10:09 expenses

@expenses thank you for championing Ray Tracing! It's very exciting for the community to get access to it 🚀 .

Our main concerns are the maintenance costs for a large API surface added to the fact it will need to change in order to abstract over DXR efficiently. Ideally, there would be a proper investigation on the API differences between VkRT and DXR before wgpu-hal API is prototyped. However, we are somewhat confident that the amount of changes needed for DXR will be limited, and it would be fine to do as the next step.

My advice, if I may, would be to not try to copy Vulkan into wgpu-hal. Our HAL is low level and zero/low overhead, but it doesn't have to be extremely low level. For example, it doesn't have the API for allocating memory and binding objects to it, like Vulkan. So if you have a choice of 1) put complexity in the Vulkan backend, or 2) expose it in wgpu-hal, please put a bigger weight on 1). We can make it more complex as a follow-up for DXR if needed.

Again, thank you for all the amazing contributions. Looking forward to seeing the opportunities that your work opens to all of us!

kvark avatar Sep 12 '22 17:09 kvark

My impression is that as Vulkan raytracing was based on DXR, the APIs should be fairly similar. Beyond acceleration structures, the main thing I've focused on is the ray query extension: https://registry.khronos.org/vulkan/specs/1.3-extensions/man/html/VK_KHR_ray_query.html. This allows ray tracing in the normal vertex/fragment/compute shader stages as opposed to requiring a new ray tracing shader stage and shader binding tables and all that stuff.

This is equivalent to inline ray tracing in DXR 1.1: https://devblogs.microsoft.com/directx/dxr-1-1/#inline-raytracing
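
To make concrete what a ray query conceptually hands back to an ordinary shader stage, here is a CPU-side sketch in Rust. It is not GPU code and not part of wgpu: it brute-forces a closest-hit search over a triangle list with the Möller–Trumbore test, whereas real hardware traverses an acceleration structure. The names `Hit`, `intersect`, and `closest_hit` are hypothetical and chosen only for this illustration.

```rust
// Illustrative CPU model of a "closest hit" ray query. All names are hypothetical.
type Vec3 = [f32; 3];

fn sub(a: Vec3, b: Vec3) -> Vec3 {
    [a[0] - b[0], a[1] - b[1], a[2] - b[2]]
}

fn dot(a: Vec3, b: Vec3) -> f32 {
    a[0] * b[0] + a[1] * b[1] + a[2] * b[2]
}

fn cross(a: Vec3, b: Vec3) -> Vec3 {
    [
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    ]
}

/// Closest committed hit: distance along the ray, barycentrics, and triangle index.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Hit {
    pub t: f32,
    pub u: f32,
    pub v: f32,
    pub triangle_index: usize,
}

/// Möller–Trumbore ray/triangle test. Returns (t, u, v) if the ray's line hits the triangle.
pub fn intersect(origin: Vec3, dir: Vec3, tri: [Vec3; 3]) -> Option<(f32, f32, f32)> {
    let e1 = sub(tri[1], tri[0]);
    let e2 = sub(tri[2], tri[0]);
    let p = cross(dir, e2);
    let det = dot(e1, p);
    if det.abs() < 1e-8 {
        return None; // ray is parallel to the triangle plane
    }
    let inv = 1.0 / det;
    let s = sub(origin, tri[0]);
    let u = dot(s, p) * inv;
    if u < 0.0 || u > 1.0 {
        return None;
    }
    let q = cross(s, e1);
    let v = dot(dir, q) * inv;
    if v < 0.0 || u + v > 1.0 {
        return None;
    }
    Some((dot(e2, q) * inv, u, v))
}

/// Brute-force stand-in for traversal: keep the nearest hit within [t_min, t_max].
pub fn closest_hit(
    origin: Vec3,
    dir: Vec3,
    t_min: f32,
    t_max: f32,
    triangles: &[[Vec3; 3]],
) -> Option<Hit> {
    let mut best: Option<Hit> = None;
    for (triangle_index, tri) in triangles.iter().enumerate() {
        if let Some((t, u, v)) = intersect(origin, dir, *tri) {
            let in_range = t >= t_min && t <= t_max;
            if in_range && best.map_or(true, |b| t < b.t) {
                best = Some(Hit { t, u, v, triangle_index });
            }
        }
    }
    best
}
```

The point of the comparison is that the caller gets a plain result value in the stage it is already running in, with no extra shader stages or shader binding tables involved.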

expenses avatar Sep 12 '22 23:09 expenses

Looking at the example, the setup on the Rust side is insane: 1k lines for ray tracing a few triangles. And unsafe blocks?

trsh avatar Sep 28 '22 07:09 trsh

Looking at the example, the setup on the Rust side is insane: 1k lines for ray tracing a few triangles. And unsafe blocks?

@trsh it's a wgpu-hal example, not a main wgpu one :) The halmark example is just as long.

expenses avatar Sep 28 '22 09:09 expenses

Hello. I am a bit confused o.o. Is it in the roadmap to support hardware-accelerated raytracing (that nvidia RTX cards or AMD's 6000-radeon series support)? Is it still a consideration? How far away is stable raytracing support? (I am guessing a year at least based on the thread, but I might be completely off.)

coder3112 avatar Oct 13 '22 20:10 coder3112

Hello!

Is it in the roadmap to support hardware-accelerated raytracing (that nvidia RTX cards or AMD's 6000-radeon series support)?

Yes. This issue is for DXR, VK_KHR_ray_tracing support.

How far away is stable raytracing support?

There's no set timeframe, just people working on it when they are able. There are a lot of components that go into raytracing, so full wgpu support will take a while. Some small first steps have been made (like implementing RT in wgpu-hal with no shader support).

cwfitzgerald avatar Oct 13 '22 22:10 cwfitzgerald

Thank you for the response!

coder3112 avatar Oct 16 '22 20:10 coder3112

If you want to play a bit with compute shaders and raytracing to create a visibility buffer and then apply your shading, I've played a bit with the idea in my raytracing_visibility shader 😄

https://github.com/gents83/INOX/blob/master/data_raw/shaders/wgsl/raytracing_visibility.wgsl

gents83 avatar Jan 17 '23 18:01 gents83

@expenses I'm adding ray query support to WGSL in https://github.com/gfx-rs/naga/pull/2256. You could try this instead of forcing raw SPIR-V usage.

kvark avatar Feb 22 '23 18:02 kvark

I have continued to work on @expenses' first implementation (now PR #3507). Currently I am looking into the DirectX 12 specification, so that the hal API doesn't need to change when implementing DX at a later point.

The Specifications (Dx,Vk) are fairly compatible in regards to acceleration structures, but I see no way to build acceleration structures indirectly with DX12.

Can anyone confirm if DX12 is unable to build acceleration structures indirectly? (What I found: ExecuteIndirect)

daniel-keitel avatar Feb 25 '23 06:02 daniel-keitel

The reason pGeometryDescs is a CPU based parameter as opposed to InstanceDescs which live on the GPU is, at least for initial implementations, the CPU needs to look at some of the information such as triangle counts in pGeometryDescs in order to schedule acceleration structure builds. Perhaps in the future more of the data can live on the GPU.

Definitely sounds like there is no indirect.

cwfitzgerald avatar Feb 25 '23 09:02 cwfitzgerald

@cwfitzgerald thanks for the confirmation (quite odd that it is impossible). I will ignore vkCmdBuildAccelerationStructuresIndirectKHR for now.

daniel-keitel avatar Feb 25 '23 10:02 daniel-keitel

The initial implementation #3507 (Vulkan, Ray Query) should be finished now, and is awaiting review.

I started to implement ray-tracing pipelines in wgpu-hal for Vulkan in #3607 in the meantime. A simple example works already.

daniel-keitel avatar Mar 21 '23 16:03 daniel-keitel

My Proposal for the api in wgpu:

// Ray tracing API proposal for wgpu (underlying Vulkan, Metal and DX12 implementations)

// The general design goal is to come up with a simpler API that allows for validation.
// Since this validation would have high overhead in some cases,
// I decided to provide both more limited safe functions and unsafe functions for the same action, to avoid this tradeoff.

// Error handling and traits like Debug are omitted. 

// Core structures with no public members
pub struct Blas {}
pub struct Tlas {}
pub struct BlasRequirements {}
pub struct TlasInstances{}

// Size descriptors used to describe the size requirements of blas geometries.
// Also used internally for strict validation
pub struct BlasTriangleGeometrySizeDescriptor{
    pub vertex_format: wgt::VertexFormat,
    pub vertex_count: u32,
    pub index_format: Option<wgt::IndexFormat>,
    pub index_count: Option<u32>,
    pub flags: AccelerationStructureGeometryFlags,
}

pub struct BlasProceduralGeometrySizeDescriptor{
    pub count: u32,
    pub flags: AccelerationStructureGeometryFlags,
} 

// Procedural geometry contains AABBs
pub struct BlasProceduralGeometry{
    pub size: BlasProceduralGeometrySizeDescriptor,
    pub bounding_box_buffer: Buffer,
    pub bounding_box_buffer_offset: wgt::BufferAddress,
    pub bounding_box_stride: wgt::BufferAddress,
}

// Triangle geometry contains vertices and, optionally, indices and transforms
pub struct BlasTriangleGeometry{
    pub size: BlasTriangleGeometrySizeDescriptor,
    pub vertex_buffer: Buffer,
    pub first_vertex: u32,
    pub vertex_stride: wgt::BufferAddress,
    pub index_buffer: Option<Buffer>,
    pub index_buffer_offset: Option<wgt::BufferAddress>,
    pub transform_buffer: Option<Buffer>,
    pub transform_buffer_offset: Option<wgt::BufferAddress>,
}

// Build flags 
pub struct AccelerationStructureFlags{
    // build_speed, small_size, ...
}

// Geometry flags
pub struct AccelerationStructureGeometryFlags{
    // opaque, no_duplicate_any_hit, ...
}

// Descriptors used to determine the memory requirements and validation of an acceleration structure
pub enum BlasGeometrySizeDescriptors{
    Triangles{desc: Vec<BlasTriangleGeometrySizeDescriptor>},
    Procedural{desc: Vec<BlasProceduralGeometrySizeDescriptor>},
}

// With prefer update, we decide if an update is possible, else we rebuild.
// Maybe a force update option could be useful
pub enum UpdateMode{
    Build,
    // Update,
    PreferUpdate,
}

// General descriptor for the size requirements, 
// since the required size depends on the contents and build flags 
pub struct GetBlasRequirementsDescriptor{
    pub flags: AccelerationStructureFlags,
}

// Creation descriptors, we provide flags, and update_mode.
// We store it in the structure, so we don't need to pass it every build.
pub struct CreateBlasDescriptor<'a>{
    pub requirements: &'a BlasRequirements,
    pub flags: AccelerationStructureFlags,
    pub update_mode: UpdateMode,
}

pub struct CreateTlasDescriptor{
    pub max_instances: u32,
    pub flags: AccelerationStructureFlags,
    pub update_mode: UpdateMode,
}

// Safe instance entry for tlas
struct TlasInstance{
    transform: [f32; 12],
    custom_index: u32,
    mask: u8,
    shader_binding_table_record_offset: u32,
    flags: u8, // bitmap
    blas: Blas,
}

impl Device {
    // Retrieves the size requirements for an acceleration structure.
    // BlasRequirements stores the BlasGeometrySizeDescriptors for validation (that's why we take ownership).
    // These descriptors are required for strict validation, because the underlying (e.g. Vulkan) specifications don't
    // make many guarantees about the ordering of size requirements between different geometries.
    // By storing the sizes of all geometries we can validate that a different list of geometries is guaranteed to fit.
    // If we just queried whether the requirements are satisfied for the new geometries, it might fit on some systems and not others.
    pub fn get_blas_size_requirements(&self, desc: &GetBlasRequirementsDescriptor, entries: BlasGeometrySizeDescriptors) -> BlasRequirements;

    // Creates a new bottom level acceleration structure and sets internal state for validation (reference to BlasGeometrySizeDescriptors)
    // and builds (e.g. update mode)
    pub fn create_blas(&self, desc: &CreateBlasDescriptor) -> Blas;

    // Creates a new top level acceleration structure and sets internal state for builds (e.g. update mode)
    pub fn create_tlas(&self, desc: &CreateTlasDescriptor) -> Tlas;
}

// Enum for the different types of geometries inside a single blas build
enum BlasGeometries<'a>{
    TriangleGeometries(&'a [BlasTriangleGeometry]),
    ProceduralGeometries(&'a [BlasProceduralGeometry]),
}

impl CommandEncoder {
    // Build multiple bottom level acceleration structures.
    // Validates that the geometries are guaranteed to produce an acceleration structure that fits inside the allocated buffers (with strict validation).
    // Ensures that all used buffers are valid and synchronized.
    pub fn build_blas<'a>(&self, blas: impl IntoIterator<Item=&'a Blas>,
        triangle_geometries: impl IntoIterator<Item=&'a [BlasGeometries<'a>]>,
        scratch_buffers: impl IntoIterator<Item=&'a Buffer>);

    // Build multiple top level acceleration structures.
    // Validates the instances (e.g. ensures that blas entries are valid and synchronized).
    // Uploads the part of the instances that changed into a staging buffer and
    // enqueues a command to copy from that staging buffer into the internal index buffer.
    // (Splitting the building of bottom level and top level makes validation easier.)
    pub fn build_tlas<'a>(&self, tlas: impl IntoIterator<Item=&'a Tlas>,
        instances: impl IntoIterator<Item=&'a TlasInstances>,
        scratch_buffers: impl IntoIterator<Item=&'a Buffer>);

    // Build multiple top level acceleration structures.
    // Uses the provided instance buffer directly (minimal validation).
    pub unsafe fn build_tlas_unsafe<'a>(&self, tlas: impl IntoIterator<Item=&'a Tlas>,
        raw_instances: impl IntoIterator<Item=&'a Buffer>,
        scratch_buffers: impl IntoIterator<Item=&'a Buffer>);

    // Creates a new blas and copies (in a compacting way) the contents of the provided blas
    // into the new one (compaction flag must be set). 
    pub fn compact_blas(&self, blas: &Blas) -> Blas;
}

impl BlasRequirements {
    // To use the same acceleration structure for multiple blas builds (after each build we compact into a new one)
    // we need to allocate buffers big enough for all of them.
    // This function allows this in a safe way (with strict validation enabled).
    pub fn find_smallest_shared_requirements(requirements: &[BlasRequirements]) -> BlasRequirements;

    // getter for the required scratch_buffer_size
    pub fn required_scratch_buffer_size(&self) -> BufferAddress;
}

// trait on blas/tlas
trait AccelerationStructure {
    // modify flags before a build
    fn set_flags_mode(&mut self, mode: AccelerationStructureFlags);
    // modify the update mode before a build
    fn set_update_mode(&mut self, mode: UpdateMode);
    // getter for the required scratch_buffer_size
    fn required_scratch_buffer_size(&self) -> BufferAddress;
}

// Safe Tlas Instance
impl TlasInstances{
    pub fn new(max_instances: u32) -> Self;

    // gets instances to read from
    pub fn get(&self) -> &[TlasInstance];
    // gets instances to modify, we keep track of the range to determine what we need to validate and copy
    pub fn get_mut_range(&mut self, range: Range<u32>) -> &mut [TlasInstance];
    // get the number of instances which will be built
    pub fn active(&self) -> u32;
    // set the number of instances which will be built
    pub fn set_active(&mut self, active: u32);
}
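
The change tracking described for `get_mut_range` could look roughly like the following. This is a minimal sketch under stated assumptions, not the proposal's implementation: `Instance`, `TrackedInstances`, and `take_dirty` are hypothetical names, the instance type is reduced to two fields, and the dirty region is kept as a single merged range rather than a list of ranges.

```rust
use std::ops::Range;

// Hypothetical, reduced stand-in for the proposal's TlasInstance.
#[derive(Debug, Clone, Copy, Default, PartialEq)]
pub struct Instance {
    pub custom_index: u32,
    pub mask: u8,
}

// Hypothetical container that remembers which instances were handed out mutably.
pub struct TrackedInstances {
    instances: Vec<Instance>,
    dirty: Option<Range<u32>>,
}

impl TrackedInstances {
    pub fn new(max_instances: u32) -> Self {
        Self {
            instances: vec![Instance::default(); max_instances as usize],
            dirty: None,
        }
    }

    /// Read-only access does not mark anything as changed.
    pub fn get(&self) -> &[Instance] {
        &self.instances
    }

    /// Mutable access widens the dirty range to cover everything handed out so far.
    pub fn get_mut_range(&mut self, range: Range<u32>) -> &mut [Instance] {
        assert!(
            range.start <= range.end && range.end as usize <= self.instances.len(),
            "range out of bounds"
        );
        if !range.is_empty() {
            self.dirty = Some(match self.dirty.take() {
                Some(d) => d.start.min(range.start)..d.end.max(range.end),
                None => range.clone(),
            });
        }
        &mut self.instances[range.start as usize..range.end as usize]
    }

    /// Called at build time: returns the range to validate and upload, then resets it.
    pub fn take_dirty(&mut self) -> Option<Range<u32>> {
        self.dirty.take()
    }
}
```

Merging into one range keeps the bookkeeping trivial at the cost of sometimes re-uploading untouched instances that lie between two edited regions.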

Previous discussion in a wgpu matrix room thread

daniel-keitel avatar Mar 25 '23 17:03 daniel-keitel

Thank you for coming up with a concrete proposal! Is there any way we can reduce the API surface here? It would increase the chances of it making it into a WebGPU extension. For example, would it be reasonable to have the first version of this API not support the "update" operation for acceleration structures at all? Without the update, managing the scratch buffers may be simplified: just allocate one when creating an acceleration structure, and then free it once it's built, all internally.

kvark avatar Mar 26 '23 06:03 kvark

I don't think that we can sacrifice updates. I agree that, for a minimal wgpu-core API surface (with full functionality), the scratch buffers shouldn't be exposed. We could allocate a new scratch buffer for each build and free it afterwards (similar to staging buffers). Another part where we can reduce the API surface is at blas creation. If we accept that each acceleration structure creation results in a new allocation (which also makes the API simpler to use), a separate function to calculate the size requirements is not necessary. With these two changes combined, only 4 public functions are necessary in wgpu-core.

// Structs and enums stay mostly the same

impl Device {
    // Creates a new bottom level acceleration structure and sets internal state for validation and builds (e.g. update mode)
    pub fn create_blas(&self, desc: &CreateBlasDescriptor, entries: BlasGeometrySizeDescriptors) -> Blas;

    // Creates a new top level acceleration structure and sets internal state for builds (e.g. update mode)
    pub fn create_tlas(&self, desc: &CreateTlasDescriptor) -> Tlas;
}

// Enum for the different types of geometries inside a single blas build
// [Should we use nested iterators with dynamic dispatch instead?]
enum BlasGeometries<'a>{
    TriangleGeometries(&'a [BlasTriangleGeometry]),
    ProceduralGeometries(&'a [BlasProceduralGeometry]),
}

impl CommandEncoder {
    // Build acceleration structures.
    // Elements of blas may be used in a tlas (internal synchronization).
    // This function will allocate a single big scratch buffer that is shared between internal builds.
    // If there are too many acceleration structures for a single build (size constraint),
    // we will automatically distribute them between multiple internal builds (reducing the required size of the scratch buffer).
    // This version will be implemented in wgpu::util, not wgpu-core (may change later).
    pub fn build_acceleration_structures<'a>(&self,
        blas: impl IntoIterator<Item=(&'a Blas,BlasGeometries<'a>)>,
        tlas: impl IntoIterator<Item=(&'a Tlas,TlasInstances<'a>)>,
    );

    // unsafe version without validation for tlas, directly using an instance buffer.
    // u32 for the number of instances to build
    pub fn build_acceleration_structures_unsafe_tlas<'a>(&self,
        blas: impl IntoIterator<Item=(&'a Blas,BlasGeometries<'a>)>,
        tlas: impl IntoIterator<Item=(&'a Tlas,&'a Buffer, u32)>,
    );

    // Creates a new blas and copies (in a compacting way) the contents of the provided blas
    // into the new one (compaction flag must be set). 
    pub fn compact_blas(&self, blas: &Blas) -> Blas;
}

// Safe Tlas Instance stays the same
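
One way the "distribute them between multiple internal builds" step could work is a greedy packing of per-build scratch sizes into batches that each fit one shared scratch buffer. The sketch below is only an illustration under assumptions: `plan_batches`, `Batch`, the size cap, and the power-of-two offset alignment are all hypothetical, and the real size and alignment limits would come from the backend.

```rust
/// Rounds `value` up to a multiple of `alignment` (which must be a power of two).
fn align_up(value: u64, alignment: u64) -> u64 {
    (value + alignment - 1) & !(alignment - 1)
}

/// One internal build: which builds it contains, at which scratch offset, and its total size.
#[derive(Debug, PartialEq)]
pub struct Batch {
    /// (index into the input list, byte offset inside the shared scratch buffer)
    pub offsets: Vec<(usize, u64)>,
    pub size: u64,
}

/// Greedily packs builds, in input order, into batches no larger than `max_scratch`.
/// Returns Err(index) if a single build cannot fit even on its own.
pub fn plan_batches(
    scratch_sizes: &[u64],
    max_scratch: u64,
    alignment: u64,
) -> Result<Vec<Batch>, usize> {
    assert!(alignment.is_power_of_two());
    let mut batches = Vec::new();
    let mut current = Batch { offsets: Vec::new(), size: 0 };
    for (index, &size) in scratch_sizes.iter().enumerate() {
        if size > max_scratch {
            return Err(index);
        }
        let offset = align_up(current.size, alignment);
        if offset + size > max_scratch {
            // Close the current batch and start a new one with this build at offset 0.
            let full = std::mem::replace(&mut current, Batch { offsets: Vec::new(), size: 0 });
            batches.push(full);
            current.offsets.push((index, 0));
            current.size = size;
        } else {
            current.offsets.push((index, offset));
            current.size = offset + size;
        }
    }
    if !current.offsets.is_empty() {
        batches.push(current);
    }
    Ok(batches)
}
```

With this shape, the scratch buffer only needs to be as large as the biggest batch, and each batch would be followed by a barrier before the buffer is reused.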

I will start with a minimal implementation to get a better feel for wgpu-core.

daniel-keitel avatar Mar 26 '23 23:03 daniel-keitel

First "working" implementation: #3631. The specification is close to the minimal API surface described above (the safe build_acceleration_structures function is not yet implemented).

daniel-keitel avatar Apr 01 '23 12:04 daniel-keitel

@JMS55 suggested

VK_KHR_ray_tracing_position_fetch would be great to add support for: https://www.khronos.org/blog/introducing-vulkan-ray-tracing-position-fetch-extension

So this is my suggested API. It will need a separate feature, because fewer than half of the GPUs that support ray tracing also support this (VK_KHR_ray_query at 9.83% vs. VK_KHR_ray_tracing_position_fetch at 3.24%). One possible name for this feature could be RAY_HIT_VERTEX_POSITION. I prefer this to Vulkan's name (VK_KHR_ray_tracing_position_fetch), as "fetch" seems to imply that it requires keeping the buffer alive. Since Vulkan also wants this when creating the acceleration structure, there would need to be a similar flag, and in the shader there would also be a corresponding

enable: ray_hit_vertex_position;

which would add to the pre-existing RayIntersection struct

    vertex_positions: array<vec3<f32>, 3>,

Looking at the naga code this proposal seems implementable, but I don't know how good it is, so I would like suggestions.

Vecvec avatar Feb 25 '24 06:02 Vecvec

  • It has to be a separate feature (sadly), as I don't think DX12/Metal support it either
  • TLAS is required to be built with the VK_BUILD_ACCELERATION_STRUCTURE_ALLOW_DATA_ACCESS_KHR flag. wgpu/naga is going to have to be able to validate that when you get the vertex positions, the TLAS you traced against had that flag enabled. Maybe a new acceleration_structure_extended binding type, where naga only allows position fetches on that specific kind of TLAS, and then wgpu validates at bind time that you enabled the flag on the bound TLAS?
  • Rather than building it into the RayIntersection struct, it might make more sense to keep it as a separate function that maps directly to OpRayQueryGetIntersectionTriangleVertexPositionsKHR. Idk.
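
The bind-time check suggested in the second point can be sketched as follows. Everything here is hypothetical and only illustrates the shape of the validation: `BuildFlags`, `ALLOW_VERTEX_ACCESS`, `BindingKind`, and `validate_binding` are invented names, not wgpu or naga API, and the flag is a stand-in for whatever wgpu would map to VK_BUILD_ACCELERATION_STRUCTURE_ALLOW_DATA_ACCESS_KHR.

```rust
// Hypothetical build flags; a tiny hand-rolled bitset to stay dependency-free.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct BuildFlags(u32);

impl BuildFlags {
    pub const NONE: Self = Self(0);
    pub const PREFER_FAST_TRACE: Self = Self(1 << 0);
    // Stand-in for the flag that permits reading vertex positions on a hit.
    pub const ALLOW_VERTEX_ACCESS: Self = Self(1 << 1);

    pub fn union(self, other: Self) -> Self {
        Self(self.0 | other.0)
    }

    pub fn contains(self, other: Self) -> bool {
        self.0 & other.0 == other.0
    }
}

// What the shader declared for the binding (as reported by shader reflection).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum BindingKind {
    AccelerationStructure,
    // Hypothetical "extended" kind: the shader may fetch hit vertex positions.
    AccelerationStructureExtended,
}

// Hypothetical record of how the bound acceleration structure was created.
pub struct Tlas {
    pub flags: BuildFlags,
}

#[derive(Debug, PartialEq, Eq)]
pub enum BindError {
    MissingVertexAccessFlag,
}

/// Rejects a bind group entry whose shader-side kind needs a flag the resource lacks.
pub fn validate_binding(kind: BindingKind, tlas: &Tlas) -> Result<(), BindError> {
    match kind {
        BindingKind::AccelerationStructure => Ok(()),
        BindingKind::AccelerationStructureExtended => {
            if tlas.flags.contains(BuildFlags::ALLOW_VERTEX_ACCESS) {
                Ok(())
            } else {
                Err(BindError::MissingVertexAccessFlag)
            }
        }
    }
}
```

The shader compiler would only need to restrict position fetches to the extended binding kind; the flag check itself stays on the CPU at bind group creation.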

JMS55 avatar Feb 25 '24 07:02 JMS55

I'm surprised that it is only on Vulkan. Since they are already loading the positions, it feels like it should be easy. However, if it only works on Vulkan then I probably will not implement it, as WebGPU says

A proposal for new functionality must be implementable on at least 2 different native APIs

Vecvec avatar Feb 26 '24 05:02 Vecvec

@Vecvec I would note, that wgpu is willing to accept features that are only supported on a single api - the main thing is that if multiple apis support it, the proposal should allow implementation on all.

cwfitzgerald avatar Feb 26 '24 08:02 cwfitzgerald

Thanks! I had assumed wgpu had similar proposal policies to WebGPU. In that case, in response to @JMS55: your first idea sounds good, but

var acc_struct: acceleration_structure_extended;

may confuse people as to which feature / flag to enable for it. Instead, how about

var<get_position> acc_struct: acceleration_structure;

(similar to uniform / storage buffers) or

var acc_struct: position_getting_acceleration_structure; 

similar to storage textures (a flag for any texture) instead.

In response to your second idea, I think it is better than mine, because my idea would require all input acceleration structures to have the get_position flag and so would be too restrictive. How about

HitVertexPositions(rq: &ray_query) -> array<vec3f, 3>

for the function.

Vecvec avatar Feb 27 '24 06:02 Vecvec