Expose Raytracing Pipeline

Open Erfan-Ahmadi opened this issue 2 years ago • 33 comments

This documents the Vulkan objects that are part of VK_KHR_ray_tracing_pipeline which we may want to expose and work with in Nabla.

  • Click on the links if I don't explain something enough.
  • Please pay attention to the verbs used: AS Build and AS Creation are different things.
  • vkCmd* functions record device operations into a command buffer, while the plain vk* counterparts are host operations (e.g. vkCmdBuildAccelerationStructuresKHR and vkBuildAccelerationStructuresKHR)

Extension and Properties

VkPhysicalDeviceRayTracingPipelinePropertiesKHR

When this structure is included in the pNext chain of the VkPhysicalDeviceProperties2 structure passed to vkGetPhysicalDeviceProperties2, it is filled in with each corresponding implementation-dependent property.

VkPhysicalDeviceRayTracingPipelineFeaturesKHR:

When this structure is included in the pNext chain of the VkPhysicalDeviceFeatures2 structure passed to vkGetPhysicalDeviceFeatures2, it is filled in to indicate whether each corresponding feature is supported.

These are needed for:

  1. Memory management of the opaque ShaderGroup handles (e.g. shaderGroupHandleSize, shaderGroupBaseAlignment, maxShaderGroupStride, shaderGroupHandleAlignment)
  2. Maximum/minimum values to validate against (e.g. maxRayHitAttributeSize, maxRayRecursionDepth, maxRayDispatchInvocationCount)

We must also expose some of these physical device values for the user to work with, in case the user wants to do everything low-level and manually (e.g. creating the ShaderBindingTable buffer).

VkPhysicalDeviceAccelerationStructureFeaturesKHR:

It has accelerationStructureHostCommands that indicates whether the implementation supports host side acceleration structure commands: (vkBuildAccelerationStructuresKHR, vkCopyAccelerationStructureKHR, vkCopyAccelerationStructureToMemoryKHR, vkCopyMemoryToAccelerationStructureKHR, and vkWriteAccelerationStructuresPropertiesKHR)

Just put Cmd after vk to make the functions above device functions instead of host ones.

Acceleration Structures (Creation, Build, Compaction, Copy, Related Enums and Structs...)

Geometry:

VkAccelerationStructureGeometryKHR:

typedef struct VkAccelerationStructureGeometryKHR {
    VkStructureType                           sType;
    const void*                               pNext;
    VkGeometryTypeKHR                         geometryType;
    VkAccelerationStructureGeometryDataKHR    geometry;
    VkGeometryFlagsKHR                        flags;
} VkAccelerationStructureGeometryKHR;

geometry member is a union of three other structs:

  1. Triangles: a geometry type consisting of triangles (used when building a BLAS from a vertex buffer)
  2. AABBs: a geometry type consisting of axis-aligned bounding boxes (used when working with custom primitives that need a custom intersection shader)
  3. Instances: a geometry type consisting of acceleration structure instances (used when building a TLAS from BLAS instances)

1. VkAccelerationStructureGeometryTrianglesDataKHR:

    VkStructureType                  sType;
    const void*                      pNext;
    VkFormat                         vertexFormat;
    VkDeviceOrHostAddressConstKHR    vertexData;
    VkDeviceSize                     vertexStride;
    uint32_t                         maxVertex;
    VkIndexType                      indexType;
    VkDeviceOrHostAddressConstKHR    indexData;
    VkDeviceOrHostAddressConstKHR    transformData;

2. VkAccelerationStructureGeometryAabbsDataKHR:

    VkStructureType                  sType;
    const void*                      pNext;
    VkDeviceOrHostAddressConstKHR    data;
    VkDeviceSize                     stride;
  • data is a device or host address to memory containing VkAabbPositionsKHR structures containing position data for each axis-aligned bounding box in the geometry.

VkAabbPositionsKHR:

    float    minX;
    float    minY;
    float    minZ;
    float    maxX;
    float    maxY;
    float    maxZ;

3. VkAccelerationStructureGeometryInstancesDataKHR:

    VkStructureType                  sType;
    const void*                      pNext;
    VkBool32                         arrayOfPointers;
    VkDeviceOrHostAddressConstKHR    data;
  • data is either the address of an array of device or host addresses referencing individual VkAccelerationStructureInstanceKHR structures or packed motion instance information as described in motion instances if arrayOfPointers is VK_TRUE, or the address of an array of VkAccelerationStructureInstanceKHR or VkAccelerationStructureMotionInstanceNV structures. Addresses and VkAccelerationStructureInstanceKHR structures are tightly packed. VkAccelerationStructureMotionInstanceNV have a stride of 160 bytes.

VkAccelerationStructureInstanceKHR:

typedef struct VkAccelerationStructureInstanceKHR {
    VkTransformMatrixKHR          transform;
    uint32_t                      instanceCustomIndex:24;
    uint32_t                      mask:8;
    uint32_t                      instanceShaderBindingTableRecordOffset:24;
    VkGeometryInstanceFlagsKHR    flags:8;
    uint64_t                      accelerationStructureReference;
} VkAccelerationStructureInstanceKHR;

accelerationStructureReference is either:

  • a device address containing the value obtained from vkGetAccelerationStructureDeviceAddressKHR or vkGetAccelerationStructureHandleNV (used by device operations which reference acceleration structures) or,
  • a VkAccelerationStructureKHR object (used by host operations which reference acceleration structures).
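
If Nabla ends up writing instance data into buffers itself, the two bitfield pairs of the instance struct have to be packed by hand. Below is a minimal sketch, assuming the layout described in the spec (the 24-bit field in the low bits and the 8-bit field in the high bits of one 32-bit word). `PackedInstance` and `pack24_8` are hypothetical names used for illustration only; they are not part of Vulkan or Nabla.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical mirror of VkAccelerationStructureInstanceKHR with the two
// bitfield pairs written out as plain 32-bit words.
struct PackedInstance {
    float         transform[3][4];     // 3x4 row-major matrix, 48 bytes
    std::uint32_t customIndexAndMask;  // bits 0..23: custom index, bits 24..31: mask
    std::uint32_t sbtOffsetAndFlags;   // bits 0..23: SBT record offset, bits 24..31: flags
    std::uint64_t accelerationStructureReference;
};
static_assert(sizeof(PackedInstance) == 64, "instances are expected to be 64 bytes");

// Packs a 24-bit value into the low bits and an 8-bit value into the high bits.
inline std::uint32_t pack24_8(std::uint32_t low24, std::uint32_t high8) {
    assert(low24 <= 0x00FFFFFFu && high8 <= 0xFFu);
    return low24 | (high8 << 24);
}
```

For example, `pack24_8(instanceCustomIndex, mask)` would fill `customIndexAndMask`, and `pack24_8(sbtRecordOffset, flags)` would fill `sbtOffsetAndFlags`.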

VkGeometryTypeKHR:

    VK_GEOMETRY_TYPE_TRIANGLES_KHR = 0,
    VK_GEOMETRY_TYPE_AABBS_KHR = 1,
    VK_GEOMETRY_TYPE_INSTANCES_KHR = 2,

VkGeometryFlagBitsKHR:

    VK_GEOMETRY_OPAQUE_BIT_KHR = 0x00000001,
    VK_GEOMETRY_NO_DUPLICATE_ANY_HIT_INVOCATION_BIT_KHR = 0x00000002,

Acceleration Structures are built ~~created~~ from:

  1. One or more geometries (VkAccelerationStructureGeometryKHR) filled into a VkAccelerationStructureBuildGeometryInfoKHR (referenced later in this text)
  2. A build range (VkAccelerationStructureBuildRangeInfoKHR) for each geometry

VkAccelerationStructureBuildRangeInfoKHR:

    uint32_t    primitiveCount;
    uint32_t    primitiveOffset;
    uint32_t    firstVertex;
    uint32_t    transformOffset;
  • The relation between geometry.triangles and BuildRangeInfo is similar to the relation between vertexBuffer+inputAttributes and parameters of vkCmdDraw

In the case of triangle geometry, primitiveCount is the number of triangles.

VkAccelerationStructureBuildGeometryInfoKHR:

    VkStructureType                                     sType;
    const void*                                         pNext;
    VkAccelerationStructureTypeKHR                      type;
    VkBuildAccelerationStructureFlagsKHR                flags;
    VkBuildAccelerationStructureModeKHR                 mode;
    VkAccelerationStructureKHR                          srcAccelerationStructure;
    VkAccelerationStructureKHR                          dstAccelerationStructure;
    uint32_t                                            geometryCount;
    const VkAccelerationStructureGeometryKHR*           pGeometries;
    const VkAccelerationStructureGeometryKHR* const*    ppGeometries;
    VkDeviceOrHostAddressKHR                            scratchData;

Most of the members are clear enough and explained in the spec; there are only a few notes:

  • scratchData is the temporary scratch memory Vulkan works with while it is building the AS.
  • The scratch data size is queried using vkGetAccelerationStructureBuildSizesKHR
  • The enums used are referenced below.

vkGetAccelerationStructureBuildSizesKHR:

void vkGetAccelerationStructureBuildSizesKHR(
    VkDevice                                    device,
    VkAccelerationStructureBuildTypeKHR         buildType,
    const VkAccelerationStructureBuildGeometryInfoKHR* pBuildInfo,
    const uint32_t*                             pMaxPrimitiveCounts,
    VkAccelerationStructureBuildSizesInfoKHR*   pSizeInfo);
  • pBuildInfo might be partially or fully filled
  • pMaxPrimitiveCounts is the maximum number of primitives for each geometry in the pBuildInfos.pp
  • You might want to set each geometry's entry in pMaxPrimitiveCounts to exactly the value set for primitiveCount in its respective VkAccelerationStructureBuildRangeInfoKHR
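
The last note can be sketched as a small helper that derives the pMaxPrimitiveCounts array from the build ranges. This is only an illustration: `BuildRange` is a hypothetical stand-in with the same members as VkAccelerationStructureBuildRangeInfoKHR, and `maxPrimitiveCountsFrom` is an invented name.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for VkAccelerationStructureBuildRangeInfoKHR.
struct BuildRange {
    std::uint32_t primitiveCount;
    std::uint32_t primitiveOffset;
    std::uint32_t firstVertex;
    std::uint32_t transformOffset;
};

// Returns one max primitive count per geometry, taken from each range's primitiveCount.
inline std::vector<std::uint32_t> maxPrimitiveCountsFrom(const std::vector<BuildRange>& ranges) {
    std::vector<std::uint32_t> counts;
    counts.reserve(ranges.size());
    for (const BuildRange& range : ranges)
        counts.push_back(range.primitiveCount);
    assert(counts.size() == ranges.size());
    return counts;
}
```

The resulting vector's `data()` pointer could then be passed where pMaxPrimitiveCounts is expected.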

VkAccelerationStructureBuildSizesInfoKHR:

    VkStructureType    sType;
    const void*        pNext;
    VkDeviceSize       accelerationStructureSize;
    VkDeviceSize       updateScratchSize;
    VkDeviceSize       buildScratchSize;

We should usually call this function:

  • before creating our AS, because sizeInfo contains sizeInfo.accelerationStructureSize
  • before building our AS, because sizeInfo contains sizeInfo.buildScratchSize
  • before updating our AS, because sizeInfo contains sizeInfo.updateScratchSize

After Querying the sizes using vkGetAccelerationStructureBuildSizesKHR we must:

  1. Create the scratch buffer with ( size = sizeInfo.buildScratchSize)
  2. Continue to fill our buildInfo: buildInfo.scratchData.deviceAddress = scratchAddress;
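
As a rough sketch of how the queried sizes map onto the scratch buffer: the build mode decides which scratch size applies. The spec also constrains the alignment of the scratch address (minAccelerationStructureScratchOffsetAlignment in VkPhysicalDeviceAccelerationStructurePropertiesKHR), which this write-up does not cover, so take the alignment helper as an assumption about what we would need. `BuildSizes`, `BuildMode`, `scratchSizeFor` and `alignUp` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the size members of VkAccelerationStructureBuildSizesInfoKHR.
struct BuildSizes {
    std::uint64_t accelerationStructureSize;
    std::uint64_t updateScratchSize;
    std::uint64_t buildScratchSize;
};

enum class BuildMode { Build, Update };

// Picks the scratch size that matches the build mode.
inline std::uint64_t scratchSizeFor(BuildMode mode, const BuildSizes& sizes) {
    return mode == BuildMode::Build ? sizes.buildScratchSize : sizes.updateScratchSize;
}

// Rounds value up to the next multiple of a power-of-two alignment.
inline std::uint64_t alignUp(std::uint64_t value, std::uint64_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (value + alignment - 1) & ~(alignment - 1);
}
```

A scratch allocation could then be over-allocated by the alignment and its address passed through `alignUp` before being written to buildInfo.scratchData.deviceAddress.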

IMPORTANT NOTE @devshgraphicsprogramming: Do we want to expose a function for vkGetBufferDeviceAddress? Most of the functions and structs related to this raytracing extension work with device addresses rather than with buffers and AS's directly. I think we could also take our Nabla objects and call vkGetBufferDeviceAddress behind the scenes.

Host/Device Commands for Build, CopyAStoAS, CopyASToMemory, CopyMemoryToAS, WriteProperties

Note for Host Commands:

  • They use Deferred Operations, so they can be "joined" on the user's threads for better CPU utilization. We could either expose Deferred Operations (VkDeferredOperationKHR creation and the join function), or simply use them internally: call join, wait for VK_SUCCESS, and finish the task in the appropriate function on the calling thread without exposing Deferred Operations.

Build AS

All function parameters are explained above

vkCmdBuildAccelerationStructuresKHR:

void vkCmdBuildAccelerationStructuresKHR(
    VkCommandBuffer                             commandBuffer,
    uint32_t                                    infoCount,
    const VkAccelerationStructureBuildGeometryInfoKHR* pInfos,
    const VkAccelerationStructureBuildRangeInfoKHR* const* ppBuildRangeInfos);
  • Read Vulkan Spec for Correct PipelineStages and AccessTypes for Memory Barriers of the Memory Involved in Build Command

vkCmdBuildAccelerationStructuresIndirectKHR:

void vkCmdBuildAccelerationStructuresIndirectKHR(
    VkCommandBuffer                             commandBuffer,
    uint32_t                                    infoCount,
    const VkAccelerationStructureBuildGeometryInfoKHR* pInfos,
    const VkDeviceAddress*                      pIndirectDeviceAddresses,
    const uint32_t*                             pIndirectStrides,
    const uint32_t* const*                      ppMaxPrimitiveCounts);
  • pIndirectDeviceAddresses is a pointer to an array of infoCount buffer device addresses which point to pInfos[i].geometryCount VkAccelerationStructureBuildRangeInfoKHR structures defining dynamic offsets to the addresses where geometry data is stored, as defined by pInfos[i].

  • Accesses to any element of pIndirectDeviceAddresses must be synchronized with the VK_PIPELINE_STAGE_ACCELERATION_STRUCTURE_BUILD_BIT_KHR pipeline stage and an access type of VK_ACCESS_INDIRECT_COMMAND_READ_BIT.

vkBuildAccelerationStructuresKHR:

VkResult vkBuildAccelerationStructuresKHR(
    VkDevice                                    device,
    VkDeferredOperationKHR                      deferredOperation,
    uint32_t                                    infoCount,
    const VkAccelerationStructureBuildGeometryInfoKHR* pInfos,
    const VkAccelerationStructureBuildRangeInfoKHR* const* ppBuildRangeInfos);

Write Properties

vkCmdWriteAccelerationStructuresPropertiesKHR:

void vkCmdWriteAccelerationStructuresPropertiesKHR(
    VkCommandBuffer                             commandBuffer,
    uint32_t                                    accelerationStructureCount,
    const VkAccelerationStructureKHR*           pAccelerationStructures,
    VkQueryType                                 queryType,
    VkQueryPool                                 queryPool,
    uint32_t                                    firstQuery);

Note for write properties:

  1. We could expose a QueryType and QueryPool interface to the user but that would be very Vulkan Specific.
  2. We can also have functions for each QueryType and store the QueryPool somewhere internal without exposing the interface to the user. I suggest we go with 2, for example: QueryAccelerationStructuresCompactionSizes(....)

There is no need for QueryPool for the respective Host Operation

vkWriteAccelerationStructuresPropertiesKHR:

VkResult vkWriteAccelerationStructuresPropertiesKHR(
    VkDevice                                    device,
    uint32_t                                    accelerationStructureCount,
    const VkAccelerationStructureKHR*           pAccelerationStructures,
    VkQueryType                                 queryType,
    size_t                                      dataSize,
    void*                                       pData,
    size_t                                      stride);

Copy AS to AS

Example usage is when copying AS to CompactedAS

vkCmdCopyAccelerationStructureKHR:

void vkCmdCopyAccelerationStructureKHR(
    VkCommandBuffer                             commandBuffer,
    const VkCopyAccelerationStructureInfoKHR*   pInfo);

vkCopyAccelerationStructureKHR:

VkResult vkCopyAccelerationStructureKHR(
    VkDevice                                    device,
    VkDeferredOperationKHR                      deferredOperation,
    const VkCopyAccelerationStructureInfoKHR*   pInfo);

VkCopyAccelerationStructureInfoKHR:

    VkStructureType                       sType;
    const void*                           pNext;
    VkAccelerationStructureKHR            src;
    VkAccelerationStructureKHR            dst;
    VkCopyAccelerationStructureModeKHR    mode;

Important note for memory barriers

  • Accesses to pInfo->src and pInfo->dst must be synchronized with the VK_PIPELINE_STAGE_ACCELERATION_STRUCTURE_BUILD_BIT_KHR pipeline stage and an access type of VK_ACCESS_ACCELERATION_STRUCTURE_READ_BIT_KHR or VK_ACCESS_ACCELERATION_STRUCTURE_WRITE_BIT_KHR as appropriate.

VkCopyAccelerationStructureModeKHR:

    VK_COPY_ACCELERATION_STRUCTURE_MODE_CLONE_KHR = 0,
    VK_COPY_ACCELERATION_STRUCTURE_MODE_COMPACT_KHR = 1,
    VK_COPY_ACCELERATION_STRUCTURE_MODE_SERIALIZE_KHR = 2,
    VK_COPY_ACCELERATION_STRUCTURE_MODE_DESERIALIZE_KHR = 3,

Copy AS To Memory

vkCmdCopyAccelerationStructureToMemoryKHR:

void vkCmdCopyAccelerationStructureToMemoryKHR(
    VkCommandBuffer                             commandBuffer,
    const VkCopyAccelerationStructureToMemoryInfoKHR* pInfo);

vkCopyAccelerationStructureToMemoryKHR:

VkResult vkCopyAccelerationStructureToMemoryKHR(
    VkDevice                                    device,
    VkDeferredOperationKHR                      deferredOperation,
    const VkCopyAccelerationStructureToMemoryInfoKHR* pInfo);

VkCopyAccelerationStructureToMemoryInfoKHR:

    VkStructureType                       sType;
    const void*                           pNext;
    VkAccelerationStructureKHR            src;
    VkDeviceOrHostAddressKHR              dst;
    VkCopyAccelerationStructureModeKHR    mode;

Copy Memory To AS

vkCopyMemoryToAccelerationStructureKHR:

VkResult vkCopyMemoryToAccelerationStructureKHR(
    VkDevice                                    device,
    VkDeferredOperationKHR                      deferredOperation,
    const VkCopyMemoryToAccelerationStructureInfoKHR* pInfo);

vkCmdCopyMemoryToAccelerationStructureKHR:

void vkCmdCopyMemoryToAccelerationStructureKHR(
    VkCommandBuffer                             commandBuffer,
    const VkCopyMemoryToAccelerationStructureInfoKHR* pInfo);

VkCopyMemoryToAccelerationStructureInfoKHR:

    VkStructureType                       sType;
    const void*                           pNext;
    VkDeviceOrHostAddressConstKHR         src;
    VkAccelerationStructureKHR            dst;
    VkCopyAccelerationStructureModeKHR    mode;

Compatibility Check

We need a function that checks whether a serialized acceleration structure is compatible with the current device, wrapping the Vulkan functions and structs for the compatibility check.

Creating AS

VkAccelerationStructureCreateInfoKHR :

    VkStructureType                          sType;
    const void*                              pNext;
    VkAccelerationStructureCreateFlagsKHR    createFlags;
    VkBuffer                                 buffer;
    VkDeviceSize                             offset;
    VkDeviceSize                             size;
    VkAccelerationStructureTypeKHR           type;
    VkDeviceAddress                          deviceAddress;
  • deviceAddress in these function parameters is related to accelerationStructureCaptureReplay and this optional functionality is intended to be used by tools and not by applications directly.

  • createInfo.buffer is a buffer allocated most likely with size of sizeInfo.accelerationStructureSize (See VkAccelerationStructureBuildSizesInfoKHR above)

Enums Used

VkAccelerationStructureTypeKHR:

VK_ACCELERATION_STRUCTURE_TYPE_TOP_LEVEL_KHR = 0,
VK_ACCELERATION_STRUCTURE_TYPE_BOTTOM_LEVEL_KHR = 1,
VK_ACCELERATION_STRUCTURE_TYPE_GENERIC_KHR = 2,

VkDeviceOrHostAddressConstKHR:

typedef union VkDeviceOrHostAddressConstKHR {
VkDeviceAddress deviceAddress;
const void* hostAddress;
} VkDeviceOrHostAddressConstKHR;

Fill in hostAddress when working with host-side acceleration structure commands, and deviceAddress otherwise. Exposing this is a matter of choice: functions could also take different inputs that do not need DeviceOrHostAddressConst (which also has a non-const version). I suggest exposing it as a struct.

VkBuildAccelerationStructureModeKHR:

VK_BUILD_ACCELERATION_STRUCTURE_MODE_BUILD_KHR = 0,
VK_BUILD_ACCELERATION_STRUCTURE_MODE_UPDATE_KHR = 1,

We also could write different build/update AS functions.

VkBuildAccelerationStructureFlagBitsKHR:

VK_BUILD_ACCELERATION_STRUCTURE_ALLOW_UPDATE_BIT_KHR = 0x00000001,
VK_BUILD_ACCELERATION_STRUCTURE_ALLOW_COMPACTION_BIT_KHR = 0x00000002,
VK_BUILD_ACCELERATION_STRUCTURE_PREFER_FAST_TRACE_BIT_KHR = 0x00000004,
VK_BUILD_ACCELERATION_STRUCTURE_PREFER_FAST_BUILD_BIT_KHR = 0x00000008,
VK_BUILD_ACCELERATION_STRUCTURE_LOW_MEMORY_BIT_KHR = 0x00000010,

VkAccelerationStructureBuildTypeKHR:

    VK_ACCELERATION_STRUCTURE_BUILD_TYPE_HOST_KHR = 0,
    VK_ACCELERATION_STRUCTURE_BUILD_TYPE_DEVICE_KHR = 1,
    VK_ACCELERATION_STRUCTURE_BUILD_TYPE_HOST_OR_DEVICE_KHR = 2,

VkAccelerationStructureCreateFlagBitsKHR:

    VK_ACCELERATION_STRUCTURE_CREATE_DEVICE_ADDRESS_CAPTURE_REPLAY_BIT_KHR = 0x00000001,
  // Provided by VK_NV_ray_tracing_motion_blur
    VK_ACCELERATION_STRUCTURE_CREATE_MOTION_BIT_NV = 0x00000004,

VkGeometryInstanceFlagBitsKHR:

    VK_GEOMETRY_INSTANCE_TRIANGLE_FACING_CULL_DISABLE_BIT_KHR = 0x00000001,
    VK_GEOMETRY_INSTANCE_TRIANGLE_FLIP_FACING_BIT_KHR = 0x00000002,
    VK_GEOMETRY_INSTANCE_FORCE_OPAQUE_BIT_KHR = 0x00000004,
    VK_GEOMETRY_INSTANCE_FORCE_NO_OPAQUE_BIT_KHR = 0x00000008,
    VK_GEOMETRY_INSTANCE_TRIANGLE_FRONT_COUNTERCLOCKWISE_BIT_KHR = VK_GEOMETRY_INSTANCE_TRIANGLE_FLIP_FACING_BIT_KHR,
  // Provided by VK_NV_ray_tracing
    VK_GEOMETRY_INSTANCE_TRIANGLE_CULL_DISABLE_BIT_NV = VK_GEOMETRY_INSTANCE_TRIANGLE_FACING_CULL_DISABLE_BIT_KHR,
  // Provided by VK_NV_ray_tracing
    VK_GEOMETRY_INSTANCE_TRIANGLE_FRONT_COUNTERCLOCKWISE_BIT_NV = VK_GEOMETRY_INSTANCE_TRIANGLE_FRONT_COUNTERCLOCKWISE_BIT_KHR,
  // Provided by VK_NV_ray_tracing
    VK_GEOMETRY_INSTANCE_FORCE_OPAQUE_BIT_NV = VK_GEOMETRY_INSTANCE_FORCE_OPAQUE_BIT_KHR,
  // Provided by VK_NV_ray_tracing
    VK_GEOMETRY_INSTANCE_FORCE_NO_OPAQUE_BIT_NV = VK_GEOMETRY_INSTANCE_FORCE_NO_OPAQUE_BIT_KHR,

Deferred Operations

(Fill if needed to expose)

RayTracing Pipeline

vkCreateRayTracingPipelinesKHR:

VkResult vkCreateRayTracingPipelinesKHR(
    VkDevice                                    device,
    VkDeferredOperationKHR                      deferredOperation,
    VkPipelineCache                             pipelineCache,
    uint32_t                                    createInfoCount,
    const VkRayTracingPipelineCreateInfoKHR*    pCreateInfos,
    const VkAllocationCallbacks*                pAllocator,
    VkPipeline*                                 pPipelines);
  • Note: Also takes deferredOperation

VkRayTracingPipelineCreateInfoKHR:

    VkStructureType                                      sType;
    const void*                                          pNext;
    VkPipelineCreateFlags                                flags;
    uint32_t                                             stageCount;
    const VkPipelineShaderStageCreateInfo*               pStages;
    uint32_t                                             groupCount;
    const VkRayTracingShaderGroupCreateInfoKHR*          pGroups;
    uint32_t                                             maxPipelineRayRecursionDepth;
    const VkPipelineLibraryCreateInfoKHR*                pLibraryInfo;
    const VkRayTracingPipelineInterfaceCreateInfoKHR*    pLibraryInterface;
    const VkPipelineDynamicStateCreateInfo*              pDynamicState;
    VkPipelineLayout                                     layout;
    VkPipeline                                           basePipelineHandle;
    int32_t                                              basePipelineIndex;
  • VkPipelineShaderStageCreateInfo will be created from input shaders
  • Pipeline Libraries are explained below
  • maxPipelineRayRecursionDepth should be validated (see the first section about Properties and Extension)

VkRayTracingShaderGroupCreateInfoKHR:

    VkStructureType                   sType;
    const void*                       pNext;
    VkRayTracingShaderGroupTypeKHR    type;
    uint32_t                          generalShader;
    uint32_t                          closestHitShader;
    uint32_t                          anyHitShader;
    uint32_t                          intersectionShader;
    const void*                       pShaderGroupCaptureReplayHandle;
  • generalShader can be a miss shader, a raygen shader, or a callable shader.

VkRayTracingShaderGroupTypeKHR:

    VK_RAY_TRACING_SHADER_GROUP_TYPE_GENERAL_KHR = 0,
    VK_RAY_TRACING_SHADER_GROUP_TYPE_TRIANGLES_HIT_GROUP_KHR = 1,
    VK_RAY_TRACING_SHADER_GROUP_TYPE_PROCEDURAL_HIT_GROUP_KHR = 2,

VK_RAY_TRACING_SHADER_GROUP_TYPE_GENERAL_KHR indicates a shader group with a single VK_SHADER_STAGE_RAYGEN_BIT_KHR, VK_SHADER_STAGE_MISS_BIT_KHR, or VK_SHADER_STAGE_CALLABLE_BIT_KHR shader in it.

VK_RAY_TRACING_SHADER_GROUP_TYPE_TRIANGLES_HIT_GROUP_KHR specifies a shader group that only hits triangles and must not contain an intersection shader, only closest hit and any-hit shaders.

VK_RAY_TRACING_SHADER_GROUP_TYPE_PROCEDURAL_HIT_GROUP_KHR specifies a shader group that only intersects with custom geometry and must contain an intersection shader

Pipeline Library

Should we add and handle VK_KHR_pipeline_library extension?

A pipeline library is a special pipeline that cannot be bound, instead it defines a set of shaders and shader groups which can be linked into other pipelines. This extension defines the infrastructure for pipeline libraries, but does not specify the creation or usage of pipeline libraries. This is left to additional dependent extensions.

VK_KHR_pipeline_library is a soft requirement for VK_KHR_ray_tracing_pipeline instead of a strict requirement, so applications only need to enable it if they are actually using it.

Shader Binding Table

The following is needed in order to build a buffer of opaque ShaderGroupHandles (+ probable ShaderRecordData):

vkGetRayTracingShaderGroupHandlesKHR:

VkResult vkGetRayTracingShaderGroupHandlesKHR(
    VkDevice                                    device,
    VkPipeline                                  pipeline,
    uint32_t                                    firstGroup,
    uint32_t                                    groupCount,
    size_t                                      dataSize,
    void*                                       pData);

This is the only function needed (with no helper functions) to construct the ShaderBindingTable. shaderGroupHandleSize and shaderGroupBaseAlignment will be taken into consideration when constructing the SBT Buffer and computing offset for vkCmdTraceRaysKHR.

We could also have a wrapper/helper class for the SBT that does all the computation and construction of SBT buffers for each ShaderGroupType (raygen, miss, hit, callable) and helps with the invocation of vkCmdTraceRaysKHR.
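
To make the helper idea more concrete, here is a minimal sketch of the offset computation such a class might do, assuming one buffer holding all four regions, handles only (no ShaderRecordData), a single raygen record, and power-of-two alignments. `SbtRegion`, `SbtLayout`, `computeSbtLayout` and `alignUp` are hypothetical names, not existing Nabla or Vulkan API. The handle size and alignments would come from VkPhysicalDeviceRayTracingPipelinePropertiesKHR.

```cpp
#include <cassert>
#include <cstdint>

struct SbtRegion { std::uint64_t offset; std::uint64_t stride; std::uint64_t size; };
struct SbtLayout { SbtRegion raygen, miss, hit, callable; std::uint64_t totalSize; };

// Rounds value up to the next multiple of a power-of-two alignment.
inline std::uint64_t alignUp(std::uint64_t value, std::uint64_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (value + alignment - 1) & ~(alignment - 1);
}

// Lays out raygen, miss, hit and callable regions back to back in one buffer.
// Each region starts at a multiple of baseAlignment (shaderGroupBaseAlignment);
// records inside a region are spaced by the handle size rounded up to
// handleAlignment (shaderGroupHandleAlignment).
inline SbtLayout computeSbtLayout(std::uint32_t handleSize, std::uint32_t handleAlignment,
                                  std::uint32_t baseAlignment, std::uint32_t missCount,
                                  std::uint32_t hitCount, std::uint32_t callableCount) {
    const std::uint64_t stride = alignUp(handleSize, handleAlignment);
    SbtLayout layout{};
    std::uint64_t cursor = 0;
    auto place = [&](SbtRegion& region, std::uint32_t recordCount) {
        region.offset = alignUp(cursor, baseAlignment);
        region.stride = stride;
        region.size   = stride * recordCount;
        cursor = region.offset + region.size;
    };
    place(layout.raygen, 1);  // single raygen record, so size equals stride
    place(layout.miss, missCount);
    place(layout.hit, hitCount);
    place(layout.callable, callableCount);
    layout.totalSize = cursor;
    return layout;
}
```

Adding the buffer's device address to each `offset` would give the deviceAddress of the corresponding VkStridedDeviceAddressRegionKHR, provided the buffer itself starts at a suitably aligned address.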

Ray Tracing Pipeline Stack

Ray tracing pipelines have a potentially large set of shaders which may be invoked in various call chain combinations to perform ray tracing. To store parameters for a given shader execution, an implementation may use a stack of data in memory. This stack must be sized to the sum of the stack sizes of all shaders in any call chain executed by the application.

For example, if an application has two types of closest hit and miss shaders that it can use but the first level of rays will only use the first kind (possibly reflection) and the second level will only use the second kind (occlusion or shadow ray, for example) then the application can compute the stack size by something similar to: rayGenStack + max(closestHit1Stack, miss1Stack) + max(closestHit2Stack, miss2Stack)
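
The two-level example above can be written out as a small function. This is a sketch only: `twoLevelStackSize` is a hypothetical name, and the inputs are assumed to be the per-shader values returned by vkGetRayTracingShaderGroupStackSizeKHR.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Stack size for the two-level call chain described above:
// raygen, then the worst case of level 1, then the worst case of level 2.
inline std::uint32_t twoLevelStackSize(std::uint64_t rayGenStack,
                                       std::uint64_t closestHit1Stack, std::uint64_t miss1Stack,
                                       std::uint64_t closestHit2Stack, std::uint64_t miss2Stack) {
    const std::uint64_t total = rayGenStack
                              + std::max(closestHit1Stack, miss1Stack)
                              + std::max(closestHit2Stack, miss2Stack);
    // vkCmdSetRayTracingPipelineStackSizeKHR takes a uint32_t.
    assert(total <= UINT32_MAX);
    return static_cast<std::uint32_t>(total);
}
```

The result is the kind of value that would be passed as pipelineStackSize.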

In order to get/set Stack Sizes:

vkGetRayTracingShaderGroupStackSizeKHR:

VkDeviceSize vkGetRayTracingShaderGroupStackSizeKHR(
    VkDevice                                    device,
    VkPipeline                                  pipeline,
    uint32_t                                    group,
    VkShaderGroupShaderKHR                      groupShader);

vkCmdSetRayTracingPipelineStackSizeKHR:

void vkCmdSetRayTracingPipelineStackSizeKHR(
    VkCommandBuffer                             commandBuffer,
    uint32_t                                    pipelineStackSize);

VkShaderGroupShaderKHR is just an enum :

VkShaderGroupShaderKHR :

    VK_SHADER_GROUP_SHADER_GENERAL_KHR = 0,
    VK_SHADER_GROUP_SHADER_CLOSEST_HIT_KHR = 1,
    VK_SHADER_GROUP_SHADER_ANY_HIT_KHR = 2,
    VK_SHADER_GROUP_SHADER_INTERSECTION_KHR = 3,

RayTracing Commands

vkCmdTraceRaysKHR:

void vkCmdTraceRaysKHR(
    VkCommandBuffer                             commandBuffer,
    const VkStridedDeviceAddressRegionKHR*      pRaygenShaderBindingTable,
    const VkStridedDeviceAddressRegionKHR*      pMissShaderBindingTable,
    const VkStridedDeviceAddressRegionKHR*      pHitShaderBindingTable,
    const VkStridedDeviceAddressRegionKHR*      pCallableShaderBindingTable,
    uint32_t                                    width,
    uint32_t                                    height,
    uint32_t                                    depth);

VkStridedDeviceAddressRegionKHR:

typedef struct VkStridedDeviceAddressRegionKHR {
    VkDeviceAddress    deviceAddress;
    VkDeviceSize       stride;
    VkDeviceSize       size;
} VkStridedDeviceAddressRegionKHR;

Indirect Trace Rays

vkCmdTraceRaysIndirectKHR:

void vkCmdTraceRaysIndirectKHR(
    VkCommandBuffer                             commandBuffer,
    const VkStridedDeviceAddressRegionKHR*      pRaygenShaderBindingTable,
    const VkStridedDeviceAddressRegionKHR*      pMissShaderBindingTable,
    const VkStridedDeviceAddressRegionKHR*      pHitShaderBindingTable,
    const VkStridedDeviceAddressRegionKHR*      pCallableShaderBindingTable,
    VkDeviceAddress                             indirectDeviceAddress);
  • width/height/depth will be in the buffer indirectDeviceAddress points to.
  • indirectDeviceAddress is a buffer device address which is a pointer to a VkTraceRaysIndirectCommandKHR structure containing the trace ray parameters.

VkTraceRaysIndirectCommandKHR :

typedef struct VkTraceRaysIndirectCommandKHR {
    uint32_t    width;
    uint32_t    height;
    uint32_t    depth;
} VkTraceRaysIndirectCommandKHR;

Erfan-Ahmadi avatar Aug 19 '21 10:08 Erfan-Ahmadi

It has accelerationStructureHostCommands that indicates whether the implementation supports host side acceleration structure commands: (vkBuildAccelerationStructuresKHR, vkCopyAccelerationStructureKHR, vkCopyAccelerationStructureToMemoryKHR, vkCopyMemoryToAccelerationStructureKHR, and vkWriteAccelerationStructuresPropertiesKHR)

Support is optional for all 5 (host) or all 10 (device and host).

VkDeviceOrHostAddressConstKHR

what decides which one this is?

what function I call? (Cmd vs no Cmd)

Support is optional for all 5 (host) or all 10 (device and host).

If you enable the VK_KHR_acceleration_structure extension, you can use the device functions. To use the host functions you must also enable the accelerationStructureHostCommands feature (after checking that the physical device supports it).

VkSpec for vkCopyAccelerationStructureToMemoryKHR:

VUID-vkCopyAccelerationStructureToMemoryKHR-accelerationStructureHostCommands-03584 The VkPhysicalDeviceAccelerationStructureFeaturesKHR::accelerationStructureHostCommands feature must be enabled

Erfan-Ahmadi avatar Aug 19 '21 11:08 Erfan-Ahmadi

weird but ok, kinda hard to extract "hard" dependencies (i.e. you just have at least one host or device thing supported)

IMPORTANT NOTE @devshgraphicsprogramming: Do we want to expose a function for vkGetBufferDeviceAddress? Since most of the functions and struct related to this raytracing extension works with deviceAddresses and not buffers and AS's directly. I think we can also take our Nabla objects and call vkGetBufferDeviceAddress behind the scenes.

So raytracing requires BDA?

VkDeviceOrHostAddressConstKHR

what decides which one this is?

what function I call? (Cmd vs no Cmd)

Yes, Vulkan takes VkDeviceOrHostAddressConstKHR for Infos like VkAccelerationStructureGeometryInstancesDataKHR or VkCopyMemoryToAccelerationStructureInfoKHR but if you're using the host function the hostAddress must be filled and if you're using device functions (with Cmd), the deviceAddress must be filled.

Erfan-Ahmadi avatar Aug 19 '21 11:08 Erfan-Ahmadi

Vulkan takes VkDeviceOrHostAddressConstKHR for Infos like VkAccelerationStructureGeometryInstancesDataKHR or VkCopyMemoryToAccelerationStructureInfoKHR but if you're using the host function the hostAddress must be filled and if you're using device functions (with Cmd), the deviceAddress must be filled.

sounds like a thing to solve with C++ templates template<typename address_type_t>

then IGPUCommandBuffer methods would use stuff with <const buffer_device_address_t> and ILogicalDevice methods would use <const void*>
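
A rough sketch of what that templating could look like, using the AABB geometry data as the example. Everything here is hypothetical: `buffer_device_address_t` is assumed to be a 64-bit integer, and `AabbsGeometry`, `BuildPath` and `buildPath` are invented names standing in for the real IGPUCommandBuffer / ILogicalDevice methods.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// Assumed alias; the real Nabla type may differ.
using buffer_device_address_t = std::uint64_t;

// One geometry description, parameterized on how its data is addressed.
template<typename address_type_t>
struct AabbsGeometry {
    address_type_t data;    // device address or host pointer
    std::uint64_t  stride;
};

using DeviceAabbsGeometry = AabbsGeometry<buffer_device_address_t>;
using HostAabbsGeometry   = AabbsGeometry<const void*>;
static_assert(!std::is_same<DeviceAabbsGeometry, HostAabbsGeometry>::value,
              "the two instantiations are distinct types");

enum class BuildPath { Device, Host };

// Stand-ins for the command-buffer (device) and logical-device (host) entry points.
// Passing the wrong instantiation fails to compile instead of failing at runtime.
inline BuildPath buildPath(const DeviceAabbsGeometry&) { return BuildPath::Device; }
inline BuildPath buildPath(const HostAabbsGeometry&)   { return BuildPath::Host; }
```

The appeal of this design is that the Cmd vs non-Cmd choice is encoded in the type, so the union never has to be exposed.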

So raytracing requires BDA?

Yes,

VK_KHR_ray_tracing_pipeline requires VK_KHR_acceleration_structure

and VK_KHR_acceleration_structure requires:

  • Vulkan 1.1
  • VK_EXT_descriptor_indexing
  • VK_KHR_buffer_device_address
  • VK_KHR_deferred_host_operations

See https://www.khronos.org/registry/vulkan/specs/1.2-extensions/man/html/VK_KHR_ray_tracing_pipeline.html

Erfan-Ahmadi avatar Aug 19 '21 11:08 Erfan-Ahmadi

Do we have any guarantees on whether host commands or device commands will always be available?

from my reading it looks like host commands are optional, but device commands are always there

what queue do we need to dispatch the device commands?

Do we have any guarantees on whether host commands or device commands will always be available?

These CPU-based commands are optional, but the device versions of these commands (vkCmd*) are always supported.

More info here: https://www.khronos.org/registry/vulkan/specs/1.2-extensions/html/vkspec.html#host-acceleration-structure

Erfan-Ahmadi avatar Aug 19 '21 11:08 Erfan-Ahmadi

from my reading it looks like host commands are optional, but device commands are always there

what queue do we need to dispatch the device commands?

Good question. Any queue that supports compute:

• VUID-vkCmdBuildAccelerationStructuresKHR-commandBuffer-cmdpool The VkCommandPool that commandBuffer was allocated from must support compute operations
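A small sketch of the check that VUID implies, assuming the queue flags of each family are already known. The enum values mirror VK_QUEUE_GRAPHICS_BIT, VK_QUEUE_COMPUTE_BIT and VK_QUEUE_TRANSFER_BIT so the snippet compiles without vulkan.h; the function names are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Values mirror VK_QUEUE_GRAPHICS_BIT, VK_QUEUE_COMPUTE_BIT and VK_QUEUE_TRANSFER_BIT.
enum QueueFlagBits : std::uint32_t
{
    QUEUE_GRAPHICS_BIT = 0x1,
    QUEUE_COMPUTE_BIT  = 0x2,
    QUEUE_TRANSFER_BIT = 0x4
};

// Hypothetical validation helper: takes the queueFlags of the family the
// command pool was created for and tells whether AS builds may be recorded.
inline bool poolSupportsASBuild(std::uint32_t poolFamilyQueueFlags)
{
    return (poolFamilyQueueFlags & QUEUE_COMPUTE_BIT) != 0u;
}

// Returns the index of the first family able to record AS builds, or -1 if none.
inline int findASBuildFamily(const std::vector<std::uint32_t>& familyFlags)
{
    for (std::size_t i = 0; i < familyFlags.size(); ++i)
        if (poolSupportsASBuild(familyFlags[i]))
            return static_cast<int>(i);
    return -1;
}
```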

Erfan-Ahmadi avatar Aug 19 '21 11:08 Erfan-Ahmadi

from my reading it looks like host commands are optional, but device commands are always there

what queue do we need to dispatch the device commands?

Good question. Any queue that supports compute:

• VUID-vkCmdBuildAccelerationStructuresKHR-commandBuffer-cmdpool The VkCommandPool that commandBuffer was allocated from must support compute operations

ok so it's just like computing mip-maps, just do it on the compute queue.

Do we have any guarantees on whether host commands or device commands will always be available?

These CPU-based commands are optional, but the device versions of these commands (vkCmd*) are always supported.

More info here: https://www.khronos.org/registry/vulkan/specs/1.2-extensions/html/vkspec.html#host-acceleration-structure

I think the cpu2gpu object converter should try to use the host methods to build the AS (massively parallel BVH builds are faster, but the resulting BVHs are lower quality).

The initial AS should be in HOST_CACHED non device local memory, and then host-copied and compacted to unmappable DEVICE_LOCAL (copy AS to AS).

ok so it's just like computing mip-maps, just do it on the compute queue.

Shouldn't the user allocate the command buffer from a compute-capable queue family and pass it as a parameter to the functions I provide? I think I can only validate that the cmdBuffer comes from a supported (compute) queue.

Erfan-Ahmadi avatar Aug 19 '21 11:08 Erfan-Ahmadi

ok so it's just like computing mip-maps, just do it on the compute queue.

Shouldn't the user allocate the command buffer from a compute-capable queue family and pass it as a parameter to the functions I provide? I think I can only validate that the cmdBuffer comes from a supported (compute) queue.

cpu2gpu converter already has this option/works this way

cpu2gpu converter already has this option/works this way

Understood

Erfan-Ahmadi avatar Aug 19 '21 11:08 Erfan-Ahmadi

Because we are taking steps towards threading the cpu2gpu conversion and asset loading, we should expose Deferred operations.

Maybe ILogicalDevice could hand out core::smart_refctd_ptr<ILogicalDevice::IDeferredOperation> objects, which are placement-new allocated on a CMemoryPool like the one @achalpandeyy is using for command buffers (let's not murder the heap).

Then IDeferredOperation could have join and get as methods (and a wait built on top of get which also forces at least one join). Its destructor and the refcounting could then ensure that we don't vk-destroy an incomplete operation.
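A rough sketch of the join/get/wait shape described above, against a mock backend rather than real Vulkan. `MockDeferredOperation` and its work-item counter are hypothetical; the result enums only loosely follow the outcomes of vkDeferredOperationJoinKHR (done vs. more work possible) and vkGetDeferredOperationResultKHR (not ready vs. finished). Refcounting and pool allocation are left out.

```cpp
#include <atomic>

enum class JoinResult { Success, ThreadIdle }; // Success: operation is complete
enum class OpResult   { NotReady, Success };

// Mock: the "operation" is just a number of work items; each join() performs one.
class MockDeferredOperation
{
public:
    explicit MockDeferredOperation(int workItems) : m_remaining(workItems) {}

    // Never tear down an incomplete operation: finish it first.
    ~MockDeferredOperation() { wait(); }

    // Contribute the calling thread to the operation.
    JoinResult join()
    {
        int before = m_remaining.load();
        // Try to claim one work item while any are left.
        while (before > 0 && !m_remaining.compare_exchange_weak(before, before - 1)) {}
        // before <= 0: nothing was left; before == 1: we took the last item.
        return before <= 1 ? JoinResult::Success : JoinResult::ThreadIdle;
    }

    // Non-blocking query of the operation's state.
    OpResult get() const
    {
        return m_remaining.load() <= 0 ? OpResult::Success : OpResult::NotReady;
    }

    // Blocking: forces at least one join, then keeps joining until get() reports completion.
    OpResult wait()
    {
        do
        {
            if (join() == JoinResult::Success)
                break;
        } while (get() == OpResult::NotReady);
        return get();
    }

private:
    std::atomic<int> m_remaining;
};
```

In a real implementation wait() would also have to cope with a join that reports "this thread is done but others are still working", which is why it is built on get() rather than on the join result alone.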

deviceAddress in these function parameters is related to accelerationStructureCaptureReplay and this optional functionality is intended to be used by tools and not by applications directly.

We'll definitely be using Nsight a lot, and RenderDoc whenever it starts supporting raytracing. So we need this.

We will not support serializing Device & Driver Version dependent Acceleration Structures anytime soon (we don't really support downloading compiled shaders back from the driver for faster loading either).

So no need to worry about that.

So no need to worry about that.

I believe you're referring to the Compatibility Check section?

Erfan-Ahmadi avatar Aug 19 '21 11:08 Erfan-Ahmadi

serialization and deserialization in general.

All the Acceleration Structure flags are REALLY IMPORTANT and should be exposed

The most important thing for performance is the ability to build a single unified AS (no TLAS, everything is one BLAS) with no/little instancing.

Second most important is the DXR no-anyhit-shader flag.

And backface triangle culling is actually more expensive when enabled in raytracing.

There's also an important correctness (not perf) flag about whether anyhit shaders should only be called once per primitive.

The most important thing for performance is the ability to build a single unified AS (no TLAS, everything is one BLAS) with no/little instancing.

Note that from Vulkan's perspective you cannot bind a BLAS directly as a descriptor; you must always create a TLAS.

See Vulkan Spec:

VUID-VkWriteDescriptorSetAccelerationStructureKHR-pAccelerationStructures-03579 Each acceleration structure in pAccelerationStructures must have been created with a type of VK_ACCELERATION_STRUCTURE_TYPE_TOP_LEVEL_KHR or VK_ACCELERATION_STRUCTURE_TYPE_GENERIC_KHR

You might wonder what VK_ACCELERATION_STRUCTURE_TYPE_GENERIC_KHR is. The Vulkan spec also answers that in its issues section:

(5) What is VK_ACCELERATION_STRUCTURE_TYPE_GENERIC_KHR for?

RESOLVED: It is primarily intended for API layering. In DXR, the acceleration structure is basically just a buffer in a special layout, and you don’t know at creation time whether it will be used as a top or bottom level acceleration structure. We thus added a generic acceleration structure type whose type is unknown at creation time, but is specified at build time instead. Applications which are written directly for Vulkan should not use it.

Erfan-Ahmadi avatar Aug 19 '21 12:08 Erfan-Ahmadi

All the Acceleration Structure flags are REALLY IMPORTANT and should be exposed

I agree.

Erfan-Ahmadi avatar Aug 19 '21 12:08 Erfan-Ahmadi

These should be the defaults for cpu2gpu conversion and anything else that doesn't get overridden by explicit user choice

VK_BUILD_ACCELERATION_STRUCTURE_ALLOW_COMPACTION_BIT_KHR
VK_BUILD_ACCELERATION_STRUCTURE_PREFER_FAST_TRACE_BIT_KHR
VK_ACCELERATION_STRUCTURE_BUILD_TYPE_HOST_KHR // if feature present, otherwise device only

If there's a sign that the geometry could be animated (such as a meshbuffer having bone or animation info), use these instead

VK_BUILD_ACCELERATION_STRUCTURE_ALLOW_UPDATE_BIT_KHR
VK_BUILD_ACCELERATION_STRUCTURE_PREFER_FAST_BUILD_BIT_KHR 
VK_ACCELERATION_STRUCTURE_BUILD_TYPE_DEVICE_KHR 
// later on
VK_ACCELERATION_STRUCTURE_CREATE_MOTION_BIT_NV // if VK_NV_ray_tracing_motion_blur present

add VK_ACCELERATION_STRUCTURE_CREATE_DEVICE_ADDRESS_CAPTURE_REPLAY_BIT_KHR if you detect Nsight.
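The default-selection rules above could be sketched roughly like this. The flag values mirror VkBuildAccelerationStructureFlagBitsKHR so the snippet compiles without vulkan.h; `BuildDefaults`, `chooseDefaults` and its three boolean inputs are hypothetical, and how "could be animated" or "Nsight detected" gets decided is left open. The motion-blur bit is omitted since it is marked as "later on".

```cpp
#include <cstdint>

// Values mirror VK_BUILD_ACCELERATION_STRUCTURE_*_BIT_KHR.
enum BuildFlagBits : std::uint32_t
{
    ALLOW_UPDATE_BIT      = 0x1,
    ALLOW_COMPACTION_BIT  = 0x2,
    PREFER_FAST_TRACE_BIT = 0x4,
    PREFER_FAST_BUILD_BIT = 0x8
};

// Stand-in for VK_ACCELERATION_STRUCTURE_BUILD_TYPE_{HOST,DEVICE}_KHR.
enum class BuildType { Host, Device };

// Hypothetical bundle of the choices discussed in this thread.
struct BuildDefaults
{
    std::uint32_t buildFlags;
    BuildType     buildType;
    bool          captureReplay; // request the DEVICE_ADDRESS_CAPTURE_REPLAY create flag
};

inline BuildDefaults chooseDefaults(bool maybeAnimated, bool hostCommandsSupported, bool captureToolDetected)
{
    BuildDefaults d{};
    if (maybeAnimated)
    {
        // Geometry likely to be refit often: cheap builds, updatable, built on the device.
        d.buildFlags = ALLOW_UPDATE_BIT | PREFER_FAST_BUILD_BIT;
        d.buildType  = BuildType::Device;
    }
    else
    {
        // Static geometry: best trace performance, compactable, host build if available.
        d.buildFlags = ALLOW_COMPACTION_BIT | PREFER_FAST_TRACE_BIT;
        d.buildType  = hostCommandsSupported ? BuildType::Host : BuildType::Device;
    }
    d.captureReplay = captureToolDetected;
    return d;
}
```

An explicit user choice would simply bypass chooseDefaults and fill the struct directly.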

Note that from Vulkan's perspective you cannot bind a BLAS directly as a descriptor; you must always create a TLAS.

Potatoe, potato

I presume there's an option to create a TLAS without any BLASes?

I presume there's an option to create a TLAS without any BLASes?

Unfortunately I don't think so.

• VUID-VkAccelerationStructureBuildGeometryInfoKHR-type-03789 If type is VK_ACCELERATION_STRUCTURE_TYPE_TOP_LEVEL_KHR, the geometryType member of elements of either pGeometries or ppGeometries must be VK_GEOMETRY_TYPE_INSTANCES_KHR
• VUID-VkAccelerationStructureBuildGeometryInfoKHR-type-03790 If type is VK_ACCELERATION_STRUCTURE_TYPE_TOP_LEVEL_KHR, geometryCount must be 1

The geometry type must be VK_GEOMETRY_TYPE_INSTANCES_KHR, i.e. instances of other acceleration structures.

https://www.khronos.org/registry/vulkan/specs/1.2-extensions/man/html/VkAccelerationStructureInstanceKHR.html

Erfan-Ahmadi avatar Aug 19 '21 12:08 Erfan-Ahmadi

hmm but I've seen and heard of people tracing just the BLAS for maximum gains in static scenes in DXR/OptiX.

So what do we do then, TLAS with a single instance? No better way to do it?

hmm but I've seen and heard of people tracing just the BLAS for maximum gains in static scenes in DXR/OptiX.

So what do we do then, TLAS with a single instance? No better way to do it?

The simplest case would be 1 BLAS and 1 TLAS with 1 instance referring to the BLAS.
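For illustration, the single instance for that case could be filled in roughly as follows. `InstanceDesc` is a stand-in that mirrors the field layout of VkAccelerationStructureInstanceKHR so the snippet compiles without vulkan.h, and `makeSingleInstance` is a hypothetical helper; with real headers you would use the Vulkan struct directly. For device builds the reference is the BLAS device address (from vkGetAccelerationStructureDeviceAddressKHR).

```cpp
#include <cstdint>

// Stand-in mirroring VkTransformMatrixKHR: row-major 3x4 affine transform.
struct TransformMatrix { float matrix[3][4]; };

// Stand-in mirroring the field layout of VkAccelerationStructureInstanceKHR.
struct InstanceDesc
{
    TransformMatrix transform;
    std::uint32_t   instanceCustomIndex : 24;
    std::uint32_t   mask : 8;
    std::uint32_t   instanceShaderBindingTableRecordOffset : 24;
    std::uint32_t   flags : 8;
    std::uint64_t   accelerationStructureReference;
};
static_assert(sizeof(InstanceDesc) == 64, "instance records are expected to be 64 bytes");

// One identity-transformed instance that is visible to every ray mask.
inline InstanceDesc makeSingleInstance(std::uint64_t blasReference)
{
    InstanceDesc inst{}; // zero-initialises every field
    inst.transform.matrix[0][0] = 1.0f;
    inst.transform.matrix[1][1] = 1.0f;
    inst.transform.matrix[2][2] = 1.0f;
    inst.mask = 0xFF; // never culled by the ray's cull mask
    inst.accelerationStructureReference = blasReference;
    return inst;
}
```

The TLAS is then built from a single VK_GEOMETRY_TYPE_INSTANCES_KHR geometry whose instance data holds exactly this one record.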

Besides the Vulkan spec, you could also look at the nvpro-samples, which give a good picture of how to work with these structs and functions: https://github.com/nvpro-samples/vk_raytracing_tutorial_KHR I believe its projects use all of them.

Erfan-Ahmadi avatar Aug 19 '21 12:08 Erfan-Ahmadi