Nabla
Expose Raytracing Pipeline
This is documentation of the Vulkan objects that are part of VK_KHR_ray_tracing_pipeline (and the extensions it depends on) that we may want to expose and work with in Nabla.
- Click on the links if I don't explain something well enough.
- Please pay attention to the verbs used: AS Build and AS Creation are different things.
- vkCmd* is related to device operations while vk* is related to host operations (e.g. vkCmdBuildAccelerationStructuresKHR and vkBuildAccelerationStructuresKHR)
Extension and Properties
VkPhysicalDeviceRayTracingPipelinePropertiesKHR
:
If this structure is included in the pNext chain of the VkPhysicalDeviceProperties2 structure passed to vkGetPhysicalDeviceProperties2, it is filled in with each corresponding implementation-dependent property.
VkPhysicalDeviceRayTracingPipelineFeaturesKHR
:
If this structure is included in the pNext chain of the VkPhysicalDeviceFeatures2 structure passed to vkGetPhysicalDeviceFeatures2, it is filled in to indicate whether each corresponding feature is supported.
These are needed for:
- Memory management of the opaque shader group handles (e.g. shaderGroupHandleSize, shaderGroupBaseAlignment, maxShaderGroupStride, shaderGroupHandleAlignment)
- Limits to validate against (e.g. maxRayHitAttributeSize, maxRayRecursionDepth, maxRayDispatchInvocationCount)
We must also expose some of these physical device values for the user to work with, in case the user wants to do everything low-level and manually (e.g. creating the ShaderBindingTable buffer).
VkPhysicalDeviceAccelerationStructureFeaturesKHR
:
It has an accelerationStructureHostCommands member that indicates whether the implementation supports host-side acceleration structure commands (vkBuildAccelerationStructuresKHR, vkCopyAccelerationStructureKHR, vkCopyAccelerationStructureToMemoryKHR, vkCopyMemoryToAccelerationStructureKHR, and vkWriteAccelerationStructuresPropertiesKHR).
Put Cmd after vk to turn the functions above into device functions instead of host ones.
Acceleration Structures (Creation, Build, Compaction, Copy, Related Enums and Structs...)
Geometry:
VkAccelerationStructureGeometryKHR
:
typedef struct VkAccelerationStructureGeometryKHR {
VkStructureType sType;
const void* pNext;
VkGeometryTypeKHR geometryType;
VkAccelerationStructureGeometryDataKHR geometry;
VkGeometryFlagsKHR flags;
} VkAccelerationStructureGeometryKHR;
The geometry member is a union of three structs:
- Triangles: a geometry type consisting of triangles (used when building a BLAS from vertex buffers)
- AABBs: a geometry type consisting of axis-aligned bounding boxes (used when working with custom primitives that need a custom intersection shader)
- Instances: a geometry type consisting of acceleration structure instances (used when building a TLAS from BLAS instances)
1. VkAccelerationStructureGeometryTrianglesDataKHR
:
VkStructureType sType;
const void* pNext;
VkFormat vertexFormat;
VkDeviceOrHostAddressConstKHR vertexData;
VkDeviceSize vertexStride;
uint32_t maxVertex;
VkIndexType indexType;
VkDeviceOrHostAddressConstKHR indexData;
VkDeviceOrHostAddressConstKHR transformData;
2. VkAccelerationStructureGeometryAabbsDataKHR
:
VkStructureType sType;
const void* pNext;
VkDeviceOrHostAddressConstKHR data;
VkDeviceSize stride;
- data is a device or host address to memory containing VkAabbPositionsKHR structures containing position data for each axis-aligned bounding box in the geometry.
VkAabbPositionsKHR
:
float minX;
float minY;
float minZ;
float maxX;
float maxY;
float maxZ;
3. VkAccelerationStructureGeometryInstancesDataKHR
:
VkStructureType sType;
const void* pNext;
VkBool32 arrayOfPointers;
VkDeviceOrHostAddressConstKHR data;
- data is either the address of an array of device or host addresses referencing individual VkAccelerationStructureInstanceKHR structures or packed motion instance information as described in motion instances if arrayOfPointers is VK_TRUE, or the address of an array of VkAccelerationStructureInstanceKHR or VkAccelerationStructureMotionInstanceNV structures. Addresses and VkAccelerationStructureInstanceKHR structures are tightly packed. VkAccelerationStructureMotionInstanceNV structures have a stride of 160 bytes.
VkAccelerationStructureInstanceKHR
:
typedef struct VkAccelerationStructureInstanceKHR {
VkTransformMatrixKHR transform;
uint32_t instanceCustomIndex:24;
uint32_t mask:8;
uint32_t instanceShaderBindingTableRecordOffset:24;
VkGeometryInstanceFlagsKHR flags:8;
uint64_t accelerationStructureReference;
} VkAccelerationStructureInstanceKHR;
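To make the tightly packed layout concrete, here is a minimal C++ sketch that mirrors VkAccelerationStructureInstanceKHR without including the Vulkan headers (real code would use the type from vulkan_core.h). AccelerationStructureInstance, makeIdentityInstance and instanceByteOffset are hypothetical names. It assumes the usual 64-byte layout (a 48-byte 3x4 float matrix, two 32-bit bitfield words, one 64-bit reference), so in a packed array (arrayOfPointers == VK_FALSE) instance i starts at byte i * 64.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical mirror of VkTransformMatrixKHR: a row-major 3x4 affine matrix.
struct TransformMatrix { float matrix[3][4]; };

// Hypothetical mirror of VkAccelerationStructureInstanceKHR (same field order and bit widths).
struct AccelerationStructureInstance {
    TransformMatrix transform;
    uint32_t instanceCustomIndex : 24;
    uint32_t mask : 8;
    uint32_t instanceShaderBindingTableRecordOffset : 24;
    uint32_t flags : 8; // VkGeometryInstanceFlagsKHR is a 32-bit VkFlags
    uint64_t accelerationStructureReference;
};

// Packed instance arrays rely on this exact size.
static_assert(sizeof(AccelerationStructureInstance) == 64, "instances are tightly packed at 64 bytes");

// Builds an instance with an identity transform that references one BLAS.
// blasReference is a device address for device builds, or a VkAccelerationStructureKHR
// handle value for host builds (see accelerationStructureReference below).
inline AccelerationStructureInstance makeIdentityInstance(uint64_t blasReference, uint32_t customIndex)
{
    assert(customIndex < (1u << 24)); // only 24 bits are available
    AccelerationStructureInstance instance{}; // zero-initializes every field, including the bitfields
    for (int i = 0; i < 3; ++i)
        instance.transform.matrix[i][i] = 1.0f;
    instance.instanceCustomIndex = customIndex;
    instance.mask = 0xFF; // hit by any ray whose cull mask is non-zero
    instance.accelerationStructureReference = blasReference;
    return instance;
}

// Byte offset of instance i inside a tightly packed array.
inline std::size_t instanceByteOffset(std::size_t i) { return i * sizeof(AccelerationStructureInstance); }
```

An identity instance with mask 0xFF is the simplest thing a TLAS can point at when it only wraps a single BLAS.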
accelerationStructureReference is either:
- a device address containing the value obtained from vkGetAccelerationStructureDeviceAddressKHR or vkGetAccelerationStructureHandleNV (used by device operations which reference acceleration structures), or
- a VkAccelerationStructureKHR object (used by host operations which reference acceleration structures).
VkGeometryTypeKHR
:
VK_GEOMETRY_TYPE_TRIANGLES_KHR = 0,
VK_GEOMETRY_TYPE_AABBS_KHR = 1,
VK_GEOMETRY_TYPE_INSTANCES_KHR = 2,
VkGeometryFlagBitsKHR
:
VK_GEOMETRY_OPAQUE_BIT_KHR = 0x00000001,
VK_GEOMETRY_NO_DUPLICATE_ANY_HIT_INVOCATION_BIT_KHR = 0x00000002,
Acceleration Structures are built (not created) from:
- One or more geometries (VkAccelerationStructureGeometryKHR) filled into a VkAccelerationStructureBuildGeometryInfoKHR (described later in this text)
- A build range (VkAccelerationStructureBuildRangeInfoKHR) for each geometry
VkAccelerationStructureBuildRangeInfoKHR
:
uint32_t primitiveCount;
uint32_t primitiveOffset;
uint32_t firstVertex;
uint32_t transformOffset;
- The relation between geometry.triangles and the BuildRangeInfo is similar to the relation between vertexBuffer + inputAttributes and the parameters of vkCmdDraw.
- In the case of triangle geometry, primitiveCount is the number of triangles.
VkAccelerationStructureBuildGeometryInfoKHR
:
VkStructureType sType;
const void* pNext;
VkAccelerationStructureTypeKHR type;
VkBuildAccelerationStructureFlagsKHR flags;
VkBuildAccelerationStructureModeKHR mode;
VkAccelerationStructureKHR srcAccelerationStructure;
VkAccelerationStructureKHR dstAccelerationStructure;
uint32_t geometryCount;
const VkAccelerationStructureGeometryKHR* pGeometries;
const VkAccelerationStructureGeometryKHR* const* ppGeometries;
VkDeviceOrHostAddressKHR scratchData;
Most of the members are clear enough and explained in the spec; there are only a few notes:
- scratchData is temporary memory the implementation works with while it is building the AS.
- The scratch size is queried using vkGetAccelerationStructureBuildSizesKHR.
- The enums used are listed below.
vkGetAccelerationStructureBuildSizesKHR
:
void vkGetAccelerationStructureBuildSizesKHR(
VkDevice device,
VkAccelerationStructureBuildTypeKHR buildType,
const VkAccelerationStructureBuildGeometryInfoKHR* pBuildInfo,
const uint32_t* pMaxPrimitiveCounts,
VkAccelerationStructureBuildSizesInfoKHR* pSizeInfo);
- pBuildInfo might be partially or fully filled.
- pMaxPrimitiveCounts holds the maximum number of primitives for each geometry in pBuildInfo (one entry per element of pGeometries or ppGeometries).
- You might want to set each geometry's pMaxPrimitiveCounts entry to exactly the same value as the primitiveCount of its respective VkAccelerationStructureBuildRangeInfoKHR.
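As an illustration of the notes above (primitiveCount counts triangles, and pMaxPrimitiveCounts usually mirrors the build ranges), here is a small C++ sketch. BuildRangeInfo mirrors VkAccelerationStructureBuildRangeInfoKHR; TriangleGeometryDesc and makeRangesAndMaxCounts are hypothetical names. It assumes one indexed triangle list per geometry, each starting at a byte offset into a shared index buffer.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical mirror of VkAccelerationStructureBuildRangeInfoKHR.
struct BuildRangeInfo {
    uint32_t primitiveCount;
    uint32_t primitiveOffset; // byte offset into the index buffer for indexed triangles
    uint32_t firstVertex;
    uint32_t transformOffset;
};

// Hypothetical description of one indexed triangle-list geometry.
struct TriangleGeometryDesc {
    uint32_t indexCount;      // must be a multiple of 3 for a triangle list
    uint32_t indexByteOffset; // where this geometry's indices start
};

// Fills one build range per geometry, plus the matching pMaxPrimitiveCounts array
// that vkGetAccelerationStructureBuildSizesKHR expects (one entry per geometry).
inline void makeRangesAndMaxCounts(const std::vector<TriangleGeometryDesc>& geometries,
                                   std::vector<BuildRangeInfo>& ranges,
                                   std::vector<uint32_t>& maxPrimitiveCounts)
{
    ranges.clear();
    maxPrimitiveCounts.clear();
    for (const TriangleGeometryDesc& g : geometries) {
        assert(g.indexCount % 3 == 0);
        BuildRangeInfo range{};
        range.primitiveCount = g.indexCount / 3; // triangles, not indices or vertices
        range.primitiveOffset = g.indexByteOffset;
        ranges.push_back(range);
        // For a one-off build, the size query can use exactly the counts we will build with.
        maxPrimitiveCounts.push_back(range.primitiveCount);
    }
}
```

If the same AS will later be rebuilt or updated with more primitives, the max counts would instead be upper bounds rather than the current counts.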
VkAccelerationStructureBuildSizesInfoKHR
:
VkStructureType sType;
const void* pNext;
VkDeviceSize accelerationStructureSize;
VkDeviceSize updateScratchSize;
VkDeviceSize buildScratchSize;
We should usually call this function before:
- Creating our AS, because sizeInfo contains sizeInfo.accelerationStructureSize
- Building our AS, because sizeInfo contains sizeInfo.buildScratchSize
- Updating our AS, because sizeInfo contains sizeInfo.updateScratchSize
After querying the sizes using vkGetAccelerationStructureBuildSizesKHR, we must:
- Create the scratch buffer with size = sizeInfo.buildScratchSize
- Continue to fill our buildInfo:
buildInfo.scratchData.deviceAddress = scratchAddress;
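If one scratch region is reused for both the initial build and later updates, it has to cover both sizes, and several builds can share one scratch buffer if each gets its own region. A minimal C++ sketch, assuming the alignment passed in is VkPhysicalDeviceAccelerationStructurePropertiesKHR::minAccelerationStructureScratchOffsetAlignment (which scratch device addresses must respect) and that the buffer's base device address is itself aligned. BuildSizes, scratchSizeFor and packScratch are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

using DeviceSize = uint64_t; // stands in for VkDeviceSize

// Hypothetical mirror of VkAccelerationStructureBuildSizesInfoKHR (sType/pNext omitted).
struct BuildSizes {
    DeviceSize accelerationStructureSize;
    DeviceSize updateScratchSize;
    DeviceSize buildScratchSize;
};

// Rounds value up to a power-of-two alignment.
constexpr DeviceSize alignUp(DeviceSize value, DeviceSize alignment)
{
    return (value + alignment - 1) & ~(alignment - 1);
}

// A build always needs buildScratchSize; if the same region is reused for
// later updates it must also cover updateScratchSize.
inline DeviceSize scratchSizeFor(const BuildSizes& sizes, bool willBeUpdated)
{
    return willBeUpdated ? std::max(sizes.buildScratchSize, sizes.updateScratchSize)
                         : sizes.buildScratchSize;
}

// Places `count` scratch regions back to back in one buffer, each at an aligned offset.
// Returns the total buffer size; offsetsOut receives one offset per AS.
inline DeviceSize packScratch(const BuildSizes* sizes, const bool* willBeUpdated, int count,
                              DeviceSize scratchAlignment, DeviceSize* offsetsOut)
{
    assert(scratchAlignment != 0 && (scratchAlignment & (scratchAlignment - 1)) == 0);
    DeviceSize cursor = 0;
    for (int i = 0; i < count; ++i) {
        cursor = alignUp(cursor, scratchAlignment);
        offsetsOut[i] = cursor;
        cursor += scratchSizeFor(sizes[i], willBeUpdated[i]);
    }
    return cursor;
}
```

Each scratchAddress is then the buffer's device address plus the region's offset.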
IMPORTANT NOTE @devshgraphicsprogramming: Do we want to expose a function for vkGetBufferDeviceAddress?
Most of the functions and structs related to this raytracing extension work with device addresses rather than with buffers and ASes directly, so I think we could also take our Nabla objects and call vkGetBufferDeviceAddress behind the scenes.
Host/Device Commands for Build, CopyAStoAS, CopyASToMemory, CopyMemoryToAS, WriteProperties
Note for Host Commands:
- They use Deferred Operations, so they can be "joined" on the user's threads for more CPU utilization. We could expose the creation and join functions of these Deferred Operations (VkDeferredOperationKHR), or we could simply use them internally: call join, wait for VK_SUCCESS, and finish the task in the appropriate function on the current calling thread, without exposing Deferred Operations.
Build AS
All function parameters are explained above
vkCmdBuildAccelerationStructuresKHR
:
void vkCmdBuildAccelerationStructuresKHR(
VkCommandBuffer commandBuffer,
uint32_t infoCount,
const VkAccelerationStructureBuildGeometryInfoKHR* pInfos,
const VkAccelerationStructureBuildRangeInfoKHR* const* ppBuildRangeInfos);
- Read the Vulkan spec for the correct pipeline stages and access types for memory barriers on the memory involved in the build command.
vkCmdBuildAccelerationStructuresIndirectKHR
:
void vkCmdBuildAccelerationStructuresIndirectKHR(
VkCommandBuffer commandBuffer,
uint32_t infoCount,
const VkAccelerationStructureBuildGeometryInfoKHR* pInfos,
const VkDeviceAddress* pIndirectDeviceAddresses,
const uint32_t* pIndirectStrides,
const uint32_t* const* ppMaxPrimitiveCounts);
- pIndirectDeviceAddresses is a pointer to an array of infoCount buffer device addresses which point to pInfos[i].geometryCount VkAccelerationStructureBuildRangeInfoKHR structures defining dynamic offsets to the addresses where geometry data is stored, as defined by pInfos[i].
- Accesses to any element of pIndirectDeviceAddresses must be synchronized with the VK_PIPELINE_STAGE_ACCELERATION_STRUCTURE_BUILD_BIT_KHR pipeline stage and an access type of VK_ACCESS_INDIRECT_COMMAND_READ_BIT.
vkBuildAccelerationStructuresKHR
:
VkResult vkBuildAccelerationStructuresKHR(
VkDevice device,
VkDeferredOperationKHR deferredOperation,
uint32_t infoCount,
const VkAccelerationStructureBuildGeometryInfoKHR* pInfos,
const VkAccelerationStructureBuildRangeInfoKHR* const* ppBuildRangeInfos);
Write Properties
vkCmdWriteAccelerationStructuresPropertiesKHR
:
void vkCmdWriteAccelerationStructuresPropertiesKHR(
VkCommandBuffer commandBuffer,
uint32_t accelerationStructureCount,
const VkAccelerationStructureKHR* pAccelerationStructures,
VkQueryType queryType,
VkQueryPool queryPool,
uint32_t firstQuery);
Note for write properties:
- We could expose a QueryType and QueryPool interface to the user, but that would be very Vulkan-specific.
- We could also have a function for each QueryType and store the QueryPool somewhere internally, without exposing the interface to the user.
I suggest we go with the second option, for example:
QueryAccelerationStructuresCompactionSizes(....)
There is no need for a QueryPool in the respective host operation:
vkWriteAccelerationStructuresPropertiesKHR
:
VkResult vkWriteAccelerationStructuresPropertiesKHR(
VkDevice device,
uint32_t accelerationStructureCount,
const VkAccelerationStructureKHR* pAccelerationStructures,
VkQueryType queryType,
size_t dataSize,
void* pData,
size_t stride);
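On the host path the results land directly in pData, one per acceleration structure, stride bytes apart. A minimal C++ sketch of how a wrapper like the proposed QueryAccelerationStructuresCompactionSizes could unpack them, assuming a query type whose results are VkDeviceSize values (such as VK_QUERY_TYPE_ACCELERATION_STRUCTURE_COMPACTED_SIZE_KHR). readStridedSizes is a hypothetical helper and the driver call itself is not shown.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

using DeviceSize = uint64_t; // stands in for VkDeviceSize

// Unpacks `count` VkDeviceSize results written at pData + i * stride.
// memcpy avoids making alignment assumptions about pData.
inline std::vector<DeviceSize> readStridedSizes(const void* pData, std::size_t dataSize,
                                                std::size_t count, std::size_t stride)
{
    assert(stride >= sizeof(DeviceSize));
    assert(count == 0 || (count - 1) * stride + sizeof(DeviceSize) <= dataSize);
    std::vector<DeviceSize> results(count);
    const unsigned char* bytes = static_cast<const unsigned char*>(pData);
    for (std::size_t i = 0; i < count; ++i)
        std::memcpy(&results[i], bytes + i * stride, sizeof(DeviceSize));
    return results;
}
```

The compacted sizes read this way are what the destination ASes for a compacting copy would be created with.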
Copy AS to AS
An example use is copying an AS into a compacted AS (VK_COPY_ACCELERATION_STRUCTURE_MODE_COMPACT_KHR).
vkCmdCopyAccelerationStructureKHR
:
void vkCmdCopyAccelerationStructureKHR(
VkCommandBuffer commandBuffer,
const VkCopyAccelerationStructureInfoKHR* pInfo);
vkCopyAccelerationStructureKHR
:
VkResult vkCopyAccelerationStructureKHR(
VkDevice device,
VkDeferredOperationKHR deferredOperation,
const VkCopyAccelerationStructureInfoKHR* pInfo);
VkCopyAccelerationStructureInfoKHR
:
VkStructureType sType;
const void* pNext;
VkAccelerationStructureKHR src;
VkAccelerationStructureKHR dst;
VkCopyAccelerationStructureModeKHR mode;
Important note for memory barriers
- Accesses to pInfo->src and pInfo->dst must be synchronized with the VK_PIPELINE_STAGE_ACCELERATION_STRUCTURE_BUILD_BIT_KHR pipeline stage and an access type of VK_ACCESS_ACCELERATION_STRUCTURE_READ_BIT_KHR or VK_ACCESS_ACCELERATION_STRUCTURE_WRITE_BIT_KHR as appropriate.
VkCopyAccelerationStructureModeKHR
:
VK_COPY_ACCELERATION_STRUCTURE_MODE_CLONE_KHR = 0,
VK_COPY_ACCELERATION_STRUCTURE_MODE_COMPACT_KHR = 1,
VK_COPY_ACCELERATION_STRUCTURE_MODE_SERIALIZE_KHR = 2,
VK_COPY_ACCELERATION_STRUCTURE_MODE_DESERIALIZE_KHR = 3,
Copy AS To Memory
vkCmdCopyAccelerationStructureToMemoryKHR
:
void vkCmdCopyAccelerationStructureToMemoryKHR(
VkCommandBuffer commandBuffer,
const VkCopyAccelerationStructureToMemoryInfoKHR* pInfo);
vkCopyAccelerationStructureToMemoryKHR
:
VkResult vkCopyAccelerationStructureToMemoryKHR(
VkDevice device,
VkDeferredOperationKHR deferredOperation,
const VkCopyAccelerationStructureToMemoryInfoKHR* pInfo);
VkCopyAccelerationStructureToMemoryInfoKHR
:
VkStructureType sType;
const void* pNext;
VkAccelerationStructureKHR src;
VkDeviceOrHostAddressKHR dst;
VkCopyAccelerationStructureModeKHR mode;
Copy Memory To AS
vkCopyMemoryToAccelerationStructureKHR
:
VkResult vkCopyMemoryToAccelerationStructureKHR(
VkDevice device,
VkDeferredOperationKHR deferredOperation,
const VkCopyMemoryToAccelerationStructureInfoKHR* pInfo);
vkCmdCopyMemoryToAccelerationStructureKHR
:
void vkCmdCopyMemoryToAccelerationStructureKHR(
VkCommandBuffer commandBuffer,
const VkCopyMemoryToAccelerationStructureInfoKHR* pInfo);
VkCopyMemoryToAccelerationStructureInfoKHR
:
VkStructureType sType;
const void* pNext;
VkDeviceOrHostAddressConstKHR src;
VkAccelerationStructureKHR dst;
VkCopyAccelerationStructureModeKHR mode;
Compatibility Check
To check whether a serialized acceleration structure is compatible with the current device, we need a function that uses these functions and structs:
- vkGetDeviceAccelerationStructureCompatibilityKHR
- VkAccelerationStructureCompatibilityKHR
- VkAccelerationStructureVersionInfoKHR
Creating AS
VkAccelerationStructureCreateInfoKHR
:
VkStructureType sType;
const void* pNext;
VkAccelerationStructureCreateFlagsKHR createFlags;
VkBuffer buffer;
VkDeviceSize offset;
VkDeviceSize size;
VkAccelerationStructureTypeKHR type;
VkDeviceAddress deviceAddress;
- deviceAddress here is related to accelerationStructureCaptureReplay; this optional functionality is intended to be used by tools and not by applications directly.
- createInfo.buffer is a buffer most likely allocated with a size of sizeInfo.accelerationStructureSize (see VkAccelerationStructureBuildSizesInfoKHR above).
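Because buffer, offset and size are separate members, several acceleration structures can share one backing buffer. A minimal C++ sketch of sub-allocating them, relying on the spec rule that VkAccelerationStructureCreateInfoKHR::offset must be a multiple of 256 bytes; the sizes would come from sizeInfo.accelerationStructureSize, and the buffer itself needs VK_BUFFER_USAGE_ACCELERATION_STRUCTURE_STORAGE_BIT_KHR usage. Placement and placeAccelerationStructures are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

using DeviceSize = uint64_t; // stands in for VkDeviceSize

// VkAccelerationStructureCreateInfoKHR::offset must be a multiple of 256 bytes.
constexpr DeviceSize kAccelerationStructureOffsetAlignment = 256;

struct Placement {
    DeviceSize offset; // goes into createInfo.offset
    DeviceSize size;   // goes into createInfo.size
};

// Lays out acceleration structures of the given sizes back to back in one buffer.
// totalSizeOut receives the buffer size to allocate.
inline std::vector<Placement> placeAccelerationStructures(const std::vector<DeviceSize>& asSizes,
                                                          DeviceSize& totalSizeOut)
{
    const DeviceSize a = kAccelerationStructureOffsetAlignment;
    std::vector<Placement> placements;
    DeviceSize cursor = 0;
    for (DeviceSize size : asSizes) {
        assert(size > 0);
        cursor = (cursor + a - 1) / a * a; // round up to the next multiple of 256
        placements.push_back({cursor, size});
        cursor += size;
    }
    totalSizeOut = cursor;
    return placements;
}
```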
Enums Used
VkAccelerationStructureTypeKHR
:
VK_ACCELERATION_STRUCTURE_TYPE_TOP_LEVEL_KHR = 0,
VK_ACCELERATION_STRUCTURE_TYPE_BOTTOM_LEVEL_KHR = 1,
VK_ACCELERATION_STRUCTURE_TYPE_GENERIC_KHR = 2,
VkDeviceOrHostAddressConstKHR
:
typedef union VkDeviceOrHostAddressConstKHR {
VkDeviceAddress deviceAddress;
const void* hostAddress;
} VkDeviceOrHostAddressConstKHR;
Fill hostAddress when working with host-side acceleration structure commands, and fill deviceAddress otherwise.
Exposing this is a matter of choice: functions could also take different inputs that don't need DeviceOrHostAddressConst (which also has a non-const version, VkDeviceOrHostAddressKHR).
I suggest exposing it as a struct.
VkBuildAccelerationStructureModeKHR
:
VK_BUILD_ACCELERATION_STRUCTURE_MODE_BUILD_KHR = 0,
VK_BUILD_ACCELERATION_STRUCTURE_MODE_UPDATE_KHR = 1,
We could also write separate build and update AS functions.
VkBuildAccelerationStructureFlagBitsKHR
:
VK_BUILD_ACCELERATION_STRUCTURE_ALLOW_UPDATE_BIT_KHR = 0x00000001,
VK_BUILD_ACCELERATION_STRUCTURE_ALLOW_COMPACTION_BIT_KHR = 0x00000002,
VK_BUILD_ACCELERATION_STRUCTURE_PREFER_FAST_TRACE_BIT_KHR = 0x00000004,
VK_BUILD_ACCELERATION_STRUCTURE_PREFER_FAST_BUILD_BIT_KHR = 0x00000008,
VK_BUILD_ACCELERATION_STRUCTURE_LOW_MEMORY_BIT_KHR = 0x00000010,
VkAccelerationStructureBuildTypeKHR
:
VK_ACCELERATION_STRUCTURE_BUILD_TYPE_HOST_KHR = 0,
VK_ACCELERATION_STRUCTURE_BUILD_TYPE_DEVICE_KHR = 1,
VK_ACCELERATION_STRUCTURE_BUILD_TYPE_HOST_OR_DEVICE_KHR = 2,
VkAccelerationStructureCreateFlagBitsKHR
:
VK_ACCELERATION_STRUCTURE_CREATE_DEVICE_ADDRESS_CAPTURE_REPLAY_BIT_KHR = 0x00000001,
// Provided by VK_NV_ray_tracing_motion_blur
VK_ACCELERATION_STRUCTURE_CREATE_MOTION_BIT_NV = 0x00000004,
VkGeometryInstanceFlagBitsKHR
:
VK_GEOMETRY_INSTANCE_TRIANGLE_FACING_CULL_DISABLE_BIT_KHR = 0x00000001,
VK_GEOMETRY_INSTANCE_TRIANGLE_FLIP_FACING_BIT_KHR = 0x00000002,
VK_GEOMETRY_INSTANCE_FORCE_OPAQUE_BIT_KHR = 0x00000004,
VK_GEOMETRY_INSTANCE_FORCE_NO_OPAQUE_BIT_KHR = 0x00000008,
VK_GEOMETRY_INSTANCE_TRIANGLE_FRONT_COUNTERCLOCKWISE_BIT_KHR = VK_GEOMETRY_INSTANCE_TRIANGLE_FLIP_FACING_BIT_KHR,
// Provided by VK_NV_ray_tracing
VK_GEOMETRY_INSTANCE_TRIANGLE_CULL_DISABLE_BIT_NV = VK_GEOMETRY_INSTANCE_TRIANGLE_FACING_CULL_DISABLE_BIT_KHR,
// Provided by VK_NV_ray_tracing
VK_GEOMETRY_INSTANCE_TRIANGLE_FRONT_COUNTERCLOCKWISE_BIT_NV = VK_GEOMETRY_INSTANCE_TRIANGLE_FRONT_COUNTERCLOCKWISE_BIT_KHR,
// Provided by VK_NV_ray_tracing
VK_GEOMETRY_INSTANCE_FORCE_OPAQUE_BIT_NV = VK_GEOMETRY_INSTANCE_FORCE_OPAQUE_BIT_KHR,
// Provided by VK_NV_ray_tracing
VK_GEOMETRY_INSTANCE_FORCE_NO_OPAQUE_BIT_NV = VK_GEOMETRY_INSTANCE_FORCE_NO_OPAQUE_BIT_KHR,
Deferred Operations
(Fill if needed to expose)
RayTracing Pipeline
vkCreateRayTracingPipelinesKHR
:
VkResult vkCreateRayTracingPipelinesKHR(
VkDevice device,
VkDeferredOperationKHR deferredOperation,
VkPipelineCache pipelineCache,
uint32_t createInfoCount,
const VkRayTracingPipelineCreateInfoKHR* pCreateInfos,
const VkAllocationCallbacks* pAllocator,
VkPipeline* pPipelines);
- Note: Also takes deferredOperation
VkRayTracingPipelineCreateInfoKHR
:
VkStructureType sType;
const void* pNext;
VkPipelineCreateFlags flags;
uint32_t stageCount;
const VkPipelineShaderStageCreateInfo* pStages;
uint32_t groupCount;
const VkRayTracingShaderGroupCreateInfoKHR* pGroups;
uint32_t maxPipelineRayRecursionDepth;
const VkPipelineLibraryCreateInfoKHR* pLibraryInfo;
const VkRayTracingPipelineInterfaceCreateInfoKHR* pLibraryInterface;
const VkPipelineDynamicStateCreateInfo* pDynamicState;
VkPipelineLayout layout;
VkPipeline basePipelineHandle;
int32_t basePipelineIndex;
- VkPipelineShaderStageCreateInfo will be created from input shaders
- Pipeline Libraries are explained below
- maxPipelineRayRecursionDepth should be validated against maxRayRecursionDepth (see the first section, Extension and Properties)
VkRayTracingShaderGroupCreateInfoKHR
:
VkStructureType sType;
const void* pNext;
VkRayTracingShaderGroupTypeKHR type;
uint32_t generalShader;
uint32_t closestHitShader;
uint32_t anyHitShader;
uint32_t intersectionShader;
const void* pShaderGroupCaptureReplayHandle;
- generalShader can be a raygen, miss, or callable shader.
VkRayTracingShaderGroupTypeKHR
:
VK_RAY_TRACING_SHADER_GROUP_TYPE_GENERAL_KHR = 0,
VK_RAY_TRACING_SHADER_GROUP_TYPE_TRIANGLES_HIT_GROUP_KHR = 1,
VK_RAY_TRACING_SHADER_GROUP_TYPE_PROCEDURAL_HIT_GROUP_KHR = 2,
VK_RAY_TRACING_SHADER_GROUP_TYPE_GENERAL_KHR indicates a shader group with a single VK_SHADER_STAGE_RAYGEN_BIT_KHR, VK_SHADER_STAGE_MISS_BIT_KHR, or VK_SHADER_STAGE_CALLABLE_BIT_KHR shader in it.
VK_RAY_TRACING_SHADER_GROUP_TYPE_TRIANGLES_HIT_GROUP_KHR specifies a shader group that only hits triangles and must not contain an intersection shader, only closest hit and any-hit shaders.
VK_RAY_TRACING_SHADER_GROUP_TYPE_PROCEDURAL_HIT_GROUP_KHR specifies a shader group that only intersects with custom geometry and must contain an intersection shader.
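The group rules above are easy to check before calling vkCreateRayTracingPipelinesKHR. A minimal C++ sketch, using the fact that unused slots hold VK_SHADER_UNUSED_KHR (~0U) and that shader indices refer into pStages. Stage, GroupType, ShaderGroup, optionalIs and isValidGroup are hypothetical stand-ins for the real Vulkan enums and VkRayTracingShaderGroupCreateInfoKHR.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// VK_SHADER_UNUSED_KHR is defined as (~0U) in the Vulkan headers.
constexpr uint32_t kShaderUnused = ~0u;

// Hypothetical stand-ins for the relevant shader stages and VkRayTracingShaderGroupTypeKHR.
enum class Stage { Raygen, Miss, Callable, ClosestHit, AnyHit, Intersection };
enum class GroupType { General = 0, TrianglesHitGroup = 1, ProceduralHitGroup = 2 };

// Subset of VkRayTracingShaderGroupCreateInfoKHR; indices refer into pStages.
struct ShaderGroup {
    GroupType type;
    uint32_t generalShader, closestHitShader, anyHitShader, intersectionShader;
};

// True if the slot is unused, or refers to a stage of the expected kind.
inline bool optionalIs(const std::vector<Stage>& stages, uint32_t index, Stage expected)
{
    return index == kShaderUnused || (index < stages.size() && stages[index] == expected);
}

inline bool isValidGroup(const std::vector<Stage>& stages, const ShaderGroup& g)
{
    switch (g.type) {
    case GroupType::General: {
        // Exactly one raygen, miss or callable shader, nothing else.
        if (g.generalShader >= stages.size()) return false;
        const Stage s = stages[g.generalShader];
        const bool generalKind = s == Stage::Raygen || s == Stage::Miss || s == Stage::Callable;
        return generalKind && g.closestHitShader == kShaderUnused &&
               g.anyHitShader == kShaderUnused && g.intersectionShader == kShaderUnused;
    }
    case GroupType::TrianglesHitGroup:
        // No general or intersection shader; closest-hit and any-hit are optional.
        return g.generalShader == kShaderUnused && g.intersectionShader == kShaderUnused &&
               optionalIs(stages, g.closestHitShader, Stage::ClosestHit) &&
               optionalIs(stages, g.anyHitShader, Stage::AnyHit);
    case GroupType::ProceduralHitGroup:
        // Intersection shader is mandatory; closest-hit and any-hit are optional.
        return g.generalShader == kShaderUnused &&
               g.intersectionShader < stages.size() &&
               stages[g.intersectionShader] == Stage::Intersection &&
               optionalIs(stages, g.closestHitShader, Stage::ClosestHit) &&
               optionalIs(stages, g.anyHitShader, Stage::AnyHit);
    }
    return false;
}
```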
Pipeline Library
Should we add and handle the VK_KHR_pipeline_library extension?
A pipeline library is a special pipeline that cannot be bound, instead it defines a set of shaders and shader groups which can be linked into other pipelines. This extension defines the infrastructure for pipeline libraries, but does not specify the creation or usage of pipeline libraries. This is left to additional dependent extensions.
VK_KHR_pipeline_library is a soft requirement for VK_KHR_ray_tracing_pipeline instead of a strict requirement, so applications only need to enable it if they are actually using it.
Shader Binding Table
To build the buffer of opaque shader group handles (plus any shader record data):
vkGetRayTracingShaderGroupHandlesKHR
:
VkResult vkGetRayTracingShaderGroupHandlesKHR(
VkDevice device,
VkPipeline pipeline,
uint32_t firstGroup,
uint32_t groupCount,
size_t dataSize,
void* pData);
This is the only function needed (no helper functions) to construct the ShaderBindingTable.
shaderGroupHandleSize and shaderGroupBaseAlignment will be taken into consideration when constructing the SBT buffer and computing the region offsets for vkCmdTraceRaysKHR.
We could also have a wrapper/helper class for the SBT that does all the computation and construction of SBT buffers for each shader group type (raygen, miss, hit, callable) and helps with invoking vkCmdTraceRaysKHR.
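Here is a minimal C++ sketch of the offset/stride computation such a helper would do, following the layout commonly used in the nvpro samples: records are the handle size rounded up to shaderGroupHandleAlignment, each region starts at a multiple of shaderGroupBaseAlignment, and the raygen region's size equals its stride. It assumes records hold only the opaque handle (no shader record data) and that the SBT buffer's base device address is aligned to shaderGroupBaseAlignment. SbtProperties, Region, SbtLayout and computeSbtLayout are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

using DeviceSize = uint64_t; // stands in for VkDeviceSize / VkDeviceAddress

// Hypothetical subset of VkPhysicalDeviceRayTracingPipelinePropertiesKHR.
struct SbtProperties {
    uint32_t shaderGroupHandleSize;
    uint32_t shaderGroupHandleAlignment;
    uint32_t shaderGroupBaseAlignment;
    uint32_t maxShaderGroupStride;
};

// Like VkStridedDeviceAddressRegionKHR, but with an offset instead of an absolute address.
struct Region { DeviceSize offset, stride, size; };

struct SbtLayout { Region raygen, miss, hit, callable; DeviceSize totalSize; };

constexpr DeviceSize alignUp(DeviceSize v, DeviceSize a) { return (v + a - 1) / a * a; }

// One raygen record followed by missCount, hitCount and callableCount records.
inline SbtLayout computeSbtLayout(const SbtProperties& p, uint32_t missCount,
                                  uint32_t hitCount, uint32_t callableCount)
{
    const DeviceSize handleStride = alignUp(p.shaderGroupHandleSize, p.shaderGroupHandleAlignment);
    assert(handleStride <= p.maxShaderGroupStride);
    SbtLayout l{};
    // The raygen region's size must equal its stride; aligning it to the base
    // alignment keeps the start of the next region aligned as well.
    l.raygen.stride = alignUp(handleStride, p.shaderGroupBaseAlignment);
    l.raygen.size = l.raygen.stride;
    DeviceSize cursor = l.raygen.size;
    Region* regions[3] = {&l.miss, &l.hit, &l.callable};
    const uint32_t counts[3] = {missCount, hitCount, callableCount};
    for (int i = 0; i < 3; ++i) {
        regions[i]->offset = cursor;
        regions[i]->stride = handleStride;
        regions[i]->size = alignUp(counts[i] * handleStride, p.shaderGroupBaseAlignment);
        cursor += regions[i]->size;
    }
    l.totalSize = cursor;
    return l;
}
```

Each region passed to vkCmdTraceRaysKHR would then use the buffer's device address plus the region's offset, and the handles returned by vkGetRayTracingShaderGroupHandlesKHR are copied in at shaderGroupHandleSize granularity, one per record.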
Ray Tracing Pipeline Stack
Ray tracing pipelines have a potentially large set of shaders which may be invoked in various call chain combinations to perform ray tracing. To store parameters for a given shader execution, an implementation may use a stack of data in memory. This stack must be sized to the sum of the stack sizes of all shaders in any call chain executed by the application.
For example, if an application has two types of closest hit and miss shaders that it can use, but the first level of rays will only use the first kind (possibly reflection) and the second level will only use the second kind (occlusion or shadow ray, for example), then the application can compute the stack size by something similar to: rayGenStack + max(closestHit1Stack, miss1Stack) + max(closestHit2Stack, miss2Stack).
To get/set stack sizes:
vkGetRayTracingShaderGroupStackSizeKHR
:
VkDeviceSize vkGetRayTracingShaderGroupStackSizeKHR(
VkDevice device,
VkPipeline pipeline,
uint32_t group,
VkShaderGroupShaderKHR groupShader);
vkCmdSetRayTracingPipelineStackSizeKHR
:
void vkCmdSetRayTracingPipelineStackSizeKHR(
VkCommandBuffer commandBuffer,
uint32_t pipelineStackSize);
VkShaderGroupShaderKHR is just an enum:
VkShaderGroupShaderKHR
:
VK_SHADER_GROUP_SHADER_GENERAL_KHR = 0,
VK_SHADER_GROUP_SHADER_CLOSEST_HIT_KHR = 1,
VK_SHADER_GROUP_SHADER_ANY_HIT_KHR = 2,
VK_SHADER_GROUP_SHADER_INTERSECTION_KHR = 3,
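A minimal C++ sketch of the two-level stack example above, assuming each recursion level runs either its closest-hit or its miss shader (never both) and that the per-shader sizes were already queried with vkGetRayTracingShaderGroupStackSizeKHR. Any-hit, intersection and callable shaders are left out. LevelStacks, pipelineStackSize and toPipelineStackSizeArg are hypothetical names. As far as I know, the value passed to vkCmdSetRayTracingPipelineStackSizeKHR only takes effect if the pipeline was created with the VK_DYNAMIC_STATE_RAY_TRACING_PIPELINE_STACK_SIZE_KHR dynamic state.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

using DeviceSize = uint64_t; // vkGetRayTracingShaderGroupStackSizeKHR returns a VkDeviceSize

// Queried stack sizes of the shaders that can run at one recursion level.
struct LevelStacks {
    DeviceSize closestHit;
    DeviceSize miss;
};

// raygen + sum over levels of max(closestHit, miss), as in the example above.
inline DeviceSize pipelineStackSize(DeviceSize rayGenStack, const std::vector<LevelStacks>& levels)
{
    DeviceSize total = rayGenStack;
    for (const LevelStacks& level : levels)
        total += std::max(level.closestHit, level.miss);
    return total;
}

// vkCmdSetRayTracingPipelineStackSizeKHR takes a uint32_t, so check the narrowing.
inline uint32_t toPipelineStackSizeArg(DeviceSize size)
{
    assert(size <= UINT32_MAX);
    return static_cast<uint32_t>(size);
}
```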
RayTracing Commands
vkCmdTraceRaysKHR
:
void vkCmdTraceRaysKHR(
VkCommandBuffer commandBuffer,
const VkStridedDeviceAddressRegionKHR* pRaygenShaderBindingTable,
const VkStridedDeviceAddressRegionKHR* pMissShaderBindingTable,
const VkStridedDeviceAddressRegionKHR* pHitShaderBindingTable,
const VkStridedDeviceAddressRegionKHR* pCallableShaderBindingTable,
uint32_t width,
uint32_t height,
uint32_t depth);
VkStridedDeviceAddressRegionKHR
:
typedef struct VkStridedDeviceAddressRegionKHR {
VkDeviceAddress deviceAddress;
VkDeviceSize stride;
VkDeviceSize size;
} VkStridedDeviceAddressRegionKHR;
Indirect Trace Rays
vkCmdTraceRaysIndirectKHR
:
void vkCmdTraceRaysIndirectKHR(
VkCommandBuffer commandBuffer,
const VkStridedDeviceAddressRegionKHR* pRaygenShaderBindingTable,
const VkStridedDeviceAddressRegionKHR* pMissShaderBindingTable,
const VkStridedDeviceAddressRegionKHR* pHitShaderBindingTable,
const VkStridedDeviceAddressRegionKHR* pCallableShaderBindingTable,
VkDeviceAddress indirectDeviceAddress);
- width/height/depth will be in the buffer that indirectDeviceAddress points to.
- indirectDeviceAddress is a buffer device address which is a pointer to a VkTraceRaysIndirectCommandKHR structure containing the trace ray parameters.
VkTraceRaysIndirectCommandKHR
:
typedef struct VkTraceRaysIndirectCommandKHR {
uint32_t width;
uint32_t height;
uint32_t depth;
} VkTraceRaysIndirectCommandKHR;
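maxRayDispatchInvocationCount (listed at the top among the limits to validate) bounds width * height * depth. For direct dispatches, or when the host writes the indirect command, a check like the following C++ sketch could run before recording; TraceRaysCommand mirrors VkTraceRaysIndirectCommandKHR and fitsDispatchLimit is a hypothetical name. It is not the only limit on the dimensions, just the one discussed here.

```cpp
#include <cassert>
#include <cstdint>

// Mirrors VkTraceRaysIndirectCommandKHR (also the width/height/depth of vkCmdTraceRaysKHR).
struct TraceRaysCommand {
    uint32_t width;
    uint32_t height;
    uint32_t depth;
};

// Checks the total invocation count against maxRayDispatchInvocationCount
// without overflowing: width * height always fits in 64 bits, and once it is
// known to be <= a 32-bit limit, multiplying by depth fits as well.
inline bool fitsDispatchLimit(const TraceRaysCommand& cmd, uint32_t maxRayDispatchInvocationCount)
{
    const uint64_t plane = uint64_t(cmd.width) * cmd.height;
    if (plane > maxRayDispatchInvocationCount)
        return false;
    return plane * cmd.depth <= maxRayDispatchInvocationCount;
}
```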
It has accelerationStructureHostCommands that indicates whether the implementation supports host side acceleration structure commands: (vkBuildAccelerationStructuresKHR, vkCopyAccelerationStructureKHR, vkCopyAccelerationStructureToMemoryKHR, vkCopyMemoryToAccelerationStructureKHR, and vkWriteAccelerationStructuresPropertiesKHR)
Support is optional for all 5 (host) or all 10 (device and host).
VkDeviceOrHostAddressConstKHR
what decides which one this is?
what function I call? (Cmd vs no Cmd)
Support is optional for all 5 (host) or all 10 (device and host).
If you enable the VK_KHR_acceleration_structure extension, it enables you to use the device functions.
But in order to use the host functions, you must enable the accelerationStructureHostCommands feature (after checking that the physical device supports it).
Vulkan spec for vkCopyAccelerationStructureToMemoryKHR:
VUID-vkCopyAccelerationStructureToMemoryKHR-accelerationStructureHostCommands-03584 The VkPhysicalDeviceAccelerationStructureFeaturesKHR::accelerationStructureHostCommands feature must be enabled
weird but ok, kinda hard to extract "hard" dependencies (i.e. you just have at least one host or device thing supported)
IMPORTANT NOTE @devshgraphicsprogramming: Do we want to expose a function for vkGetBufferDeviceAddress? Since most of the functions and struct related to this raytracing extension works with deviceAddresses and not buffers and AS's directly. I think we can also take our Nabla objects and call vkGetBufferDeviceAddress behind the scenes.
So raytracing requires BDA?
VkDeviceOrHostAddressConstKHR
what decides which one this is?
what function I call? (Cmd vs no Cmd)
Yes, Vulkan takes VkDeviceOrHostAddressConstKHR for infos like VkAccelerationStructureGeometryInstancesDataKHR or VkCopyMemoryToAccelerationStructureInfoKHR, but if you're using the host function, hostAddress must be filled, and if you're using device functions (with Cmd), deviceAddress must be filled.
sounds like a thing to solve with C++ templates:
template<typename address_type_t>
then IGPUCommandBuffer methods would use stuff with <const buffer_device_address_t> and ILogicalDevice methods would use <const void*>
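A minimal C++ sketch of that template idea, outside of Nabla. buffer_device_address_t, DeviceOrHostAddressConst, toVulkanUnion, CopyMemoryToASInfo, recordCopy and hostCopy are all hypothetical names; recordCopy stands in for an IGPUCommandBuffer method and hostCopy for an ILogicalDevice method. The point is that the address kind is fixed by the type, so the wrong union member can never be filled.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical alias for a buffer device address (VkDeviceAddress is a 64-bit integer).
using buffer_device_address_t = uint64_t;

// Mirror of the VkDeviceOrHostAddressConstKHR union.
union DeviceOrHostAddressConst {
    buffer_device_address_t deviceAddress;
    const void* hostAddress;
};

inline DeviceOrHostAddressConst toVulkanUnion(buffer_device_address_t address)
{
    DeviceOrHostAddressConst u{};
    u.deviceAddress = address; // the member vkCmd* (device) commands read
    return u;
}

inline DeviceOrHostAddressConst toVulkanUnion(const void* address)
{
    DeviceOrHostAddressConst u{};
    u.hostAddress = address; // the member host commands read
    return u;
}

// Hypothetical info struct parameterized on the address kind.
template<typename address_type_t>
struct CopyMemoryToASInfo {
    address_type_t src;
    // dst acceleration structure and copy mode omitted in this sketch
};

// Device path (would end in vkCmdCopyMemoryToAccelerationStructureKHR): device addresses only.
inline DeviceOrHostAddressConst recordCopy(const CopyMemoryToASInfo<const buffer_device_address_t>& info)
{
    return toVulkanUnion(info.src);
}

// Host path (would end in vkCopyMemoryToAccelerationStructureKHR): host pointers only.
inline DeviceOrHostAddressConst hostCopy(const CopyMemoryToASInfo<const void*>& info)
{
    return toVulkanUnion(info.src);
}
```

Passing a CopyMemoryToASInfo<const void*> to recordCopy is a compile error, which is the kind of misuse the template is meant to catch.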
So raytracing requires BDA?
Yes, VK_KHR_ray_tracing_pipeline requires VK_KHR_acceleration_structure, and VK_KHR_acceleration_structure requires:
- Vulkan 1.1
- VK_EXT_descriptor_indexing
- VK_KHR_buffer_device_address
- VK_KHR_deferred_host_operations
See https://www.khronos.org/registry/vulkan/specs/1.2-extensions/man/html/VK_KHR_ray_tracing_pipeline.html
Do we have any guarantees on whether host commands or device commands will always be available?
from my reading it looks like host commands are optional, but device commands are always there
what queue do we need to dispatch the device commands?
Do we have any guarantees on whether host commands or device commands will always be available?
These CPU-based commands are optional, but the device versions of these commands (vkCmd*) are always supported.
More info here: https://www.khronos.org/registry/vulkan/specs/1.2-extensions/html/vkspec.html#host-acceleration-structure
from my reading it looks like host commands are optional, but device commands are always there
what queue do we need to dispatch the device commands?
Good question: any queue that supports compute.
• VUID-vkCmdBuildAccelerationStructuresKHR-commandBuffer-cmdpool The VkCommandPool that commandBuffer was allocated from must support compute operations
ok so it's just like computing mip-maps, just do it on the compute queue.
Do we have any guarantees on whether host commands or device commands will always be available?
These CPU-based commands are optional, but the device versions of these commands (vkCmd*) are always supported.
More info here: https://www.khronos.org/registry/vulkan/specs/1.2-extensions/html/vkspec.html#host-acceleration-structure
I think the cpu2gpu object converter should try and use the host methods to build the AS (massively parallel building BVHs produces them faster but they're lower quality)
The initial AS should be in HOST_CACHED non device local memory, and then host-copied and compacted to unmappable DEVICE_LOCAL (copy AS to AS).
ok so it's just like computing mip-maps, just do it on the compute queue.
Shouldn't the user allocate the command buffer from a compute queue and pass it as a parameter to the functions I provided? I think I can only validate whether the cmdBuffer is from a supported (compute) queue.
cpu2gpu converter already has this option/works this way
Understood
Because we are taking steps towards threading the cpu2gpu conversion and asset loading, we should expose Deferred Operations.
Maybe ILogicalDevice could hand out core::smart_refctd_ptr<ILogicalDevice::IDeferredOperation> which are placement-new allocated on a CMemoryPool like the one @achalpandeyy is using for command buffers (let's not murder the heap).
Then IDeferredOperation could have join and get as methods (and a wait built on top of get which also forces at least one join); its destructor and the refcounting could then ensure that we don't vk-destroy an incomplete operation.
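A minimal, Vulkan-free C++ sketch of that join/get/wait shape. The two std::function members stand in for vkDeferredOperationJoinKHR and vkGetDeferredOperationResultKHR, Result mirrors the subset of VkResult values involved (VK_SUCCESS, VK_NOT_READY, VK_THREAD_DONE_KHR, VK_THREAD_IDLE_KHR), and DeferredOperation is a hypothetical class, not the proposed Nabla type (no smart_refctd_ptr or CMemoryPool here).

```cpp
#include <cassert>
#include <functional>
#include <thread>
#include <utility>

// Subset of VkResult values relevant to deferred operations.
enum class Result { Success, NotReady, ThreadDone, ThreadIdle };

class DeferredOperation {
public:
    // joinFn/getFn stand in for vkDeferredOperationJoinKHR / vkGetDeferredOperationResultKHR.
    DeferredOperation(std::function<Result()> joinFn, std::function<Result()> getFn)
        : m_join(std::move(joinFn)), m_get(std::move(getFn)) {}
    DeferredOperation(const DeferredOperation&) = delete;
    DeferredOperation& operator=(const DeferredOperation&) = delete;

    // Never let the underlying operation be destroyed while it is still running.
    ~DeferredOperation() { if (get() == Result::NotReady) wait(); }

    // Lends the calling thread to the operation.
    Result join() { return m_join(); }
    // NotReady while incomplete, otherwise the operation's final result.
    Result get() const { return m_get(); }

    // Joins at least once, then keeps joining until the operation reports completion.
    Result wait()
    {
        for (;;) {
            const Result joined = join();
            const Result status = get();
            if (status != Result::NotReady)
                return status;
            // Other threads may still be finishing the work; yield before joining again.
            if (joined != Result::Success)
                std::this_thread::yield();
        }
    }

private:
    std::function<Result()> m_join;
    std::function<Result()> m_get;
};
```

With refcounting on top, the last reference dropping would run this destructor, which is where the "don't vk-destroy an incomplete operation" guarantee would live.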
deviceAddress in these function parameters is related to accelerationStructureCaptureReplay and this optional functionality is intended to be used by tools and not by applications directly.
We'll definitely be using Nsight a lot, and RenderDoc whenever it starts supporting raytracing. So we need this.
We will not support serializing device- and driver-version-dependent Acceleration Structures any time soon (we don't really support downloading compiled shaders back from the driver for faster loading either).
So no need to worry about that.
I believe you're referring to the Compatibility Check section?
serialization and deserialization in general.
All the Acceleration Structure flags are REALLY IMPORTANT and should be exposed
The most important things for performance are the ability to build a single unified AS (no TLAS, everything is one BLAS) with no/little instancing.
Second most important is the DXR no-anyhit-shader flag.
And enabling backface triangle culling actually makes raytracing more expensive.
There's also an important correctness (not perf) flag about whether anyhit shaders should only be called once per primitive.
The most important things for performance are the ability to build a single unified AS (no TLAS, everything is one BLAS) with no/little instancing.
Note that from Vulkan's perspective you cannot bind a BLAS directly as a descriptor; you always have to create a TLAS.
See Vulkan Spec:
VUID-VkWriteDescriptorSetAccelerationStructureKHR-pAccelerationStructures-03579 Each acceleration structure in pAccelerationStructures must have been created with a type of VK_ACCELERATION_STRUCTURE_TYPE_TOP_LEVEL_KHR or VK_ACCELERATION_STRUCTURE_TYPE_GENERIC_KHR
You might wonder what VK_ACCELERATION_STRUCTURE_TYPE_GENERIC_KHR is.
The Vulkan spec also answers that in its issues section:
(5) What is VK_ACCELERATION_STRUCTURE_TYPE_GENERIC_KHR for? RESOLVED: It is primarily intended for API layering. In DXR, the acceleration structure is basically just a buffer in a special layout, and you don’t know at creation time whether it will be used as a top or bottom level acceleration structure. We thus added a generic acceleration structure type whose type is unknown at creation time, but is specified at build time instead. Applications which are written directly for Vulkan should not use it
All the Acceleration Structure flags are REALLY IMPORTANT and should be exposed
I agree.
These should be the defaults for cpu2gpu conversion and anything else that doesn't get overridden by explicit user choice:
VK_BUILD_ACCELERATION_STRUCTURE_ALLOW_COMPACTION_BIT_KHR
VK_BUILD_ACCELERATION_STRUCTURE_PREFER_FAST_TRACE_BIT_KHR
VK_ACCELERATION_STRUCTURE_BUILD_TYPE_HOST_KHR // if feature present, otherwise device only
If there's a sign that the geometry could be animated (such as a meshbuffer having bone or animation info), use these instead:
VK_BUILD_ACCELERATION_STRUCTURE_ALLOW_UPDATE_BIT_KHR
VK_BUILD_ACCELERATION_STRUCTURE_PREFER_FAST_BUILD_BIT_KHR
VK_ACCELERATION_STRUCTURE_BUILD_TYPE_DEVICE_KHR
// later on
VK_ACCELERATION_STRUCTURE_CREATE_MOTION_BIT_NV // if VK_NV_ray_tracing_motion_blur present
add VK_ACCELERATION_STRUCTURE_CREATE_DEVICE_ADDRESS_CAPTURE_REPLAY_BIT_KHR if you detect Nsight.
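A minimal C++ sketch of those proposed defaults as a pure function, with flag values copied from VkBuildAccelerationStructureFlagBitsKHR and VkAccelerationStructureBuildTypeKHR listed earlier. ASBuildDefaults, chooseDefaults and the k* constants are hypothetical names; the motion and capture-replay create flags are left out. Note that the build type itself only goes into vkGetAccelerationStructureBuildSizesKHR; the actual host-or-device choice is which build function gets called.

```cpp
#include <cassert>
#include <cstdint>

// Values copied from VkBuildAccelerationStructureFlagBitsKHR.
constexpr uint32_t kAllowUpdate = 0x1;
constexpr uint32_t kAllowCompaction = 0x2;
constexpr uint32_t kPreferFastTrace = 0x4;
constexpr uint32_t kPreferFastBuild = 0x8;

// Values copied from VkAccelerationStructureBuildTypeKHR.
enum class BuildType : uint32_t { Host = 0, Device = 1, HostOrDevice = 2 };

struct ASBuildDefaults {
    uint32_t buildFlags;
    BuildType buildType;
};

// Static geometry: compactable, fast-trace, host-built if accelerationStructureHostCommands is supported.
// Possibly animated geometry (bone/animation info present): updatable, fast-build, device-built.
inline ASBuildDefaults chooseDefaults(bool mayBeAnimated, bool hostCommandsSupported)
{
    if (mayBeAnimated)
        return {kAllowUpdate | kPreferFastBuild, BuildType::Device};
    return {kAllowCompaction | kPreferFastTrace,
            hostCommandsSupported ? BuildType::Host : BuildType::Device};
}
```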
Note that in Vulkan's Perspective you cannot bind BLAS directly as a descriptor, you should always create a TLAS.
Potatoe, potato
I presume there's an option to create a TLAS without any BLASes?
Unfortunately I don't think so.
• VUID-VkAccelerationStructureBuildGeometryInfoKHR-type-03789 If type is VK_ACCELERATION_STRUCTURE_TYPE_TOP_LEVEL_KHR, the geometryType member of elements of either pGeometries or ppGeometries must be VK_GEOMETRY_TYPE_INSTANCES_KHR • VUID-VkAccelerationStructureBuildGeometryInfoKHR-type-03790 If type is VK_ACCELERATION_STRUCTURE_TYPE_TOP_LEVEL_KHR, geometryCount must be 1
The geometry type must be VK_GEOMETRY_TYPE_INSTANCES_KHR, which consists of instances of other acceleration structures:
https://www.khronos.org/registry/vulkan/specs/1.2-extensions/man/html/VkAccelerationStructureInstanceKHR.html
hmm but I've seen and heard of people tracing just the BLAS for maximum gains in static scenes in DXR/OptiX.
So what do we do then, TLAS with a single instance? No better way to do it?
The simplest case would be 1 BLAS and 1 TLAS with 1 instance referring to the BLAS.
Besides the Vulkan spec, you could also look at the nvpro_samples, which give a good picture of how to work with these structs and functions: https://github.com/nvpro-samples/vk_raytracing_tutorial_KHR
I believe the projects there use all of it.