
Work on property pool HLSL impl

Open deprilula28 opened this issue 1 year ago • 0 comments

Description

Implementing CPropertyPoolHandler and CPropertyPool in HLSL, using buffer device addresses (BDA) directly instead of allocating descriptor sets for buffers. Notes about the implementation:

-> Currently uses descriptor pools (needs to allocate every time)
    -> Use BDA and root constants with the addresses instead
-> Device capabilities traits 
    -> Example version: https://github.com/Devsh-Graphics-Programming/Nabla/blob/vulkan_1_3/include/nbl/builtin/hlsl/device_capabilities_traits.hlsl
    -> maxOptimallyResidentWorkgroupInvocations
    -> Can use nbl::hlsl::jit::device_capabilities struct with JIT generated "constexpr" variables for maximally optimal workgroup invocations
    https://github.com/Devsh-Graphics-Programming/Nabla-Examples-and-Tests/blob/master/23_ArithmeticUnitTest/app_resources/shaderCommon.hlsl#L9
    https://github.com/microsoft/DirectXShaderCompiler/issues/6144


=== tasks ===

-> Port https://github.com/Devsh-Graphics-Programming/Nabla/blob/master/include/nbl/builtin/glsl/property_pool/copy.comp to HLSL
    -> Persistently resident threads, scrolling over, maximally using the GPU workgroup size
    -> Dispatch: 2D (x: DWORD in the property ID, y: property ID)
        -> property id: which buffer you're touching/analogous to draw id
            -> indexes into transferData
            -> new version: use null pointer as invalid pointer
    -> transferData: List of copy "commands"
        -> new version: Replaced by push constant with BDA address
    -> addresses: "Index buffer"
        -> invalid pointer: IOTA (analogous to not using an index buffer, use iteration index as the fetching index)
    -> Use shorts (uint16) instead of DWORDs (uint32)
        -> Transfer data struct uses bytes for future proofing
    -> Specialize on:
        -> Whether or not source is a fill
        -> Type of index (uint8, uint16, uint32, uint64)
        -> Src index is IOTA
        -> Dst index is IOTA
    -> Keep optimization for modulos (line 38 & 52)
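
A CPU-side reference of the copy semantics listed above can help when checking the HLSL port. This is only a sketch under assumptions: the struct and function names are hypothetical, elements are counted in DWORDs rather than shorts or bytes, a null index pointer is treated as IOTA, and a fill is assumed to read source element 0 for every destination element. The inner strided loop stands in for the persistently resident threads scrolling over the data.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical CPU mirror of one copy "command"; not the actual Nabla layout.
struct TransferRequestRef
{
    const uint32_t* src;        // source DWORDs (one element when `fill` is set)
    uint32_t*       dst;        // destination DWORDs
    const uint32_t* srcIndices; // nullptr means IOTA: use the iteration index
    const uint32_t* dstIndices; // nullptr means IOTA
    size_t elementCount;        // number of elements to transfer
    size_t dwordsPerElement;    // element size in DWORDs
    bool   fill;                // assumption: every element reads source element 0
};

// Emulates the 2D dispatch: y picks the request (property ID), x strides over DWORDs.
// `invocations` stands in for the number of resident threads and must be non-zero.
inline void referenceCopy(const std::vector<TransferRequestRef>& requests, size_t invocations)
{
    for (size_t propertyID = 0; propertyID < requests.size(); ++propertyID)
    {
        const TransferRequestRef& r = requests[propertyID];
        const size_t totalDWORDs = r.elementCount * r.dwordsPerElement;
        for (size_t thread = 0; thread < invocations; ++thread)
        for (size_t dword = thread; dword < totalDWORDs; dword += invocations)
        {
            const size_t element   = dword / r.dwordsPerElement;
            const size_t component = dword % r.dwordsPerElement;
            const size_t srcElement = r.fill ? 0 : (r.srcIndices ? r.srcIndices[element] : element);
            const size_t dstElement = r.dstIndices ? r.dstIndices[element] : element;
            r.dst[dstElement * r.dwordsPerElement + component] =
                r.src[srcElement * r.dwordsPerElement + component];
        }
    }
}
```

Comparing GPU output against a reference like this covers the fill, source-IOTA and destination-IOTA specializations with one code path.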

-> CPU Code
    -> CPropertyPoolHandler
        -> Nuke m_maxPropertiesPerPass, getMaxScratchSize (not relevant with BDA version)
    -> TransferRequest on CPU keeps reference to the buffer and places it in the command buffer for lifetime tracking
        -> Have a custom command that just keeps track of a **variable number** of reference counted objects for preserving lifetimes (LinkedPreservedLifetimes?)
            -> Take a span of IGPUReferenceCounted
            -> Example: https://github.com/Devsh-Graphics-Programming/Nabla/blob/master/src/nbl/video/IGPUCommandBuffer.cpp#L104C54-L104C54
            -> For variable amount of stuff: https://github.com/Devsh-Graphics-Programming/Nabla/blob/master/src/nbl/video/IGPUCommandBuffer.cpp#L403C90-L403C90
            -> Signature example: `IGPUCommandBuffer::preserveLifetime(std::span<const core::IReferenceCounted>)`
    -> New transfer property signature
        -> make pipeline barriers more robust (or require everything to be done properly outside the function)
        -> First parameter: SIntendedSubmitInfo (IUtilities-independent submit info struct thing for handling overflows)
            -> Source for it: https://github.com/Devsh-Graphics-Programming/Nabla/blob/vulkan_1_3/include/nbl/video/utilities/SIntendedSubmitInfo.h
            -> Move IUtilities::autoSubmit and IUtilities::autoSubmitAndBlock to SIntendedSubmitInfo as static methods (no more relation to IUtilities)
        -> Second parameter: struct with parameters
            `const asset::SBufferBinding<video::IGPUBuffer>& scratch, system::logger_opt_ptr logger, const size_t baseDWORD=0ull, const size_t endDWORD=~0u`
            -> Additional parameters that are optional including additional pipeline barrier values
            bitfield/boolean [pre|post]ScratchBarrier = true
    -> let's keep MaxPropertiesPerDispatch and have it equal to 64kb/sizeof(nbl::hlsl::property_pools::transferTrequest)
        -> instead of copy lambda logic at https://github.com/Devsh-Graphics-Programming/Nabla/blob/master/src/nbl/video/utilities/CPropertyPoolHandler.cpp#L172, fail if over MaxPropertiesPerDispatch
    -> leave upstreaming thing & contiguous buffers for later (#if 0 it out)
        -> transferProperties with upstreaming & freeProperties
    -> IPropertyPool
        -> allocateProperties: use span instead of begin & end
            -> (behaviour) 
                -> goes through indices to find empty ones and allocate them
                -> if it's contiguous: add mapping from index to addr and addr to index
        -> nuke descriptor set stuff (line 198 -> 211)
        -> validateBlocks: change offset check (https://github.com/Devsh-Graphics-Programming/Nabla/blob/64cbb652e39acf0239a61bcee7fc26d70ab8d089/src/nbl/video/utilities/IPropertyPool.cpp#L38) to BDA
            -> check usages & non null address
    -> CPropertyPool: don't change anything, just make sure indentation is right
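
The MaxPropertiesPerDispatch rule from the notes above (64kb divided by the request size, fail when over) could look roughly like the following. The struct is a hypothetical stand-in for the HLSL transfer request, so its fields and size are assumptions, and `validateRequestCount` is an illustrative name rather than an existing Nabla function.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for the HLSL transfer request; the real layout may differ.
struct TransferRequestStandIn
{
    uint64_t srcAddress;       // BDA of the source data
    uint64_t dstAddress;       // BDA of the destination data
    uint64_t srcIndexAddress;  // 0 (null) means IOTA
    uint64_t dstIndexAddress;  // 0 (null) means IOTA
    uint32_t elementCount;
    uint32_t elementSizeBytes; // bytes, for future proofing
};

constexpr size_t MaxRequestBytesPerDispatch = 64u * 1024u;
constexpr size_t MaxPropertiesPerDispatch =
    MaxRequestBytesPerDispatch / sizeof(TransferRequestStandIn);

// Fail outright instead of splitting the work into several passes.
inline bool validateRequestCount(size_t requestCount)
{
    return requestCount <= MaxPropertiesPerDispatch;
}
```

Failing early keeps the handler free of the multi-pass copy logic, at the cost of making the caller split oversized batches.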

    -> MegaDescriptorSet (Descriptor set sub-allocate)
        -> Have a multi-timeline event functor with IFuture await
        ```cpp
            MultiTimelineEventHandlerST<DeferredFreeFunctor> deferredFrees;
            deferredFrees.latch(futureWait,std::move(functor));
        ```
            -> Also have it on IPropertyPool
            -> Solve synchronization issues

    -> create example testing downloads, uploads of properties
        -> with IB, without IB, fills, etc etc
        -> use regular buffer for everything
        -> later test the streaming buffers (ifdef them back in)
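
For the testing example, one way to make sure no combination is missed is to enumerate the specialization axes listed earlier (fill, source IOTA, destination IOTA, index type). The sketch below is only an illustration: `TestCase` and `enumerateTestCases` are hypothetical names, and it deliberately generates every combination, including ones that may turn out redundant (for instance a fill combined with a source index buffer).

```cpp
#include <cstdint>
#include <vector>

// Hypothetical description of one test configuration.
struct TestCase
{
    bool     fill;       // source is a fill
    bool     srcIota;    // no source index buffer
    bool     dstIota;    // no destination index buffer
    uint32_t indexBytes; // 1, 2, 4 or 8 (uint8 .. uint64 indices)
};

// Builds the full cross product of the specialization axes.
inline std::vector<TestCase> enumerateTestCases()
{
    std::vector<TestCase> cases;
    const uint32_t indexSizes[] = {1u, 2u, 4u, 8u};
    for (int fill = 0; fill < 2; ++fill)
    for (int srcIota = 0; srcIota < 2; ++srcIota)
    for (int dstIota = 0; dstIota < 2; ++dstIota)
    for (uint32_t indexBytes : indexSizes)
        cases.push_back(TestCase{fill != 0, srcIota != 0, dstIota != 0, indexBytes});
    return cases;
}
```

Each generated case can then be run with several element sizes, element counts and transfer amounts.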

Testing

TODO list:

  • [ ] Verify why things aren't being written accurately
  • [ ] Implement address buffer handling
  • [ ] Baseline test
  • [ ] Test with IOTA
  • [ ] Test with fill buffers
  • [ ] Test with different element sizes
  • [ ] Test with different element counts
  • [ ] Test with different transfer amounts

deprilula28, Jan 25 '24 02:01