Sebastian Aaltonen

Results: 11 comments by Sebastian Aaltonen

Thanks for pointing this out. Originally the shader was properly aligned, but then I added the binary OR (with a runtime value) to ensure that compilers don't optimize the whole loop...

Tested on RTX 2080 Ti (Turing) using 419.35 drivers. Neither multiply nor divide + multiply works. Still getting the same performance. Now downloading new drivers to see whether Nvidia has finally...

Tested with 431.60 drivers too. Alignment doesn't seem to help Turing. This is the new address calc code: ``` uint elemIdx = (htid + i) | loadConstants.elementsMask; uint address =...

Could you run the test by replacing loadRawBody.hlsli with this: ``` #include "hash.hlsli" #include "loadConstantsGPU.h" RWBuffer output : register(u0); ByteAddressBuffer sourceData : register(t0); cbuffer CB0 : register(b0) { LoadConstants loadConstants;...

Same results with Ampere. No improvement there either.

Sounds like a driver bug. Adding a flush between the dispatch and End would likely put the dispatch and the profiling end timestamp recording into different command lists, which could add a penalty...

Thanks for the detailed info. CUDA has a different memory model than DirectX/OpenGL/Vulkan (raw data vs. data fetched using resource descriptors). It makes sense that Nvidia has different hardware paths for...

Thanks for the response! Good to hear that improvements are being made that could make this feature possible in the future. I don't personally have time right now to contribute,...

In DirectX, a read-only buffer (Buffer) is supported for all types. Texel buffers all use opaque descriptors in DirectX. The shader doesn't know the buffer type. Sampler hardware does the type...

Typed UAV Load (extended formats) support can be found in the following table: https://en.wikipedia.org/wiki/Feature_levels_in_Direct3D Nvidia Fermi (GTX 500) and Kepler (GTX 600 and 700) don't support it and Intel Gen...