Promote `bitfieldExtract` and `bitfieldInsert` to become Slang intrinsics
For context, see the related issue here
In short, bitfield insertion and extraction are useful tools for reinterpreting structures where types have differing sizes, as they allow for simpler and more hardware-efficient instructions for byte manipulation.
Before this change, bitfieldExtract and bitfieldInsert were only supported outside of the compiler, when using -allow-glsl.
Here, I'm introducing KIROp_BitfieldExtract and KIOROp_BitfieldInsert instructions, which by default generate the original fallback bitfield manipulation logic but in CLikeSourceEmitter.
Then, when emitting SPIR-V or GLSL, these compilation paths override this default logic to instead tap into more specialized intrinsics.
This needs to come with a test.
Agreed. Looks like the current test suite is passing which is good. I’ll see if I can write something up which tests on a couple different target backends.
@csyonghe What's the correct approach to constructing vectors for the CUDA backend?
bitfieldExtract is a component-wise intrinsic, so I need to support a function like this as a right hand side which can be assigned to the left:
uint4 bfe(uint4 val, int off, int bits) {
return ((val >> off) & ((1u << bits) - 1));
}
However, every time I try to construct an int4 from off, I get errors similar to the following:
nvrtc 11.2: (25): error : no suitable constructor exists to convert from "int" to "int4"
nvrtc 11.2: (25): error : no suitable constructor exists to convert from "unsigned int" to "int4"
nvrtc 11.2: (25): error : no suitable constructor exists to convert from "int" to "int4"
nvrtc 11.2: (25): error : no suitable constructor exists to convert from "int" to "int4"
I've tried constructing an int4 as int4(off, off, off, off) but this does not work either.
I could try emitting a make_int4 as a special case for CUDA, but this doesn't generalize easily to uint8_t, uint16_t, or uint64_t. (I suppose I could make a lookup table… but that seems a bit heavy handed…)
For the time being, looks like a special case lookup table works for CUDA.
On adding these tests, I noticed that the current logic in glsl.meta.slang for bitfield extraction in the signed integer case was incorrect. For signed integers, the extracted bits are expected to be sign-extended, but the fallback for this wasn’t doing sign extension.
The current logic now correctly accounts for this, and I have some explicit tests to back that up now too.
Looks like there are some newly failing tests on my end after adding these flags for the other targets. I'll see if I can figure out what's going on.
For i32 extract
- [x] using
-cpuseems to be throwingerror : invalid operands to binary expression ('Vector<int32_t, 4>' (aka 'Vector<int, 4>') and 'int32_t' (aka 'int'))-- Fixed, was missing a vector type for cpu targets - all other targets pass
For i64 extract
- [x] Using
-cpuagain throws similar invalid operand error. -- Fixed, was missing a vector type for cpu targets - Using
-vkthrowsunexpected type given to bitfieldExtract in SPIR-V emit. I might have forgotten to ensure 64-bit values work in the SPIR-V specific logic...
The same failure patterns occur for i32/64 insertion. I'm not sure yet about for metal. Hopefully the CI will give us some answers there.
I'm a bit confused with the Vulkan spec... I think I might have found a typo?...
According to the spec:
The Base operand of any OpBitCount, OpBitReverse, OpBitFieldInsert, OpBitFieldSExtract, or OpBitFieldUExtract instruction must be a 32-bit integer scalar or a vector of 32-bit integers.
This appears even if I require SpvCapabilityInt64. At least with NVIDIA, I'm finding that the tests here reveal that 64-bit integers are supported, and do give correct results. But evidently there is ambiguity on what exactly the shaderInt64 capability means.
It appears the two outstanding issues are:
- When
SLANG_RUN_SPIRV_VALIDATION=1, SPIR-V validation does not correctly consider howSpvCapabilityInt64extends the capabilities ofOpBitfieldSExtract/OpBitfieldUExtract/OpBitfieldInsertintrinsics. This is causing the one windows CI build to fail. SettingSLANG_RUN_SPIRV_VALIDATION=0and ignoring the validation errors, I get correct results.
This appears to be a regression caused by this PR from Samsung: https://github.com/KhronosGroup/SPIRV-Tools/pull/4758. Apparently their hardware does not support 64-bit values for certain intrinsics.
- For Metal, it appears that when targetting release mode, Slang appears to be translating my
vec<int64_t, 4>Metal type intoint64_t4, which is not an existing type in the Metal shading language.
So for 1, unfortunately because of some restrictions with Samsung's HW, there is no way to really tap into the fast path HW on NVIDIA architectures for non-32-bit types with bitfield extraction / insertion. So I'll have to think about some workaround for non-32-bit types...
For 2, this seems like a bug with the current Slang Metal backend, and isn't directly tied to my changes as far as I can tell.
@csyonghe For 2, do you know what's causing the release build on MacOS to fail with vec4(int64_t, N), but the debug build on MacOS to pass?
You can add -skip-spirv-validation in the //TEST line to disable validation for that specific test. You can search for it in the codebase to see how it is used elsewhere for similar situations.
Only release build runs the full suite of tests, debug build doesn't run most tests.
You can add -skip-spirv-validation in the //TEST line to disable validation for that specific test. You can search for it in the codebase to see how it is used elsewhere for similar situations.
I did try to use this flag like so:
//TEST(compute):COMPARE_COMPUTE(filecheck-buffer=CHECK):-vk -skip-spirv-validation
But this throws an error that -skip-spirv-validation is not a valid command line argument. Perhaps it's because I'm using this TEST(compute):COMPARE_COMPUTE(filecheck-buffer=CHECK) pattern?
The other test using this flag follows a different pattern:
//TEST:SIMPLE(filecheck=SPIRV):-target spirv -entry computeMain -stage compute -emit-spirv-directly -skip-spirv-validation
For compute tests, you need -slang -skip-spirv-validation.
For 2, this seems like a bug with the current Slang Metal backend, and isn't directly tied to my changes as far as I can tell.
@csyonghe For 2, do you know what's causing the release build on MacOS to fail with
vec4(int64_t, N), but the debug build on MacOS to pass?
@csyonghe looks like this is the last major issue at the moment. I don't have a Mac, so I can't reproduce this locally to debug it. Looks like 64-bit integer vectors broken in general in the Metal backend. Perhaps we should file an issue to resolve that.
You should be able to fix this by changing the way we emit vector types when the element type is 64bit int in slang-emit-metal.cpp, look for kIROp_VectorType in that file to find the place to emit the type.
You can test this locally on a windows machine by adding a //TEST:SIMPLE:-target metal On the offending test file to trigger a compile only test with the source, that will run the metal compiler on windows to give you any metal compile errors without actually running the shader.
@csyonghe looks like all relevant tests are passing.
There’s one windows CI machine which is reporting that bash is missing, but I see the same error on ToT, so I’m assuming that’s okay to see.
Apologies on the delay here. Been a bit tied up with a move between states. I should have a bit more time to work on this in the upcoming week.
@kaizhangNV Going to try digging into this today. Sounds like there's not much left needed to get this done.
It looks like the main concern for metal is addressed already. The only thing remaining is https://github.com/shader-slang/slang/pull/5020#discussion_r1761623703
@csyonghe This PR is in a good shape. In the latest change, I see Nate already addressed the special handling for metal backend. Do you have any other concerns except the naming suggestion? I feel it's not worth for me to take over this as I probably will have to create a new PR.
@kaizhangNV The remaining work here is to make sure it works on WGPU. @natevm do you have time for this or do you want @kaizhangNV to take over?
@kaizhangNV has already landed this change in another PR, so closing this one.