slang Promote `bitfieldExtract` and `bitfieldInsert` to become Slang intrinsics

For context, see the related issue here

In short, bitfield insertion and extraction are useful tools for reinterpreting structures where types have differing sizes, as they allow for simpler and more hardware-efficient instructions for byte manipulation.

Before this change, bitfieldExtract and bitfieldInsert were only supported outside of the compiler, when using -allow-glsl.

Here, I'm introducing KIROp_BitfieldExtract and KIOROp_BitfieldInsert instructions, which by default generate the original fallback bitfield manipulation logic but in CLikeSourceEmitter.

Then, when emitting SPIR-V or GLSL, these compilation paths override this default logic to instead tap into more specialized intrinsics.

Sep 06 '24 00:09 natevm

This needs to come with a test.

Agreed. Looks like the current test suite is passing which is good. I’ll see if I can write something up which tests on a couple different target backends.

Sep 06 '24 07:09 natevm

@csyonghe What's the correct approach to constructing vectors for the CUDA backend?

bitfieldExtract is a component-wise intrinsic, so I need to support a function like this as a right hand side which can be assigned to the left:

uint4 bfe(uint4 val, int off, int bits) {
    return ((val >> off) & ((1u << bits) - 1));
}

However, every time I try to construct an int4 from off, I get errors similar to the following:

nvrtc 11.2: (25): error : no suitable constructor exists to convert from "int" to "int4"
nvrtc 11.2: (25): error : no suitable constructor exists to convert from "unsigned int" to "int4"
nvrtc 11.2: (25): error : no suitable constructor exists to convert from "int" to "int4"
nvrtc 11.2: (25): error : no suitable constructor exists to convert from "int" to "int4"

I've tried constructing an int4 as int4(off, off, off, off) but this does not work either.

I could try emitting a make_int4 as a special case for CUDA, but this doesn't generalize easily to uint8_t, uint16_t, or uint64_t. (I suppose I could make a lookup table… but that seems a bit heavy handed…)

Sep 08 '24 23:09 natevm

For the time being, looks like a special case lookup table works for CUDA.

Sep 09 '24 00:09 natevm

On adding these tests, I noticed that the current logic in glsl.meta.slang for bitfield extraction in the signed integer case was incorrect. For signed integers, the extracted bits are expected to be sign-extended, but the fallback for this wasn’t doing sign extension.

The current logic now correctly accounts for this, and I have some explicit tests to back that up now too.

Sep 09 '24 03:09 natevm

Looks like there are some newly failing tests on my end after adding these flags for the other targets. I'll see if I can figure out what's going on.

For i32 extract

[x] using -cpu seems to be throwing error : invalid operands to binary expression ('Vector<int32_t, 4>' (aka 'Vector<int, 4>') and 'int32_t' (aka 'int')) -- Fixed, was missing a vector type for cpu targets
all other targets pass

For i64 extract

[x] Using -cpu again throws similar invalid operand error. -- Fixed, was missing a vector type for cpu targets
Using -vk throws unexpected type given to bitfieldExtract in SPIR-V emit. I might have forgotten to ensure 64-bit values work in the SPIR-V specific logic...

The same failure patterns occur for i32/64 insertion. I'm not sure yet about for metal. Hopefully the CI will give us some answers there.

Sep 09 '24 20:09 natevm

I'm a bit confused with the Vulkan spec... I think I might have found a typo?...

According to the spec:

The Base operand of any OpBitCount, OpBitReverse, OpBitFieldInsert, OpBitFieldSExtract, or OpBitFieldUExtract instruction must be a 32-bit integer scalar or a vector of 32-bit integers.

This appears even if I require SpvCapabilityInt64. At least with NVIDIA, I'm finding that the tests here reveal that 64-bit integers are supported, and do give correct results. But evidently there is ambiguity on what exactly the shaderInt64 capability means.

Sep 10 '24 19:09 natevm

It appears the two outstanding issues are:

When SLANG_RUN_SPIRV_VALIDATION=1, SPIR-V validation does not correctly consider how SpvCapabilityInt64 extends the capabilities of OpBitfieldSExtract/OpBitfieldUExtract/OpBitfieldInsert intrinsics. This is causing the one windows CI build to fail. Setting SLANG_RUN_SPIRV_VALIDATION=0 and ignoring the validation errors, I get correct results.

This appears to be a regression caused by this PR from Samsung: https://github.com/KhronosGroup/SPIRV-Tools/pull/4758. Apparently their hardware does not support 64-bit values for certain intrinsics.

For Metal, it appears that when targetting release mode, Slang appears to be translating my vec<int64_t, 4> Metal type into int64_t4, which is not an existing type in the Metal shading language.

So for 1, unfortunately because of some restrictions with Samsung's HW, there is no way to really tap into the fast path HW on NVIDIA architectures for non-32-bit types with bitfield extraction / insertion. So I'll have to think about some workaround for non-32-bit types...

For 2, this seems like a bug with the current Slang Metal backend, and isn't directly tied to my changes as far as I can tell.

@csyonghe For 2, do you know what's causing the release build on MacOS to fail with vec4(int64_t, N), but the debug build on MacOS to pass?

Sep 10 '24 21:09 natevm

You can add -skip-spirv-validation in the //TEST line to disable validation for that specific test. You can search for it in the codebase to see how it is used elsewhere for similar situations.

Sep 11 '24 21:09 csyonghe

Only release build runs the full suite of tests, debug build doesn't run most tests.

Sep 11 '24 21:09 csyonghe

You can add -skip-spirv-validation in the //TEST line to disable validation for that specific test. You can search for it in the codebase to see how it is used elsewhere for similar situations.

I did try to use this flag like so: //TEST(compute):COMPARE_COMPUTE(filecheck-buffer=CHECK):-vk -skip-spirv-validation

But this throws an error that -skip-spirv-validation is not a valid command line argument. Perhaps it's because I'm using this TEST(compute):COMPARE_COMPUTE(filecheck-buffer=CHECK) pattern?

The other test using this flag follows a different pattern: //TEST:SIMPLE(filecheck=SPIRV):-target spirv -entry computeMain -stage compute -emit-spirv-directly -skip-spirv-validation

Sep 12 '24 02:09 natevm

For compute tests, you need -slang -skip-spirv-validation.

Sep 12 '24 02:09 csyonghe

For 2, this seems like a bug with the current Slang Metal backend, and isn't directly tied to my changes as far as I can tell.

@csyonghe For 2, do you know what's causing the release build on MacOS to fail with vec4(int64_t, N), but the debug build on MacOS to pass?

@csyonghe looks like this is the last major issue at the moment. I don't have a Mac, so I can't reproduce this locally to debug it. Looks like 64-bit integer vectors broken in general in the Metal backend. Perhaps we should file an issue to resolve that.

Sep 13 '24 20:09 natevm

You should be able to fix this by changing the way we emit vector types when the element type is 64bit int in slang-emit-metal.cpp, look for kIROp_VectorType in that file to find the place to emit the type.

You can test this locally on a windows machine by adding a //TEST:SIMPLE:-target metal On the offending test file to trigger a compile only test with the source, that will run the metal compiler on windows to give you any metal compile errors without actually running the shader.

Sep 13 '24 20:09 csyonghe

@csyonghe looks like all relevant tests are passing.

There’s one windows CI machine which is reporting that bash is missing, but I see the same error on ToT, so I’m assuming that’s okay to see.

Sep 15 '24 21:09 natevm

Apologies on the delay here. Been a bit tied up with a move between states. I should have a bit more time to work on this in the upcoming week.

Sep 28 '24 23:09 natevm

All committers have signed the CLA.

Nov 25 '24 19:11 CLAassistant

@kaizhangNV Going to try digging into this today. Sounds like there's not much left needed to get this done.

Dec 08 '24 20:12 natevm

It looks like the main concern for metal is addressed already. The only thing remaining is https://github.com/shader-slang/slang/pull/5020#discussion_r1761623703

Dec 09 '24 16:12 kaizhangNV

@csyonghe This PR is in a good shape. In the latest change, I see Nate already addressed the special handling for metal backend. Do you have any other concerns except the naming suggestion? I feel it's not worth for me to take over this as I probably will have to create a new PR.

Dec 09 '24 16:12 kaizhangNV

@kaizhangNV The remaining work here is to make sure it works on WGPU. @natevm do you have time for this or do you want @kaizhangNV to take over?

Dec 10 '24 18:12 csyonghe

@kaizhangNV has already landed this change in another PR, so closing this one.

Dec 12 '24 23:12 csyonghe