Cranelift: simplify opcode set by removing bitwise operations on floats
👋 Hey,
Jun Ryung Ju (sorry, I don't know the GitHub user) discovered on Zulip that bitwise operations on floats are unimplemented on AArch64. Upon further investigation, they appear to be unimplemented on all backends.
@bjorn3 also mentions that cg_clif does not use them; it does a bitcast and then performs the operation on the integer version (see the sketch after the list below).
This makes them a good candidate for removal, simplifying our opcode set.
Is there any benefit / optimization that we can do on these that would justify keeping them around?
The bitwise ops that I'm referring to are:
- band
- bor
- bxor
- bnot
- band_not
- bor_not
- bxor_not
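For reference, a minimal sketch of that bitcast-based pattern for a scalar band (hypothetical function name; the exact bitcast syntax varies across Cranelift versions):

function %band_f32_via_ints(f32, f32) -> f32 {
block0(v0: f32, v1: f32):
    ;; reinterpret both floats as i32, do the bitwise op there, and cast back
    v2 = bitcast.i32 v0
    v3 = bitcast.i32 v1
    v4 = band v2, v3
    v5 = bitcast.f32 v4
    return v5
}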
cc: @cfallin
+1 remove.
Perhaps optimizations like #4803 would do better if we keep bitwise CLIF operations on floats—we could express the optimization in the mid-end without introducing extra bitcast instructions.
That said, looking more closely, I think the best code sequence for that particular optimization is different enough across backends that we may not want to do it in the mid-end anyway. And it's niche enough to be a really weak argument for keeping bitwise operators on floats.
By contrast, I find the opposite argument compelling: they're complicated to lower on probably any architecture with a hardware floating-point unit, and nobody has wanted them yet.
So I'm also in favor of removing them.
@afonso360 You are missing band_not in your list.
Those operations are actually straightforward to implement on AArch64, since there is a hardware floating-point unit iff SIMD is supported; only bxor_not is a 2-instruction sequence (instead of 1). However, I probably also lean towards removing them, given the niche use case.
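For concreteness, a rough mapping onto the AArch64 vector forms, treating the operands as 16 bytes (my sketch, not actual backend output):

band     -> and v0.16b, v1.16b, v2.16b
bor      -> orr v0.16b, v1.16b, v2.16b
bxor     -> eor v0.16b, v1.16b, v2.16b
bnot     -> not v0.16b, v1.16b
band_not -> bic v0.16b, v1.16b, v2.16b
bor_not  -> orn v0.16b, v1.16b, v2.16b
bxor_not -> eor v0.16b, v1.16b, v2.16b; not v0.16b, v0.16b   (the 2-instruction case)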
Added, Thanks!
Agreed, I think we should remove them. A floating-point type imbues some meaning on the value such that it's not meant to be used (or is not typically used) as a "bucket of bits" type, as integer types are.
I suspect that it would be possible to do reasonable lowerings for these on x86-64 too (since we use XMM regs and would have the integer-vec instructions at our disposal, just as on aarch64), so that angle doesn't hold as much significance for me. However, the "only build what is needed" angle does: if no one actually needs these op/type combinations and they are extra lowerings to add, maintain, and test, then let's not have them IMHO.
It turns out that Wasmtime can actually generate SIMD bitwise ops on floating-point vectors, as in this example (from tests/misc_testsuite/simd/issue_3327_bnot_lowering.wast and #3327):
(module
  (func $v128_not (export "v128_not") (result v128)
    v128.const f32x4 0 0 0 0
    f32x4.abs
    v128.not)
)

(assert_return (invoke "v128_not") (v128.const i32x4 -1 -1 -1 -1))
That currently compiles to this CLIF:
function u0:0(i64 vmctx) -> i8x16 fast {
    const0 = 0x00000000000000000000000000000000

block0(v0: i64):
@0026   v2 = vconst.i8x16 const0
@0038   v3 = raw_bitcast.f32x4 v2  ; v2 = const0
@0038   v4 = fabs v3
@003b   v5 = bnot v4
@003d   v6 = raw_bitcast.i8x16 v5
@003d   jump block1(v6)

block1(v1: i8x16):
@003d   return v1
}
Do we want cranelift-wasm to insert a bitcast in between the fabs and bnot here?
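Something like this, sketching only the affected lines (v7 is a hypothetical value number):

@0038 v4 = fabs v3
      v7 = raw_bitcast.i32x4 v4
@003b v5 = bnot v7
@003d v6 = raw_bitcast.i8x16 v5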
I would prefer fewer raw_bitcasts if at all possible; I'm not very fond of them...
Ah, interesting! That actually does change the calculus a bit in my mind. Given the principle above (only build what is needed) as the higher-priority one, I think it's totally fine to keep this support. Then for consistency it probably makes sense to support scalar floats as well; otherwise we have support for scalar ints, and all kinds of vectors, but not scalar floats, which is an odd and surprising hole.
I'm closing this issue as we seem to have reached consensus on keeping bitwise operations on floats.
Ok. I am currently working on implementing float bitops on x86_64. It looks like implementing the bitwise operations using bitcasts results in terrible code:
// function %band_f32(f32, f32) -> f32 {
// block0(v0: f32, v1: f32):
//   v2 = band v0, v1
//   return v2
// }

VCode {
  Entry block: 0
  v130 := v135
Block 0:
  (original IR block: block0)
  (instruction range: 0 .. 8)
  Inst 0: args %v128=%xmm0 %v129=%xmm1
  Inst 1: movd %v128, %v132l
  Inst 2: movd %v129, %v133l
  Inst 3: andl %v132l, %v133l, %v134l
  Inst 4: movd %v134l, %v135
  Inst 5: movaps %v130, %v131
  Inst 6: movaps %v131, %xmm0
  Inst 7: ret
}
~~I think we need to make F32 and F64 select Gpr instead of Xmm registers.~~ ~~Is this possible, or should I make an optimization for scalar float memory ops?~~
Ok, never mind. I think I just need to lower the scalar bitops to XMM bitops.
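A rough sketch of that XMM-based lowering (my assumption, not actual backend output): keep the floats in XMM registers and use the packed bitwise instructions directly instead of bouncing through GPRs.

band v0, v1  =>  andps xmm0, xmm1
bor  v0, v1  =>  orps  xmm0, xmm1
bxor v0, v1  =>  xorps xmm0, xmm1
bnot v0      =>  pcmpeqd xmm_tmp, xmm_tmp   ; materialize all-ones
                 xorps   xmm0, xmm_tmp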