wasmtime
cranelift: Support `bnot`, `band`, `bor`, `bxor` for x86_64.
This patch implements float bitops on x86_64 using SSE instructions. @afonso360
- [x] Check whether better single-slot bitops are available on x86_64 (which has better latency or throughput?)
- [ ] Make a single-slot mask for `f32`, `f64` instead of `vector_all_ones`
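For readers unfamiliar with float bitops, the intended scalar semantics can be sketched as plain bit manipulation on the IEEE 754 representation. This is only a reference model using Rust's standard `to_bits`/`from_bits`; the function names are hypothetical and are not Cranelift's lowering code.

```rust
// Reference model: bitwise ops on f32 act on the raw IEEE 754 bit pattern.
// Function names are illustrative only, not part of Cranelift.

fn band_f32(a: f32, b: f32) -> f32 {
    f32::from_bits(a.to_bits() & b.to_bits())
}

fn bor_f32(a: f32, b: f32) -> f32 {
    f32::from_bits(a.to_bits() | b.to_bits())
}

fn bxor_f32(a: f32, b: f32) -> f32 {
    f32::from_bits(a.to_bits() ^ b.to_bits())
}

fn bnot_f32(a: f32) -> f32 {
    f32::from_bits(!a.to_bits())
}
```

For example, `band_f32(x, f32::from_bits(0x7fff_ffff))` clears the sign bit, and `bxor_f32(x, f32::from_bits(0x8000_0000))` flips it.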
WIP right now. Some masks on the bitops are invalid; I'm trying to fix them.
I think there are no better options for single-float bitop instructions on x86_64 (at least with SSE).
First of all, I need to write a test for float-to-vec conversion to check that unused bits are filtered out. The `bnot` operation may generate incorrect code because the current `bnot` implementation uses `vector_all_ones`, which flips all the unused bits. That means it has a side effect.
So I am looking for an efficient way to generate a single-slot bit mask (without load ops, which would need extra stack allocations).
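The concern above can be modeled with a small sketch. It treats a 128-bit XMM register as four 32-bit lanes, with the scalar `f32` in lane 0, and compares XOR against an all-ones mask with XOR against a single-slot mask. The `Xmm` type and function names are hypothetical, chosen for illustration; this is not the code in the PR.

```rust
// Model of a 128-bit XMM register as four 32-bit lanes.
// Lane 0 holds the scalar f32; lanes 1..3 are the "unused" slots.
type Xmm = [u32; 4];

// Lane-wise XOR, standing in for a packed XOR instruction.
fn xor_lanes(a: Xmm, mask: Xmm) -> Xmm {
    [a[0] ^ mask[0], a[1] ^ mask[1], a[2] ^ mask[2], a[3] ^ mask[3]]
}

// bnot using an all-ones vector mask: every lane is flipped.
fn bnot_all_ones(reg: Xmm) -> Xmm {
    xor_lanes(reg, [u32::MAX; 4])
}

// bnot using a single-slot mask: only lane 0 is flipped.
fn bnot_single_slot(reg: Xmm) -> Xmm {
    xor_lanes(reg, [u32::MAX, 0, 0, 0])
}
```

With `reg = [1.0f32.to_bits(), 0, 0, 0]`, both variants produce the same lane 0, but the all-ones variant also turns the three upper lanes into `u32::MAX`, while the single-slot variant leaves them at zero.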
https://github.com/bytecodealliance/wasmtime/pull/5036/files#diff-4fb30cf23cc13cba2f11079dc6d0305f972170585aab27b27d143c2f3252e4faR1270-R1274
You can tag me once you're ready for review; I glanced over this and it made sense.
I think there is no cheap way to generate masks...
~~Maybe we need to truncate all unused vector slots when using `scalar_to_vector`.~~
~~I'll make a separate PR for this task.~~
Never mind. `scalar_to_vector` already uses `movss` and `movsd`.
@abrown I think it's ready for review.
Sorry for the very late response to the review.
@ArtBlnd, I believe the CI failure here is due to some rustc changes between when this PR was open and now. Can you rebase this PR and see if that resolves things?
Done.