Chris Fallin comments

Results: 323 comments by Chris Fallin

cranelift/x64: Narrow `test` immediate operands

One reason to try to lift optimizations like this into ISLE, and keep the VCode insts as close to 1-to-1 correspondence with machine code as possible, is that it makes...
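As a rough illustration of what "narrowing" a `test` immediate means, here is a minimal Rust sketch. It is not Cranelift or ISLE code: the helper names (`narrow_test_mask`, `test_zf_u32`, `test_zf_u8`) are hypothetical, and the sketch models only the zero flag. The sign flag can differ after narrowing (for example, when bit 7 of the mask is set), so a real lowering would have to consider which flags the consumer actually reads.

```rust
/// Models the zero flag produced by a 32-bit `test value, mask`:
/// ZF is set when the bitwise AND of the operands is zero.
fn test_zf_u32(value: u32, mask: u32) -> bool {
    value & mask == 0
}

/// Models the zero flag produced by an 8-bit `test value, mask`.
fn test_zf_u8(value: u8, mask: u8) -> bool {
    value & mask == 0
}

/// Hypothetical helper: if every set bit of `mask` lies in the low byte,
/// return the equivalent 8-bit mask; otherwise the test cannot be narrowed
/// to 8 bits without losing bits.
fn narrow_test_mask(mask: u32) -> Option<u8> {
    if mask <= 0xFF {
        Some(mask as u8)
    } else {
        None
    }
}
```

When `narrow_test_mask` returns `Some(m)`, `test_zf_u32(v, mask)` and `test_zf_u8(v as u8, m)` agree for every `v`, because the upper 24 bits of the AND result are always zero. Expressing a rule like this at the lowering level, rather than inside instruction emission, is the kind of separation the comment argues for.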

Windows AARCH64 Target

This depends on #4992 -- a little bit of core runtime functionality (trap handling, unwind info generation) necessary for this OS/architecture pair. If you're willing to work on this, we'd...

Cranelift: implement "precise store traps" in presence of store-tearing hardware.

@afonso360, if you still have access to your RISC-V hardware and a bit of time to test, it would be useful to confirm that this changes the result of the...

Cranelift: implement "precise store traps" in presence of store-tearing hardware.

Probably the latter -- the loads are not removed, and make it all the way to machine code.

Cranelift: implement "precise store traps" in presence of store-tearing hardware.

(More specifically, a load inserted by this pass *could* later be removed by alias analysis, but only if subsumed by another earlier load to the same address, so there is...

Cranelift: implement "precise store traps" in presence of store-tearing hardware.

One other performance-related note: this will possibly have worse impact when the pooling allocator with copy-on-write is enabled in Wasmtime, as the "first touch" to a page matters: if demanded...

Cranelift: implement "precise store traps" in presence of store-tearing hardware.

The problem I see with that approach is that we only know that dynamically: consider if we have a 64-bit store (allowed to be unaligned) to address `X`, we would...
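The excerpt is cut off, so the following is only one reading of the "only know that dynamically" point: whether an unaligned store touches more than one page depends on the runtime value of the address, not on anything visible at compile time. The sketch below assumes a 4 KiB page size and uses a hypothetical helper name, `store_crosses_page`; neither comes from the original thread.

```rust
/// Assumed page size for illustration only (4 KiB); real targets may differ.
const PAGE_SIZE: u64 = 4096;

/// Hypothetical helper: returns true if a store of `size` bytes starting at
/// `addr` touches two different pages. `size` must be in 1..=PAGE_SIZE.
fn store_crosses_page(addr: u64, size: u64) -> bool {
    debug_assert!(size > 0 && size <= PAGE_SIZE);
    // Page index of the first and last byte written by the store.
    let first_page = addr / PAGE_SIZE;
    let last_page = addr.wrapping_add(size - 1) / PAGE_SIZE;
    first_page != last_page
}
```

For an 8-byte (64-bit) store, `store_crosses_page(4088, 8)` is false while `store_crosses_page(4092, 8)` is true: the same instruction may or may not straddle a page boundary depending on the address it receives at run time.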

Cranelift: implement "precise store traps" in presence of store-tearing hardware.

I think that would work, yeah. There are some other interesting performance tradeoffs here -- two that come to mind are "partial-store forwarding" (partially overlapping loads/stores can be problematic for...

Cranelift: implement "precise store traps" in presence of store-tearing hardware.

Ah yes, indeed. This is only a problem on aarch64 and riscv64, and AFAICT all aarch64 hardware one would *want* to run a high-performance server on (Apple implementations, AWS machines,...

cranelift: Delete redundant DCE optimization pass

IIRC, the "legacy" passes we call before egraph opt are the ones that are important for compile time, or at least were when egraph opt was introduced. Basically as you...