intel-intrinsics
The Dlang SIMD library
Currently LFENCE is modelled by a full memory barrier on LLVM with `llvm_memory_fence` which generates MFENCE. It was assumed MFENCE is more restrictive than LFENCE, which is a performance issue...
~~165 TODO left~~ ~~132 TODO left~~ ~~123 TODO left~~ ~~105 TODO left~~ 91 TODO left
Enabling core.simd: - [x] We can enable core.simd usage with DMD today, without even using D_SIMD, which brings the performance gap of LDC vs DMD from 20x to 4x. DMD...
A concrete context always has type and size information, so there isn't such a need for those in the public API. You can always use `_mm_loadu_ps` and friends.
It is cumbersome / problematic to have to pass `--enable-cross-module-inlining` to LDC to get good performance; this may not be an option in a larger project. Perhaps adding a UDA...
Add one here every time you wish for one: - [ ] `_mm_cvtpd_epi64` that would convert 2x double using MXCSR would speed up things for arm and non-avx x86 =>...
It can cause time-consuming bugs downstream, as people expect 0.0f ^ something to yield 0, but instead it yields a very large value.
Functions that take a pointer which is then accessed too greedily should be `@system`. Unfortunately, this is a breaking change. Find and fix all such functions that either cast a pointer to...
I just noticed that WASM also has SIMD support. (I'm not personally interested at the moment, but hope this helps someone in the future ;-) Perhaps performance is already quite...