xsimd
C++ wrappers for SIMD intrinsics and parallelized, optimized mathematical functions (SSE, AVX, AVX512, NEON, SVE)
There is a trick borrowed from Lemire that I am using to implement left shift when unavailable: multiply by 2^n. (The rest of the trick is that for right shifting,...
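The multiply-by-2^n idea can be illustrated with a portable scalar sketch. This is not the xsimd implementation: `lanes16` and `lshift_via_multiply` are hypothetical names, and the 8 x 16-bit lane layout is an assumption chosen only for the example. It relies on the fact that unsigned multiplication wraps modulo the lane width, which discards the same high bits a left shift would.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a 128-bit register holding eight 16-bit lanes.
using lanes16 = std::array<std::uint16_t, 8>;

// Per-lane left shift expressed as a multiplication by 2^n.
// The product is truncated to 16 bits, exactly like a shift that drops
// the bits moved past the top of the lane.
inline lanes16 lshift_via_multiply(const lanes16& x, const std::array<unsigned, 8>& n)
{
    lanes16 out{};
    for (std::size_t i = 0; i < x.size(); ++i)
    {
        assert(n[i] < 16 && "shift count must be smaller than the lane width");
        const std::uint32_t pow2 = std::uint32_t{1} << n[i]; // 2^n
        out[i] = static_cast<std::uint16_t>(x[i] * pow2);    // wraps modulo 2^16
    }
    return out;
}
```

In a vectorized version the loop body would presumably become a single lane-wise multiply against a precomputed vector of powers of two; that mapping is an assumption of this sketch.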
I've noticed that `load_masked` and `store_masked` currently only support `batch_bool_constant`. I think `load_masked` and `store_masked` are very suitable for dealing with loop tails, however in this case the mask...
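To make the loop-tail use case concrete, here is a scalar emulation of masked load and store with a mask computed at run time. It does not use the xsimd API: `lane_count`, `tail_mask`, `load_masked_sketch`, `store_masked_sketch`, and `scale_by_two` are all hypothetical names, and the sketch assumes that the tail mask depends on a length known only at run time.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

constexpr std::size_t lane_count = 4; // hypothetical batch width

using lanes = std::array<float, lane_count>;
using mask = std::array<bool, lane_count>;

// Runtime mask with the first `remaining` lanes active.
inline mask tail_mask(std::size_t remaining)
{
    assert(remaining <= lane_count);
    mask m{};
    for (std::size_t i = 0; i < remaining; ++i) m[i] = true;
    return m;
}

// Emulated masked load: inactive lanes are never read and stay at zero.
inline lanes load_masked_sketch(const float* src, const mask& m)
{
    lanes out{};
    for (std::size_t i = 0; i < lane_count; ++i)
        if (m[i]) out[i] = src[i];
    return out;
}

// Emulated masked store: inactive lanes are never written.
inline void store_masked_sketch(float* dst, const lanes& v, const mask& m)
{
    for (std::size_t i = 0; i < lane_count; ++i)
        if (m[i]) dst[i] = v[i];
}

// Doubles data[0..n): full batches first, then one masked batch for the tail.
inline void scale_by_two(float* data, std::size_t n)
{
    std::size_t i = 0;
    for (; i + lane_count <= n; i += lane_count)
        for (std::size_t k = 0; k < lane_count; ++k) data[i + k] *= 2.0f;
    if (i < n)
    {
        const mask m = tail_mask(n - i);
        lanes v = load_masked_sketch(data + i, m);
        for (float& x : v) x *= 2.0f;
        store_masked_sketch(data + i, v, m);
    }
}
```

The point of the masked tail is that memory past `data + n` is neither read nor written, so no scalar remainder loop and no padding are needed.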
@serge-sans-paille @JohanMabille this idea works, but I cannot figure out how to refactor `bitwise_lshift_as_twice_larger` into a separate header. The issue: - `xsimd_sse2.hpp` needs `utils/shifts.hpp` for `bitwise_lshift_as_twice_larger` - but `bitwise_lshift_as_twice_larger` needs...
This is the proposed Pixi workflow. Tasks: - [x] Add pixi.toml - [x] Add CMakePresets - [x] Add usage to documentation - [x] Add sample usage in CI Some notes...
While adding too many utilities for `batch_constant` may not be a goal, I believe an additional utility to generate a `batch_constant` with increasing numbers starting from zero could be an interesting...
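A compile-time sequence of increasing values can be generated with `std::index_sequence`. The sketch below shows the general technique on a stand-in type, not on xsimd's `batch_constant`: `constant_pack`, `make_iota_impl`, and `iota_pack` are hypothetical names, and the sketch assumes an integral element type.

```cpp
#include <cstddef>
#include <type_traits>
#include <utility>

// Hypothetical stand-in for a compile-time pack of lane values; not xsimd's type.
template <class T, T... Vs>
struct constant_pack
{
    static constexpr std::size_t size = sizeof...(Vs);

    // Returns the i-th value of the pack.
    static constexpr T get(std::size_t i)
    {
        constexpr T values[] = { Vs... };
        return values[i];
    }
};

// Expands the indices 0, 1, ..., N-1 into the value list of a constant_pack.
template <class T, std::size_t... Is>
constexpr constant_pack<T, static_cast<T>(Is)...> make_iota_impl(std::index_sequence<Is...>)
{
    return {};
}

// iota_pack<T, N> is constant_pack<T, 0, 1, ..., N-1> (N must be at least 1).
template <class T, std::size_t N>
using iota_pack = decltype(make_iota_impl<T>(std::make_index_sequence<N>{}));

static_assert(std::is_same<iota_pack<int, 4>, constant_pack<int, 0, 1, 2, 3>>::value,
              "iota_pack expands to increasing values starting at zero");
```

Because everything is resolved at compile time, such a pack could feed shuffles or index computations without any runtime cost; how it would map onto the real `batch_constant` template parameters is left open here.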
Currently, I locally use [Pixi](https://pixi.sh), a modern project-oriented Conda environment manager, and I would like to contribute that setup here. It can easily manage multiple environments and tasks (along with...
On main, running the tests gives this failure. ``` [doctest] doctest version is "2.4.12" [doctest] run with "--help" for options =============================================================================== /Users/antoine/workspace/github.com/xtensor-stack/xsimd/test/test_error_gamma.cpp:156: TEST CASE: [error gamma] gamma /Users/antoine/workspace/github.com/xtensor-stack/xsimd/test/test_error_gamma.cpp:150: ERROR: CHECK_EQ(...
Some intrinsics for a size `N` are introduced in the same generation that introduces a register of size `2N`. - `_mm_srlv_epi32` (128 bits) is introduced in AVX2, along with `_mm256`...
1. Adding stream API for non-temporal data transfers 2. Adding `xsimd::fence` as a wrapper around std atomic for cache coherence 3. Adding tests ~~Draft because I need to double...