packed_simd
Portable Packed SIMD Vectors for Rust standard library
LLVM's `cttz.v8i8` intrinsic is broken on AArch64 machines: https://github.com/rust-lang-nursery/packed_simd/issues/191 Our current workaround just applies `u8::trailing_zeros` to each lane. With 8 lanes, that can be quite slow. It could be optimized...
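As a rough illustration of the per-lane workaround described above, the sketch below applies `u8::trailing_zeros` to each of 8 lanes. The function name `trailing_zeros_per_lane` and the use of a plain `[u8; 8]` array in place of a real 8-lane vector type are assumptions made for this example; this is not the crate's actual code.

```rust
/// Per-lane fallback: count trailing zeros in each of the 8 lanes by
/// calling the scalar `u8::trailing_zeros` once per lane.
///
/// ASSUMPTION: `[u8; 8]` stands in for an 8-lane `u8` vector, and the
/// function name is hypothetical. This only illustrates the idea.
pub fn trailing_zeros_per_lane(lanes: [u8; 8]) -> [u8; 8] {
    let mut out = [0u8; 8];
    for (o, &x) in out.iter_mut().zip(lanes.iter()) {
        // `u8::trailing_zeros` returns 8 when the input is 0,
        // so the result always fits in a `u8`.
        *o = x.trailing_zeros() as u8;
    }
    out
}
```

Each lane is handled by a separate scalar call, which is why this approach can be slow compared to a single vector instruction.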
@TheIronBorn suggested that the random number generator that we are using for packed vectors in the aobench benchmark is too naive. It would be great if we could use a...
As mentioned by @gnzlbg in a comment on #168, some benchmarks in the `benches` directory don't get built during CI. This means they can stay broken for a while without...
This is a tracking issue for expanding upon the machine code analysis part of the perf guide. ## To do - [ ] how to emit assembly from Rust code...
Currently, `.cast()` performs a "numeric cast" for each vector lane that's semantically equivalent to `as` for primitive scalars. [RFC 2484](https://github.com/rust-lang/rfcs/pull/2484) proposes a couple of cast methods to replace `as`....
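To make "semantically equivalent to `as`" concrete, here is a minimal sketch of what a lane-wise `as` cast does, using plain arrays in place of vector types. The helper names `cast_i32x4_to_u8x4` and `cast_f32x4_to_i32x4` are hypothetical and exist only for this example. They show scalar `as` behaviour applied per lane, not the crate's implementation of `.cast()`.

```rust
/// ASSUMPTION: `[i32; 4]` and `[u8; 4]` stand in for 4-lane vectors;
/// the function name is hypothetical.
/// Integer-to-narrower-integer `as` truncates to the low bits.
pub fn cast_i32x4_to_u8x4(v: [i32; 4]) -> [u8; 4] {
    [v[0] as u8, v[1] as u8, v[2] as u8, v[3] as u8]
}

/// ASSUMPTION: `[f32; 4]` and `[i32; 4]` stand in for 4-lane vectors;
/// the function name is hypothetical.
/// Float-to-integer `as` rounds toward zero.
pub fn cast_f32x4_to_i32x4(v: [f32; 4]) -> [i32; 4] {
    [v[0] as i32, v[1] as i32, v[2] as i32, v[3] as i32]
}
```

For instance, `cast_i32x4_to_u8x4([0, 255, 256, -1])` yields `[0, 255, 0, 255]`, because `as` silently wraps out-of-range integers. Surprises of this kind are one reason to consider more explicit cast methods.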
CI takes a really long time, and in the case of AppVeyor, which does not run jobs in parallel, it can take several hours. ## Ideas - [ ] Use...
There are many raw loops that should probably use iterators.
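A small sketch of the kind of change this suggests: the same element-wise addition written first as an index-based loop and then with iterators. Both function names are made up for this example and do not refer to code in the repository.

```rust
/// Raw index loop: each `a[i]`, `b[i]`, `out[i]` access is bounds-checked
/// (unless the optimizer can prove it safe) and panics if `a` or `b`
/// is shorter than `out`.
pub fn add_indexed(a: &[f32], b: &[f32], out: &mut [f32]) {
    for i in 0..out.len() {
        out[i] = a[i] + b[i];
    }
}

/// Iterator version: `zip` walks the three slices in lockstep and stops
/// at the shortest one, so no explicit indexing is needed.
pub fn add_iter(a: &[f32], b: &[f32], out: &mut [f32]) {
    for ((o, x), y) in out.iter_mut().zip(a).zip(b) {
        *o = x + y;
    }
}
```

The two versions agree when all slices have the same length. They differ on mismatched lengths (panic versus silent truncation), so a conversion like this should be checked case by case.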
I've looked again at the [fastest mandelbrot implementation on benchmarksgame](https://benchmarksgame-team.pages.debian.net/benchmarksgame/program/mandelbrot-gpp-1.html) and made a list of things we need to get before we can even begin to think about challenging the...
They were disabled by commit 16290059ae58f76667a1ae503f152d9b2520c337, as part of PR #152. AppVeyor logs don't report any error message, besides "error: test failed".
Tried running `./ci/run.sh`, everything ran smoothly until: ``` failures: ---- api::ops::vector_rotates::u64x8::assert_rotate_left_vpro stdout ---- disassembly for verify::api::ops::vector_rotates::u64x8::assert_rotate_left_vpro::rotate_left_shim: 0: lea -0x3f307(%rip),%rax # 1000 1: mov 0xbacda(%rip),%rcx # fafe8 2: mov %rax,(%rcx) 3:...