SWIFT
AVX512F instructions
Hi,
First of all, thanks for creating (and open-sourcing) the SWIFT code! Looks great!
I was looking through the SIMD wrappers for AVX512F in vector.h and I noticed a few wrappers that refer to non-existent intrinsics (at least in AVX512F) or have better implementations. In particular, vec_and maps to _mm512_and_ps, which does not exist in AVX512F (at least according to the Intel Intrinsics Guide). From the looks of it, all and/or operations are now only relevant for masks and not for individual data-types.
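For illustration, here is a minimal sketch of one way to get a bitwise AND on floats using only AVX512F intrinsics, by casting to the integer domain and using _mm512_and_si512. The helper name and_ps_16 is hypothetical (it is not from vector.h), and the scalar branch is only there so the snippet also compiles on machines without AVX512F.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#ifdef __AVX512F__
#include <immintrin.h>
#endif

#define VEC_LANES 16 /* number of floats in one 512-bit register */

/* Hypothetical helper, not part of vector.h:
 * out[i] = bitwise AND of a[i] and b[i] for 16 floats. */
static void and_ps_16(const float *a, const float *b, float *out) {
  assert(a != NULL && b != NULL && out != NULL);
#ifdef __AVX512F__
  /* The casts only reinterpret the register; they generate no instructions. */
  __m512i ia = _mm512_castps_si512(_mm512_loadu_ps(a));
  __m512i ib = _mm512_castps_si512(_mm512_loadu_ps(b));
  _mm512_storeu_ps(out, _mm512_castsi512_ps(_mm512_and_si512(ia, ib)));
#else
  /* Portable fallback: AND the raw 32-bit patterns lane by lane. */
  for (int i = 0; i < VEC_LANES; i++) {
    uint32_t ua, ub;
    memcpy(&ua, &a[i], sizeof ua);
    memcpy(&ub, &b[i], sizeof ub);
    ua &= ub;
    memcpy(&out[i], &ua, sizeof ua);
  }
#endif
}
```

ANDing with the bit pattern 0x7FFFFFFF in every lane clears the sign bit, which is the usual way to build an absolute value out of this operation.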
I also saw that vec_fabs is implemented via two intrinsics. Is the new _mm512_abs_ps intrinsic too slow?
I am also curious: I do not see any references to any mask(z)_load. I found those masked loads quite useful for staying in SIMD mode and eliminating the serial part of the code (dealing with remainder loops for array lengths not divisible by the SIMD width).
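To make the remainder-loop point concrete, here is a rough sketch of what I mean, assuming a simple array sum. The function name sum_floats is made up for this example. I used the unaligned _mm512_maskz_loadu_ps so that no 64-byte alignment is assumed; the aligned _mm512_maskz_load_ps would work the same way on aligned data. The scalar branch is only a fallback for builds without AVX512F.

```c
#include <assert.h>
#include <stddef.h>
#ifdef __AVX512F__
#include <immintrin.h>
#endif

/* Hypothetical example: sum n floats with no scalar remainder loop. */
static float sum_floats(const float *x, size_t n) {
  assert(x != NULL || n == 0);
#ifdef __AVX512F__
  __m512 acc = _mm512_setzero_ps();
  size_t i = 0;
  /* Full 16-wide chunks. */
  for (; i + 16 <= n; i += 16)
    acc = _mm512_add_ps(acc, _mm512_loadu_ps(x + i));
  if (i < n) {
    /* Set the low (n - i) mask bits; here n - i is between 1 and 15.
     * Masked-out lanes are zeroed and their memory is not accessed. */
    __mmask16 tail = (__mmask16)((1u << (n - i)) - 1u);
    acc = _mm512_add_ps(acc, _mm512_maskz_loadu_ps(tail, x + i));
  }
  return _mm512_reduce_add_ps(acc);
#else
  /* Portable fallback for compilers or CPUs without AVX512F. */
  float s = 0.0f;
  for (size_t i = 0; i < n; i++) s += x[i];
  return s;
#endif
}
```

Note that the vector version adds the elements in a different order than the scalar loop, so results can differ in the last bits for general floating-point input.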
Once again, the performance gains look awesome!