OpenBLAS: Introduce groundwork for FP16 (IEEE 754 half-precision float) support in both AVX512 and F16C versions

This PR provides the basic infrastructure for detecting which FP16 capabilities are available on x86_64 hardware.
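
For illustration only, here is a minimal sketch of how such detection could look with the `<cpuid.h>` helpers shipped by GCC and Clang. It is not the code from this PR; the function names `has_bit`, `cpu_has_f16c` and `cpu_has_avx512fp16` are hypothetical. It relies on the documented CPUID feature bits (F16C: leaf 1, ECX bit 29; AVX512-FP16: leaf 7 subleaf 0, EDX bit 23) and deliberately omits the OS-support check (OSXSAVE/XGETBV) that a complete implementation would also need.

```c
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>   /* GCC/Clang helper header: __get_cpuid, __get_cpuid_count */
#endif

/* Hypothetical helper: test one feature bit of a CPUID output register. */
int has_bit(uint32_t reg, unsigned bit) {
    return (int)((reg >> bit) & 1u);
}

/* F16C is reported in CPUID leaf 1, ECX bit 29. */
int cpu_has_f16c(void) {
#if defined(__x86_64__) || defined(__i386__)
    unsigned eax, ebx, ecx, edx;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;                      /* leaf 1 not available */
    return has_bit(ecx, 29);
#else
    return 0;                          /* not an x86 build */
#endif
}

/* AVX512-FP16 is reported in CPUID leaf 7, subleaf 0, EDX bit 23. */
int cpu_has_avx512fp16(void) {
#if defined(__x86_64__) || defined(__i386__)
    unsigned eax, ebx, ecx, edx;
    if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx))
        return 0;                      /* leaf 7 not available */
    return has_bit(edx, 23);
#else
    return 0;                          /* not an x86 build */
#endif
}
```

Both functions return 1 only when the CPU advertises the feature. Whether the OS has enabled the YMM/ZMM register state still has to be checked separately before any kernel is dispatched.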

FP16 differs from BF16 in how bits are allocated to range and precision. BF16 is basically a truncated FP32 (1 sign, 8 exponent and 7 mantissa bits), whereas FP16 (1 sign, 5 exponent and 10 mantissa bits) keeps roughly the same ratio of precision to range as the other IEEE 754 floating-point types.
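
To make the difference concrete, the following is a small software decoder for both formats. It is a sketch for illustration, not part of this PR; the names `bits_to_float`, `bf16_to_float` and `fp16_to_float` are hypothetical. It assumes `float` is IEEE 754 binary32, which holds on x86_64.

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret a 32-bit pattern as an IEEE 754 binary32 float. */
float bits_to_float(uint32_t bits) {
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* BF16 is the upper 16 bits of an FP32 value: 1 sign, 8 exponent, 7 mantissa. */
float bf16_to_float(uint16_t h) {
    return bits_to_float((uint32_t)h << 16);
}

/* FP16 has 1 sign bit, 5 exponent bits (bias 15) and 10 mantissa bits. */
float fp16_to_float(uint16_t h) {
    uint32_t sign = ((uint32_t)h >> 15) & 1u;
    uint32_t exp  = ((uint32_t)h >> 10) & 0x1Fu;
    uint32_t man  = (uint32_t)h & 0x3FFu;

    if (exp == 0) {
        /* Zero or subnormal: value is man * 2^-24 (0x33800000 encodes 2^-24). */
        float v = (float)man * bits_to_float(0x33800000u);
        return sign ? -v : v;
    }
    if (exp == 31) {
        /* Infinity or NaN: all-ones exponent, mantissa payload carried over. */
        return bits_to_float((sign << 31) | 0x7F800000u | (man << 13));
    }
    /* Normal number: rebias the exponent from 15 to 127, widen the mantissa. */
    return bits_to_float((sign << 31) | ((exp + 112u) << 23) | (man << 13));
}
```

With these helpers, `fp16_to_float(0x7BFF)` gives 65504, the largest finite FP16 value, while the largest finite BF16 pattern `0x7F7F` decodes to roughly 3.39e38, which shows how BF16 trades precision for FP32-like range.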

x86_64 has two different ways of supporting FP16.

The first is a legacy support model using F16C (Float16 Convert). This small ISA extension, introduced on Intel with the Ivy Bridge generation, converts between FP32 and FP16 values in SIMD vector registers (XMM and YMM; the ZMM forms of the same conversions come with AVX512F). It only converts: the arithmetic itself is still done in FP32.

On AMD it arrived in the Bulldozer family (from Piledriver onward), having first been announced under the name CVT16 (Convert 16) and renamed to F16C a few years later.

The second is AVX512-FP16.

Introduced unofficially with the Alder Lake P-cores (Golden Cove architecture), AVX512-FP16 differs from F16C in that it does not simply keep FP16 values in the lower half of FP32 lanes; it operates on packed FP16 directly. This means that twice as many values fit in a register of the same width (32 in a ZMM register with AVX512-FP16, as opposed to 16 with AVX512F + F16C), which increases effective memory throughput and removes the conversion overhead.

AVX512FP16 will officially debut in market with the launch of Sapphire Rapids.

Certain types of applications have no need for the precision of FP32 but need slightly more precision than BF16 offers. FP16 is a good compromise.

Given that GPUs, ARM, and other accelerators, ISAs and uArchs have had IEEE 754 FP16 for a few years, it's high time support was added.

-FelixCLC

Sep 05 '22 14:09 FCLC

Should probably mention that this in part addresses #3512 and #3490.

Sep 05 '22 14:09 FCLC

additional changes for cleanup, added self to contributors etc.

Sep 05 '22 15:09 FCLC

Thanks, but are these SapphireRapids kernels for bfloat16 or fp16? I suspect naming confusion: in #2767 it was agreed to use "b" prefixes for bfloat16 and "h" for an eventual fp16 (which does not appear to have been missed in all these years, and for which the entire groundwork of function table entries, headers and generic implementation would need to be added, like it was for bfloat). Also, I do not think we need a subdirectory for SR kernels in x86_64 (?)

Sep 05 '22 17:09 martin-frbg

> thanks - but are these SapphireRapids kernels for bfloat16 or fp16 ? I suspect naming confusion, in #2767 it was agreed to use "b" prefixes for bfloat16 and "h" for an eventual fp16 (which does not appear to have been missed in all these years, and for which the entire groundwork of function table entries, headers, generic implementation would need to be added like it was for bfloat) Also I do not think we need a subdirectory for SR kernels in x86_64 (?)

Apologies, this can be ignored and removed entirely. It's left over from when I was doing some testing and messing about with other code.

I've got a few more changes about to come in; I'll remove the files while I'm at it.

Sep 05 '22 17:09 FCLC
