llvm icon indicating copy to clipboard operation
llvm copied to clipboard

[SYCL] Move bfloat support from experimental to supported.

Open rdeodhar opened this issue 2 years ago • 29 comments

This change makes bfloat16 a supported feature.

Signed-off-by: Rajiv Deodhar [email protected]

rdeodhar avatar Aug 03 '22 22:08 rdeodhar

Looks good, do we want to also move these bfloat16 math functions out of experimental also: https://github.com/intel/llvm/blob/bdd88e50f0165e339f85c9b134d49972de079370/sycl/include/sycl/ext/oneapi/experimental/builtins.hpp#L160 since they are defined in the same extension document as the main bfloat16 class?

btw there should be an accompanying PR to intel/llvm-test-suite updating corresponding tests, otherwise there will be lots of failures: For example in this test: https://github.com/intel/llvm-test-suite/blob/intel/SYCL/BFloat16/bfloat16_type.hpp.

JackAKirk avatar Aug 04 '22 14:08 JackAKirk

Also FYI there is another open PR updating the bfloat16 class that it might be good to consider merging first https://github.com/intel/llvm/pull/6492.

JackAKirk avatar Aug 04 '22 14:08 JackAKirk

/verify with intel/llvm-test-suite#1129

rdeodhar avatar Aug 04 '22 20:08 rdeodhar

I also have two global comments:

  • Please update the specification to use the template in (https://github.com/intel/llvm/blob/sycl/sycl/doc/extensions/template.asciidoc) and follow the instructions in (https://github.com/intel/llvm/blob/sycl/sycl/doc/extensions/README-process.md). I would like all new "supported" extensions to use the new template format.

  • There was a request from the ML team that the bfloat16 type should not be an "optional feature". Instead, they proposed that it should be allowed for any device. Devices without native support should use a fall-back routine for conversion. Couldn't we do this by following the pattern of the other math operations we have in "llvm/libdevice"? If we do this, there is no need for the aspect, because application code wouldn't need to check if a device supports bloat16 before using it.

gmlueck avatar Aug 23 '22 16:08 gmlueck

/verify with intel/llvm-test-suite#1129

rdeodhar avatar Aug 24 '22 22:08 rdeodhar

/run with intel/llvm-test-suite#1129

rdeodhar avatar Aug 25 '22 21:08 rdeodhar

/verify with intel/llvm-test-suite#1129

rdeodhar avatar Aug 26 '22 01:08 rdeodhar

/verify with intel/llvm-test-suite#1129

rdeodhar avatar Aug 26 '22 23:08 rdeodhar

/verify with intel/llvm-test-suite#1129

rdeodhar avatar Aug 31 '22 00:08 rdeodhar

/verify with intel/llvm-test-suite#1129

rdeodhar avatar Sep 02 '22 22:09 rdeodhar

/verify with intel/llvm-test-suite#1129

rdeodhar avatar Sep 08 '22 00:09 rdeodhar

/verify with intel/llvm-test-suite#1129

rdeodhar avatar Sep 08 '22 01:09 rdeodhar

/verify with https://github.com/intel/llvm-test-suite/pull/1129

rdeodhar avatar Sep 08 '22 05:09 rdeodhar

/verify with intel/llvm-test-suite#1129

rdeodhar avatar Sep 09 '22 00:09 rdeodhar

/verify with intel/llvm-test-suite#1129

rdeodhar avatar Sep 09 '22 21:09 rdeodhar

/verify with intel/llvm-test-suite#1129

rdeodhar avatar Sep 12 '22 21:09 rdeodhar

/verify with intel/llvm-test-suite#1129

rdeodhar avatar Sep 14 '22 23:09 rdeodhar

Recently added sycl/test/matrix/matrix-bfloat16-test-use.cpp needs to be updated to remove 'experimental'.

asudarsa avatar Sep 15 '22 02:09 asudarsa

some of the tests use from_bits though it has been made private. Also, sycl/test/extensions/bfloat16_host.cpp header file location for bfloat16.h needs to be corrected.

asudarsa avatar Sep 15 '22 02:09 asudarsa

Example draft patch that incorporates new non-native software impls that pass tests for CUDA backend is here: https://github.com/rdeodhar/llvm/pull/1.

JackAKirk avatar Sep 15 '22 16:09 JackAKirk

By the way, just in case you didn't notice, the bfloat16_type.cpp, is currently still marked xRUN on the CI for Intel (and also in my updated patch where I added CUDA support: bfloat16_type.cpp). I imagine that CI will be able to test this now for any Intel arch now that there are software implementations so it should be marked RUN instead of xRUN at the appropriate point? Also should there also be an additional compilation instruction for Intel Arch's that support native bfloat16 instructions, like there is for CUDA for sm_80 (in my updated patch)?

JackAKirk avatar Sep 15 '22 16:09 JackAKirk

/verify with llvm-test-suite#1129

rdeodhar avatar Sep 16 '22 02:09 rdeodhar

/verify with intel/llvm-test-suite#1129

rdeodhar avatar Sep 16 '22 02:09 rdeodhar

/verify with intel/llvm-test-suite#1129

rdeodhar avatar Sep 16 '22 16:09 rdeodhar

/verify with intel/llvm-test-suite#1129

rdeodhar avatar Sep 16 '22 22:09 rdeodhar

/verify with intel/llvm-test-suite#1129

rdeodhar avatar Sep 19 '22 17:09 rdeodhar

/verify with intel/llvm-test-suite#1129

rdeodhar avatar Sep 19 '22 20:09 rdeodhar

/verify with intel/llvm-test-suite#1129

rdeodhar avatar Sep 20 '22 23:09 rdeodhar

/verify with intel/llvm-test-suite#1129

rdeodhar avatar Sep 22 '22 00:09 rdeodhar

/verify with intel/llvm-test-suite#1129

rdeodhar avatar Sep 22 '22 21:09 rdeodhar