dpctl
sycl::vec overloads for elementwise functions
This PR adds overloads for sycl::vec input to the implementations of dpctl.tensor.abs, dpctl.tensor.cos, dpctl.tensor.expm1, dpctl.tensor.log, dpctl.tensor.log1p, and dpctl.tensor.sqrt.
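To illustrate what a vec overload means for an elementwise function, here is a minimal sketch. It is not the code from this PR: `Vec`, `SqrtFunctor`, and `apply_elementwise` are hypothetical names, and `Vec` is a plain C++ stand-in for sycl::vec so that the example builds without a SYCL compiler. The idea shown is that the same functor offers a scalar overload and a vector overload, and the caller picks the vector path for full chunks and the scalar path for the tail.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for sycl::vec<T, N>, used only so this sketch
// compiles with any C++17 compiler. It is NOT the SYCL type.
template <typename T, std::size_t N>
struct Vec {
    std::array<T, N> data{};
    T &operator[](std::size_t i) { return data[i]; }
    const T &operator[](std::size_t i) const { return data[i]; }
};

// Hypothetical functor in the style of an elementwise kernel body.
template <typename T>
struct SqrtFunctor {
    // Scalar overload: handles one element per call.
    T operator()(const T &x) const { return std::sqrt(x); }

    // Vector overload: handles N elements per call.
    template <std::size_t N>
    Vec<T, N> operator()(const Vec<T, N> &in) const {
        Vec<T, N> out;
        for (std::size_t i = 0; i < N; ++i) {
            out[i] = std::sqrt(in[i]);
        }
        return out;
    }
};

// Applies fn to every element of src: full chunks of N elements go
// through the vector overload, the remaining tail through the scalar one.
template <std::size_t N, typename T, typename Fn>
std::vector<T> apply_elementwise(const std::vector<T> &src, const Fn &fn) {
    std::vector<T> dst(src.size());
    std::size_t i = 0;
    for (; i + N <= src.size(); i += N) {
        Vec<T, N> chunk;
        for (std::size_t k = 0; k < N; ++k) chunk[k] = src[i + k];
        const Vec<T, N> res = fn(chunk);  // selects the vector overload
        for (std::size_t k = 0; k < N; ++k) dst[i + k] = res[k];
    }
    for (; i < src.size(); ++i) {
        dst[i] = fn(src[i]);  // selects the scalar overload
    }
    assert(i == src.size());
    return dst;
}
```

Calling `apply_elementwise<4>(src, SqrtFunctor<double>{})` on a six-element vector would send the first four elements through the vector overload and the last two through the scalar one. In a real SYCL kernel the chunk would typically be loaded and stored by the device runtime rather than copied element by element as done here.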
- [X] Have you provided a meaningful PR description?
- [X] Have you added a test, reproducer or referred to an issue with a reproducer?
- [X] Have you tested your changes locally for CPU and GPU devices?
- [X] Have you made sure that new changes do not introduce compiler warnings?
- [ ] Have you checked performance impact of proposed changes?
- [X] If this PR is a work in progress, are you opening the PR as a draft?
View rendered docs @ https://intelpython.github.io/dpctl/pulls/1223/index.html
Coverage remained the same at 85.942% when pulling 67bde5971a172535b0c0ef122a28031addd53a54 on elementwise-func-fixes into 5ec9fd5becdc45849c269dd553f255e5841cba49 on master.
Array API standard conformance tests for dpctl=0.14.3dev3=py310h7bf5fec_9 ran successfully. Passed: 259 Failed: 741 Skipped: 116
@ndgrigorian Please check that enabling vec brings performance benefits on Max GPU.
Array API standard conformance tests for dpctl=0.14.3dev3=py310h7bf5fec_20 ran successfully. Passed: 320 Failed: 680 Skipped: 119
Array API standard conformance tests for dpctl=0.14.4=py310h7bf5fec_11 ran successfully. Passed: 388 Failed: 612 Skipped: 119
Array API standard conformance tests for dpctl=0.14.5dev1=py310h7bf5fec_10 ran successfully. Passed: 448 Failed: 552 Skipped: 119
Array API standard conformance tests for dpctl=0.14.6dev0=py310h7bf5fec_6 ran successfully. Passed: 474 Failed: 526 Skipped: 119
Array API standard conformance tests for dpctl=0.15.1dev2=py310h15de555_20 ran successfully. Passed: 876 Failed: 55 Skipped: 59
Testing has been performed, and little to no performance gain was found for unary functions using sycl::vec overloads.
TODO: benchmark with sub-group loading disabled as well.