heat
Bug/789 pow binary op performance
Description
Optimizations for pow
Issue/s resolved: #789 (only partially; more work is required)
Changes proposed:
- avoid calling `_binary_op` unless absolutely necessary and use a simpler switch instead
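As a rough illustration of the "simpler switch" idea, the sketch below dispatches on operand types and only falls back to a general routine when no cheap path applies. It uses plain Python lists rather than Heat arrays, and `generic_binary_op` and `fast_pow` are hypothetical names invented for this example; it is not the code from this PR.

```python
import numbers
import operator


def generic_binary_op(op, a, b):
    """Hypothetical stand-in for the general path: broadcast scalars, then apply op."""
    if isinstance(a, numbers.Number):
        a = [a] * len(b)
    if isinstance(b, numbers.Number):
        b = [b] * len(a)
    if len(a) != len(b):
        raise ValueError("operands must have the same length")
    return [op(x, y) for x, y in zip(a, b)]


def fast_pow(base, exponent):
    """Switch on operand types and use the general path only as a fallback."""
    base_is_scalar = isinstance(base, numbers.Number)
    exp_is_scalar = isinstance(exponent, numbers.Number)

    if base_is_scalar and exp_is_scalar:
        # Two scalars: nothing to broadcast.
        return base ** exponent
    if exp_is_scalar:
        if exponent == 2:
            # Squaring is a plain multiplication, cheaper than a general power.
            return [x * x for x in base]
        return [x ** exponent for x in base]
    # Everything else (list ** list, scalar ** list) takes the general path.
    return generic_binary_op(operator.pow, base, exponent)


print(fast_pow([1.0, 2.0, 3.0], 2))
print(fast_pow(2, [1, 2, 3]))
```

The point of the design is that the common scalar-exponent cases never pay for the broadcasting and validation done in the general routine.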
Type of change
Optimization
Due Diligence
- [x] All split configurations tested
- [x] Multiple dtypes tested in relevant functions
- this may affect the resulting dtypes; however, the tests ran clean with multiple different operations in `var`, which relies on this
- [x] Documentation updated (if needed)
- [ ] Updated changelog.md under the title "Pending Additions"
Does this change modify the behaviour of other functions? If so, which?
YES! `comm.chunk` now returns 4 values instead of 3!
Codecov Report
Merging #793 (b9f833d) into main (05a2acd) will decrease coverage by 0.18%. The diff coverage is 85.89%.
@@ Coverage Diff @@
## main #793 +/- ##
==========================================
- Coverage 91.12% 90.93% -0.19%
==========================================
Files 65 65
Lines 9976 10143 +167
==========================================
+ Hits 9091 9224 +133
- Misses 885 919 +34
Flag | Coverage Δ | |
---|---|---|
gpu | ? | |
unit | 90.93% <85.89%> (-0.17%) | :arrow_down: |
Flags with carried forward coverage won't be shown.
Impacted Files | Coverage Δ | |
---|---|---|
heat/core/_operations.py | 96.04% <ø> (ø) | |
heat/core/stride_tricks.py | 81.53% <50.00%> (ø) | |
heat/core/arithmetics.py | 90.86% <81.35%> (-7.74%) | :arrow_down: |
heat/core/communication.py | 89.96% <100.00%> (+0.01%) | :arrow_up: |
heat/core/dndarray.py | 96.66% <100.00%> (+<0.01%) | :arrow_up: |
heat/core/factories.py | 99.23% <100.00%> (-0.01%) | :arrow_down: |
heat/core/indexing.py | 100.00% <100.00%> (ø) | |
heat/core/io.py | 89.39% <100.00%> (-0.05%) | :arrow_down: |
heat/core/linalg/basics.py | 94.22% <100.00%> (ø) | |
heat/core/manipulations.py | 98.63% <100.00%> (ø) | |
... and 6 more | | |
rerun tests
New numbers:
local tests: 4 procs, `a = ht.random.random((10000, 10000), dtype=ht.float32, split=split)`
numbers are avg of 10 runs

- pow (new / old)
  - `a ** a`: 0.2157 / 0.2207
  - `2.5 ** a`: 0.2099 / 0.2051
  - `a ** 2.5`: 0.1986 / 0.2002
  - `a ** 2`: 0.0709 / 0.2390
- add (new / old)
  - `a + a`: 0.0531 / 0.0697
  - `2.5 + a`: 0.0571 / 0.0603
  - `a + 2.5`: 0.0597 / 0.0615
  - `a + 2`: 0.0578 / 0.0592
- sub (new / old)
  - `a - a`: 0.0590 / 0.0689*
  - `2.5 - a`: 0.0584 / 0.0570
  - `a - 2.5`: 0.0571 / 0.0583
  - `a - 2`: 0.0587 / 0.0593
- div (new / old)
  - `a / a`: 0.0582 / 0.0580
  - `2.5 / a`: 0.0582 / 0.0574
  - `a / 2.5`: 0.0615 / 0.0596
  - `a / 2`: 0.0540 / 0.0601
- floordiv (new / old)
  - `a // a`: 0.1291 / 0.1374
  - `2.5 // a`: 0.1342 / 0.1371
  - `a // 2.5`: 0.1339 / 0.1406
  - `a // 2`: 0.1886 / 0.1922
- mul (new / old)
  - `a * a`: 0.0557 / 0.0579
  - `2.5 * a`: 0.0580 / 0.0572
  - `a * 2.5`: 0.0600 / 0.0595
  - `a * 2`: 0.0600 / 0.0581
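For anyone wanting to reproduce averages like the ones above, a minimal timing harness could look like the following. This is a sketch under assumptions, not the script used for these numbers: `average_runtime` is a hypothetical helper, and a plain Python list stands in for the distributed Heat array so that the block runs without MPI.

```python
import time


def average_runtime(fn, runs=10):
    """Call fn() `runs` times and return the mean wall-clock time in seconds."""
    total = 0.0
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        total += time.perf_counter() - start
    return total / runs


# Stand-in data (assumption): values in (0, 1] held in a list, not a Heat array.
a = [i / 10_000 for i in range(1, 10_001)]

# Hypothetical cases mirroring the pow expressions timed above.
cases = {
    "a ** a": lambda: [x ** x for x in a],
    "2.5 ** a": lambda: [2.5 ** x for x in a],
    "a ** 2.5": lambda: [x ** 2.5 for x in a],
    "a ** 2": lambda: [x ** 2 for x in a],
}

results = {name: average_runtime(fn) for name, fn in cases.items()}
for name, seconds in results.items():
    print(f"{name}: {seconds:.4f}")
```

With Heat installed, the same helper could be given callables such as `lambda: a ** 2` on the real array, run once on the old branch and once on the new one.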
@mtar can you have a look at why this is failing? I don't know why the GPU isn't recognized. Maybe it's a bug somewhere that I'm missing.
Some of this PR has been superseded by the latest implementation of `_operations.__binary_op` with distribution sanitation (#902), but a lot of it is still relevant. I will make the necessary changes this week. @coquelin77, please scream if you would prefer a review and to introduce the changes yourself.
Closing this as too stale to update, changes implemented in #1141