Game of life example: dpnp on CPU is 4 times slower than NumPy
Results for Game of life example (running on a laptop with 11th Gen processor and Iris Xe graphics):
| example | numpy | dpnp CPU | dpnp GPU | size |
|---|---|---|---|---|
| game of life | 1 s | 4.8 s | 1.8 s | 8192 x 8192 |
This shows that dpnp on CPU takes more than 4 times as long as NumPy.
The numbers with dpnp=0.12.0:
| example | numpy | dpnp CPU | dpnp GPU | size |
|---|---|---|---|---|
| game of life | 1.03 s | 2.16 s | 0.96 s | 8192 x 8192 x 10 |
Relative to NumPy, dpnp on CPU is now about 2 times faster than before (roughly 2.1x slower than NumPy instead of 4.8x), but that is still short of the target.
Shouldn't it be closed?
@antonwolfy Hi, I'm not a contributor, but I hope this comment helps.
You can measure dpnp performance by following the steps below.
In my case (Xeon Skylake), I saw a significant performance difference.
docker run -it --cpus=4 --name=intelpython-ksr intelpython/intelpython3_full:2023.1.0-0 bash
# check ENV is valid in your guest OS
(base) root@xxxxxx:/# echo $LD_LIBRARY_PATH
/opt/conda/lib/libfabric:
(base) root@xxxxxxx:/# echo $OCL_ICD_FILENAMES $ OCL_ICD_FILENAMES_RESET
libintelocl.so $ OCL_ICD_FILENAMES_RESET
(base) root@xxxxxx:/# apt update && apt install vim -y
(base) root@xxxxxx:/# git clone https://github.com/IntelPython/dpnp.git
(base) root@xxxxxx:/# pip install pytest pytest-benchmark
(base) root@xxxxxx:/# cd dpnp
(base) root@xxxxxx:/# vi benchmarks/pytest_benchmark/test_random.py
# edit the array size used by the test: NNUMBERS = 2**26 -> 2**20 (2**26 is too heavy)
# run benchmark
(base) root@xxxxxx:/# pytest benchmarks --benchmark-json=results.json --benchmark-warmup-iterations=1000 --benchmark-sort=name
============================================================================================================= test session starts =============================================================================================================
platform linux -- Python 3.10.8, pytest-7.4.2, pluggy-1.0.0
benchmark: 4.0.0 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=False warmup_iterations=1000)
rootdir: /dpnp
configfile: setup.cfg
plugins: benchmark-4.0.0
collected 10 items
benchmarks/pytest_benchmark/test_random.py .......... [100%]
...
1. Benchmark result (array size = 2**20)
- dpnp is faster than np
benchmark: 10 tests
| Name (time in ms) | Min | Max | Mean | StdDev | Median | IQR | Outliers | OPS | Rounds | Iterations |
|---|---|---|---|---|---|---|---|---|---|---|
| `test_beta[dpnp]` | 21.8955 (5.17) | 84.6292 (16.42) | 24.2552 (5.47) | 11.4057 (742.46) | 22.1041 (5.00) | 0.4180 (89.47) | 1;2 | 41.2283 (0.18) | 30 | 4 |
| `test_beta[numpy]` | 144.1274 (34.00) | 145.8178 (28.29) | 144.3936 (32.54) | 0.3846 (25.04) | 144.2299 (32.64) | 0.3005 (64.34) | 6;3 | 6.9255 (0.03) | 30 | 4 |
| `test_exponential[dpnp]` | 7.5882 (1.79) | 8.6807 (1.68) | 7.9727 (1.80) | 0.2289 (14.90) | 8.0083 (1.81) | 0.3177 (68.00) | 7;1 | 125.4287 (0.56) | 30 | 4 |
| `test_exponential[numpy]` | 27.3414 (6.45) | 27.4286 (5.32) | 27.3496 (6.16) | 0.0154 (1.0) | 27.3465 (6.19) | 0.0057 (1.22) | 1;1 | 36.5636 (0.16) | 30 | 4 |
| `test_gamma[dpnp]` | 23.7672 (5.61) | 24.7119 (4.79) | 24.1695 (5.45) | 0.2659 (17.31) | 24.1067 (5.46) | 0.4515 (96.65) | 13;0 | 41.3745 (0.18) | 30 | 4 |
| `test_gamma[numpy]` | 72.7834 (17.17) | 73.3010 (14.22) | 72.8419 (16.41) | 0.1204 (7.84) | 72.8039 (16.48) | 0.0226 (4.83) | 3;3 | 13.7284 (0.06) | 30 | 4 |
| `test_normal[dpnp]` | 9.3821 (2.21) | 10.6157 (2.06) | 9.6447 (2.17) | 0.2335 (15.20) | 9.5778 (2.17) | 0.2116 (45.29) | 3;1 | 103.6835 (0.46) | 30 | 4 |
| `test_normal[numpy]` | 41.1999 (9.72) | 41.4049 (8.03) | 41.2479 (9.29) | 0.0379 (2.46) | 41.2402 (9.33) | 0.0175 (3.75) | 3;3 | 24.2437 (0.11) | 30 | 4 |
| `test_uniform[dpnp]` | 4.2386 (1.0) | 5.1549 (1.0) | 4.4380 (1.0) | 0.1406 (9.15) | 4.4188 (1.0) | 0.0209 (4.48) | 2;3 | 225.3261 (1.0) | 30 | 4 |
| `test_uniform[numpy]` | 14.0905 (3.32) | 14.2857 (2.77) | 14.1043 (3.18) | 0.0344 (2.24) | 14.0981 (3.19) | 0.0047 (1.0) | 1;1 | 70.9004 (0.31) | 30 | 4 |
However, if the NumPy array is not large enough (NNUMBERS = 2**13, i.e. 8192 elements), the picture changes:
2. Benchmark result (array size = 2**13)
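- in this case dpnp is slower than NumPy on mean time in every test; its median is still lower for beta and gamma, but large outliers (Max of up to about 62 ms) inflate the dpnp means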
benchmark: 10 tests
| Name (time in us) | Min | Max | Mean | StdDev | Median | IQR | Outliers | OPS | Rounds | Iterations |
|---|---|---|---|---|---|---|---|---|---|---|
| `test_beta[dpnp]` | 420.5331 (3.74) | 61,992.8390 (277.19) | 3,605.1025 (16.71) | 11,695.6739 (>1000.0) | 550.2328 (4.84) | 255.2830 (433.72) | 2;3 | 277.3846 (0.06) | 30 | 4 |
| `test_beta[numpy]` | 1,123.2123 (9.98) | 1,146.9647 (5.13) | 1,126.9274 (5.22) | 4.1275 (2.46) | 1,126.0584 (9.90) | 1.8142 (3.08) | 2;2 | 887.3686 (0.19) | 30 | 4 |
| `test_exponential[dpnp]` | 274.4943 (2.44) | 20,916.4843 (93.52) | 1,427.4976 (6.62) | 4,114.2836 (>1000.0) | 313.9023 (2.76) | 179.3243 (304.66) | 2;3 | 700.5266 (0.15) | 30 | 4 |
| `test_exponential[numpy]` | 214.0552 (1.90) | 223.6478 (1.0) | 215.7025 (1.0) | 1.6761 (1.0) | 215.3441 (1.89) | 0.5886 (1.0) | 2;5 | 4,636.0148 (1.0) | 30 | 4 |
| `test_gamma[dpnp]` | 437.3230 (3.89) | 20,278.0776 (90.67) | 2,266.6973 (10.51) | 5,464.0116 (>1000.0) | 462.3923 (4.06) | 15.6760 (26.63) | 3;7 | 441.1705 (0.10) | 30 | 4 |
| `test_gamma[numpy]` | 566.4900 (5.03) | 578.1837 (2.59) | 569.7493 (2.64) | 2.1289 (1.27) | 569.5820 (5.01) | 1.5460 (2.63) | 8;1 | 1,755.1581 (0.38) | 30 | 4 |
| `test_normal[dpnp]` | 324.0071 (2.88) | 21,615.4084 (96.65) | 2,640.5660 (12.24) | 6,222.1435 (>1000.0) | 353.8001 (3.11) | 202.7377 (344.44) | 3;5 | 378.7067 (0.08) | 30 | 4 |
| `test_normal[numpy]` | 322.1631 (2.86) | 340.1972 (1.52) | 324.8747 (1.51) | 3.9413 (2.35) | 323.8600 (2.85) | 1.5870 (2.70) | 3;3 | 3,078.1094 (0.66) | 30 | 4 |
| `test_uniform[dpnp]` | 299.9641 (2.67) | 20,060.7888 (89.70) | 1,449.2592 (6.72) | 3,982.3085 (>1000.0) | 486.8992 (4.28) | 38.1283 (64.78) | 2;7 | 690.0077 (0.15) | 30 | 4 |
| `test_uniform[numpy]` | 112.5187 (1.0) | 17,232.6937 (77.05) | 688.6497 (3.19) | 3,124.6789 (>1000.0) | 113.7946 (1.0) | 15.0241 (25.53) | 1;1 | 1,452.1171 (0.31) | 30 | 4 |
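A rough way to read the two tables together is to fit a straight line, time = fixed overhead + per-element cost x size, through the two measured sizes and see where the dpnp and NumPy lines cross. The sketch below does this for the Min column of test_uniform only. The linear cost model and the two-point fit are assumptions made for illustration, not something the benchmark measures, so treat the result as an order-of-magnitude estimate.

```python
# Two-point linear cost model t(n) = overhead + per_element * n, fitted to the
# Min column of test_uniform at 2**13 and 2**20 elements.
# ASSUMPTION: cost is linear in array size; the benchmark does not verify this.


def fit_line(n_small, t_small, n_large, t_large):
    """Return (overhead, per_element) for the line through two (n, t) points."""
    per_element = (t_large - t_small) / (n_large - n_small)
    overhead = t_small - per_element * n_small
    return overhead, per_element


N_SMALL, N_LARGE = 2**13, 2**20

# Times in microseconds (the 2**20 table is in ms, hence the factor 1000).
dpnp_overhead, dpnp_slope = fit_line(N_SMALL, 299.9641, N_LARGE, 4.2386 * 1000)
numpy_overhead, numpy_slope = fit_line(N_SMALL, 112.5187, N_LARGE, 14.0905 * 1000)

# Array size at which both fitted lines predict the same time.
crossover = (dpnp_overhead - numpy_overhead) / (numpy_slope - dpnp_slope)

print(f"dpnp : overhead ~{dpnp_overhead:.0f} us, {dpnp_slope * 1000:.2f} ns/element")
print(f"numpy: overhead ~{numpy_overhead:.0f} us, {numpy_slope * 1000:.2f} ns/element")
print(f"estimated break-even size: ~{crossover:.0f} elements")
```

Under this model dpnp has the larger fixed cost per call and the smaller cost per element, and the break-even size for test_uniform falls between 2**14 and 2**15 elements. That is consistent with dpnp losing at 2**13 and winning at 2**20.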
~~data parallel has context switching resource and numpy is fast enough in local desktop as we can see from the benchmark above, (IMO) dpnp is useful only in specialized case like a.. large amounts of data batch process (ex: Server which has lots of CPU core )~~
As for Game of life performance, it depends on which implementation you use, but for most NumPy-based implementations there does not seem to be any performance gain from dpnp's parallelization (IMO).
The main operations in the Game of life implementation are slicing and sum, which are not operations that benefit from internal parallelism.
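For reference, a minimal sketch of what a "slicing and sum" Game of life step typically looks like in NumPy. This is not the code of the dpnp example; the function body, the wrap-around edges via `np.pad`, and the `blinker` test board are assumptions chosen for illustration.

```python
import numpy as np


def update(board):
    """One Game of life step on a 2D 0/1 array with wrap-around edges."""
    rows, cols = board.shape
    # Pad by one cell on every side, copying from the opposite edge.
    padded = np.pad(board, 1, mode="wrap")
    # Neighbour count = sum of the eight shifted views of the padded board.
    neighbours = sum(
        padded[1 + dr : 1 + dr + rows, 1 + dc : 1 + dc + cols]
        for dr in (-1, 0, 1)
        for dc in (-1, 0, 1)
        if (dr, dc) != (0, 0)
    )
    alive = board == 1
    born = ~alive & (neighbours == 3)
    survives = alive & ((neighbours == 2) | (neighbours == 3))
    return (born | survives).astype(board.dtype)


# A horizontal "blinker" flips to vertical and back every two steps.
blinker = np.zeros((5, 5), dtype=np.uint8)
blinker[2, 1:4] = 1
print(update(blinker))
```

Each step consists of a handful of cheap element-wise operations over views of the same array, so per-call overhead is paid many times for relatively little arithmetic per element.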
If you want higher performance in Game of life, you should probably parallelize the code at a higher level rather than rely on dpnp (for example, execute `update(board)` for each cell in parallel).
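One possible reading of "parallelism at a higher level" is sketched below: the board is split into bands of rows and each band is updated by its own worker thread. This is a coarser-grained variant (per band rather than per cell), and everything in it is an assumption for illustration: the names `step_band` and `update_parallel`, the wrap-around edges, and the use of `ThreadPoolExecutor`. Whether threads actually give a speedup depends on how much of the NumPy work runs without holding the GIL, so no performance claim is made here.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def step_band(padded, row_start, row_stop):
    """Next state for board rows [row_start, row_stop), read from the padded board."""
    cols = padded.shape[1] - 2
    height = row_stop - row_start
    # Rows of the padded board covering the band plus one halo row on each side.
    band = padded[row_start : row_stop + 2]
    neighbours = sum(
        band[1 + dr : 1 + dr + height, 1 + dc : 1 + dc + cols]
        for dr in (-1, 0, 1)
        for dc in (-1, 0, 1)
        if (dr, dc) != (0, 0)
    )
    centre = band[1:-1, 1:-1]
    # Alive next step: exactly 3 neighbours, or alive now with exactly 2.
    return ((neighbours == 3) | ((centre == 1) & (neighbours == 2))).astype(padded.dtype)


def update_parallel(board, workers=4):
    """One Game of life step, computed band by band in a thread pool."""
    rows = board.shape[0]
    padded = np.pad(board, 1, mode="wrap")  # shared, read-only input
    bounds = np.linspace(0, rows, workers + 1, dtype=int)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        bands = pool.map(step_band, [padded] * workers, bounds[:-1], bounds[1:])
        return np.vstack(list(bands))


blinker = np.zeros((5, 5), dtype=np.uint8)
blinker[2, 1:4] = 1
print(update_parallel(blinker, workers=2))
```

All workers read from the same padded copy of the old board and write to separate output bands, so no locking is needed within a step.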
In other words, Game of life is not a good benchmark to measure the performance of dpnp.
thanks