Performance: in-place dpctl.tensor.add with strides

Open npolina4 opened this issue 2 years ago • 3 comments

# dpctl: in-place add into a strided (non-contiguous) slice of b
import dpctl.tensor as dpt
a = dpt.ones((8192, 8192), dtype='i4', device='cpu')
b = dpt.ones((8192 + 2, 8192 + 2), dtype='i4', device='cpu')
%timeit b[2:, 2:] += a
# 209 ms ± 36.8 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

# NumPy: the same operation, for comparison
import numpy
a_np = numpy.ones((8192, 8192), dtype='i4')
b_np = numpy.ones((8192 + 2, 8192 + 2), dtype='i4')
%timeit b_np[2:, 2:] += a_np
# 75.7 ms ± 1.31 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
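
For anyone who wants to re-run the comparison outside IPython, here is a minimal sketch of the NumPy side as a plain script that uses the standard-library `timeit` module instead of the `%timeit` magic. The helper name `bench_inplace_strided_add` and its parameters are made up for illustration and are not part of dpctl or NumPy. The sketch only covers NumPy. Presumably the dpctl side could be timed the same way by building the arrays with `dpt.ones(..., device='cpu')` as in the report, but that is an untested assumption.

```python
import timeit

import numpy as np


def bench_inplace_strided_add(n=1024, offset=2, number=10, repeat=3):
    """Time `b[offset:, offset:] += a`; return (best seconds per loop, b).

    Hypothetical helper for illustration only.
    """
    a = np.ones((n, n), dtype="i4")
    b = np.ones((n + offset, n + offset), dtype="i4")

    def step():
        # The target is a strided view of b: consecutive rows of the slice
        # are (n + offset) elements apart in memory, not n.
        b[offset:, offset:] += a

    timings = timeit.repeat(step, number=number, repeat=repeat)
    return min(timings) / number, b


if __name__ == "__main__":
    best, b = bench_inplace_strided_add()
    view = b[2:, 2:]
    print(f"C-contiguous target: {view.flags.c_contiguous}, strides: {view.strides}")
    print(f"best: {best * 1e3:.3f} ms per loop")
```

The default size here is smaller than the 8192 x 8192 arrays in the report so that the script finishes quickly. Pass `n=8192` to match the original measurement.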

npolina4, Jul 12 '23 16:07

This was addressed and should be closed.

oleksandr-pavlyk, Jul 20 '23 15:07