Performance: in-place dpctl.tensor.add with strides
import dpctl.tensor as dpt
a = dpt.ones((8192, 8192), dtype='i4', device='cpu')
b = dpt.ones((8192 + 2, 8192 + 2), dtype='i4', device='cpu')
%timeit b[2:, 2:] += a
# 209 ms ± 36.8 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
import numpy
a_np = numpy.ones((8192, 8192), dtype='i4')
b_np = numpy.ones((8192 + 2, 8192 + 2), dtype='i4')
%timeit b_np[2:, 2:] += a_np
# 75.7 ms ± 1.31 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
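The two measurements above rely on the IPython `%timeit` magic. For anyone who wants to rerun the comparison as a plain script, here is a minimal sketch using the standard-library `timeit` module. The helper name `bench_inplace_add`, its parameters, and the smaller default array size are assumptions made for illustration and are not part of the original report. The explicit `sycl_queue.wait()` call is a precaution, in case the dpctl operation returns before the kernel has finished; the original timings did not include it.

```python
import timeit

import numpy as np


def bench_inplace_add(ones, n=2048, pad=2, number=5, repeat=3, sync=None):
    """Time b[pad:, pad:] += a and return (best seconds per call, b).

    `ones` is any callable with the signature ones(shape, dtype=...).
    `sync` is an optional callable that blocks until pending work on b is done.
    """
    a = ones((n, n), dtype="i4")
    b = ones((n + pad, n + pad), dtype="i4")

    def step():
        # In-place add into a strided (non-contiguous) view of b.
        b[pad:, pad:] += a
        if sync is not None:
            sync(b)

    step()  # warm-up, so one-time setup cost is not measured
    best = min(timeit.repeat(step, number=number, repeat=repeat)) / number
    return best, b


t_np, _ = bench_inplace_add(np.ones)
print(f"numpy: {t_np * 1e3:.2f} ms per call")

try:
    import dpctl.tensor as dpt
except ImportError:
    dpt = None  # dpctl is optional for this sketch

if dpt is not None:
    def dpt_ones(shape, dtype):
        # Same device selection as in the report above.
        return dpt.ones(shape, dtype=dtype, device="cpu")

    try:
        t_dpt, _ = bench_inplace_add(
            dpt_ones, sync=lambda arr: arr.sycl_queue.wait()
        )
        print(f"dpctl: {t_dpt * 1e3:.2f} ms per call")
    except Exception as exc:  # e.g. no SYCL CPU device available
        print(f"dpctl run skipped: {exc}")
```

Passing `n=8192` reproduces the array sizes used in the report. Absolute numbers will differ by machine, so only the ratio between the two backends is meaningful.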
This has been addressed, so the issue can be closed.