dpctl
Indexing performance
import dpctl.tensor as dpt
a = dpt.ones((8192, 8192), device='cpu', dtype='f4')
b = dpt.ones((8192, 8192), device='cpu', dtype=bool)
%timeit a[b]
#211 ms ± 6.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
import numpy
a_np = numpy.ones((8192, 8192), dtype='f4')
b_np = numpy.ones((8192, 8192), dtype=bool)
%timeit a_np[b_np]
#87.1 ms ± 2 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
This should be improved by the changes in gh-1300. @npolina4, could you please post timeit results from the same machine you used to obtain the numbers reported in the original comment?
Results with the changes in https://github.com/IntelPython/dpctl/pull/1300:

Size: 8192, 8192
numpy: 105 ms
cpu: 205 ms
gpu: 115 ms

Size: 4096, 4096
numpy: 24.5 ms
cpu: 45~80 ms
gpu: 21.4 ms
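For anyone who wants to collect the numpy/cpu/gpu numbers for a given size in one run outside IPython, here is a minimal sketch using the standard `timeit` module. The helper names `best_ms` and `bench_masked_index` are hypothetical and not part of dpctl. It reports the best of several runs rather than the mean ± std. dev. that `%timeit` prints, and, like the `%timeit` calls above, it adds no explicit synchronization, so it assumes `a[b]` returns only after the work has completed.

```python
import timeit


def best_ms(fn, repeat=5, number=1):
    """Return the best per-call wall-clock time of fn() in milliseconds."""
    times = timeit.repeat(fn, repeat=repeat, number=number)
    return min(times) / number * 1e3


def bench_masked_index(n):
    """Time a[b] with an all-True (n, n) mask for NumPy and, if present, dpctl."""
    import numpy

    results = {}

    a_np = numpy.ones((n, n), dtype="f4")
    b_np = numpy.ones((n, n), dtype=bool)
    results["numpy"] = best_ms(lambda: a_np[b_np])

    try:
        import dpctl.tensor as dpt
    except ImportError:
        # dpctl is not installed; report the NumPy baseline only.
        return results

    for dev in ("cpu", "gpu"):
        try:
            a = dpt.ones((n, n), device=dev, dtype="f4")
            b = dpt.ones((n, n), device=dev, dtype=bool)
        except Exception:
            # Assumption: array creation fails when the device is unavailable.
            continue
        results[dev] = best_ms(lambda: a[b])

    return results
```

Calling `print(bench_masked_index(4096))` and `print(bench_masked_index(8192))` on the same machine should give numbers that can be compared across sizes and devices.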