oneDPL
[KT][ESIMD sort] Potential performance improvements
These are the gaps which remain after the main review (https://github.com/oneapi-src/oneDPL/pull/1257):
- Use global block load functions instead of gather when possible. See discussion: https://github.com/oneapi-src/oneDPL/pull/1257#discussion_r1414570090 and https://github.com/oneapi-src/oneDPL/pull/1257#discussion_r1422767127
- Shrink the types of variables and loop counters. See discussions: https://github.com/oneapi-src/oneDPL/pull/1257#discussion_r1422781804 and https://github.com/oneapi-src/oneDPL/pull/1257#discussion_r1415632134
- Replace loops with vector operations. See discussion: https://github.com/oneapi-src/oneDPL/pull/1257#discussion_r1415645085
- Consider setting `__data_per_work_item` to 32 (if this is the minimal granularity of `data_per_workitem`). See discussion: https://github.com/oneapi-src/oneDPL/pull/1257#discussion_r1415849750
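
To illustrate the first item, here is a minimal sketch in plain standard C++ (not ESIMD, and not the oneDPL implementation) of why a gather can be replaced by a block load when the per-lane offsets are consecutive. The names `kLanes`, `gather_load`, `block_load`, and `is_contiguous` are hypothetical and exist only for this example; the real kernel would use the corresponding ESIMD load functions.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical vector width, chosen only for this illustration.
constexpr std::size_t kLanes = 16;

// Gather-style load: every lane reads from its own, independent offset.
template <typename T>
std::array<T, kLanes> gather_load(const T* base, const std::array<std::uint32_t, kLanes>& offsets)
{
    std::array<T, kLanes> out{};
    for (std::size_t i = 0; i < kLanes; ++i)
        out[i] = base[offsets[i]];
    return out;
}

// Block-style load: all lanes are read as one contiguous chunk starting at `first`.
template <typename T>
std::array<T, kLanes> block_load(const T* base, std::uint32_t first)
{
    std::array<T, kLanes> out{};
    std::memcpy(out.data(), base + first, kLanes * sizeof(T));
    return out;
}

// Returns true when the offsets are consecutive (offsets[i] == offsets[0] + i),
// which is the case where a block load yields the same result as a gather.
inline bool is_contiguous(const std::array<std::uint32_t, kLanes>& offsets)
{
    for (std::uint32_t i = 1; i < kLanes; ++i)
        if (offsets[i] != offsets[0] + i)
            return false;
    return true;
}
```

When `is_contiguous(offsets)` holds, `block_load(base, offsets[0])` returns the same values as `gather_load(base, offsets)`, so the offset vector itself is no longer needed on that path.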