gdrcopy
Benchmark and optimize on Arm64
- run copybw and copylat on an Arm64 system with a directly attached GPU
- if the results warrant it, add optimized copy functions, e.g. using Neon intrinsics
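
To make the second item concrete, here is a rough sketch of what a Neon-based copy loop might look like. The function name `neon_copy_sketch` is hypothetical and is not part of gdrcopy. The sketch assumes plain byte-wise semantics and ignores the alignment and write-ordering concerns that a copy into mapped GPU memory would have to handle. On non-Arm64 builds it falls back to `memcpy` so it still compiles.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__aarch64__) && defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Hypothetical helper for illustration only; not a gdrcopy function. */
void neon_copy_sketch(void *dst, const void *src, size_t n)
{
    uint8_t *d = (uint8_t *)dst;
    const uint8_t *s = (const uint8_t *)src;

    assert(d != NULL && s != NULL);

#if defined(__aarch64__) && defined(__ARM_NEON)
    /* Move 16 bytes per iteration through a 128-bit Neon register. */
    while (n >= 16) {
        vst1q_u8(d, vld1q_u8(s));
        d += 16;
        s += 16;
        n -= 16;
    }
#endif

    /* Copy the remaining tail (or everything, when Neon is unavailable). */
    memcpy(d, s, n);
}
```

Whether a loop like this beats the existing copy path is exactly what the copybw and copylat runs would need to show.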
Moving this to the next release.