Davide Rossetti

Results 17 issues of Davide Rossetti

`cuda_runtime_api.h` is included but not actually used by the test apps, so remove that include.

enhancement

From the Intel manuals: "Unlike WC stores and stores with non-temporal hint, direct-stores are eligible for immediate eviction from the write-combining buffer, and thus not combined with younger stores (including..."

enhancement

enhancement
help wanted

enhancement
help wanted

- Run copybw and copylat on Arm64 with a directly attached GPU.
- If needed, add optimized copy functions, e.g. using Neon intrinsics.
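As a rough illustration of what a Neon-based copy routine could look like, here is a minimal sketch. The name `copy_block` is hypothetical and not taken from the project; the code assumes plain, readable source and destination pointers and falls back to `memcpy` when Neon is not available. Whether this actually beats `memcpy` on a given Arm64 system with GPU mappings would have to be measured with copybw and copylat.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Hypothetical helper for illustration only, not an existing API. */
static void copy_block(void *dst, const void *src, size_t n)
{
    uint8_t *d = (uint8_t *)dst;
    const uint8_t *s = (const uint8_t *)src;

#if defined(__ARM_NEON)
    /* Move 16 bytes per iteration using 128-bit Neon loads and stores. */
    while (n >= 16) {
        vst1q_u8(d, vld1q_u8(s));
        d += 16;
        s += 16;
        n -= 16;
    }
#endif
    /* Copy the tail bytes, or the whole buffer on non-Neon builds. */
    memcpy(d, s, n);
}
```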

enhancement

On POWER9, wc_store_fence() is defined as sync, a heavyweight fence that also covers MMIO mappings, whereas lwsync is enough for cached mappings.
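The distinction between the two fences might be expressed as two separate helpers, sketched below. The names `fence_cached_stores`, `fence_mmio_stores` and `publish` are hypothetical and are not the project's API; on non-POWER builds the sketch substitutes C11 fences so that it still compiles, which is an assumption made only for portability of the example.

```c
#include <stdatomic.h>

/* Hypothetical: lighter fence, intended for stores to cached mappings. */
static inline void fence_cached_stores(void)
{
#if defined(__powerpc64__)
    __asm__ volatile("lwsync" ::: "memory");
#else
    atomic_thread_fence(memory_order_release); /* portable stand-in */
#endif
}

/* Hypothetical: heavyweight fence, intended for MMIO mappings. */
static inline void fence_mmio_stores(void)
{
#if defined(__powerpc64__)
    __asm__ volatile("sync" ::: "memory");
#else
    atomic_thread_fence(memory_order_seq_cst); /* portable stand-in */
#endif
}

/* Write a payload, fence, then raise a flag; both live in cached memory. */
static void publish(volatile int *payload, volatile int *flag, int value)
{
    *payload = value;
    fence_cached_stores();
    *flag = 1;
}
```

A caller would pick the helper based on the kind of mapping it just wrote to, rather than always paying for the heavier fence.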

enhancement

For both RPM and DEB packages. Also, update the metadata so that packages with the new name supersede the old ones.

enhancement

- Print estimated bandwidth, useful for large buffer sizes.
- Add a -d param.
- Add extra warmup iterations and a -w param.
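The bandwidth estimate and the warmup iterations listed above could be sketched roughly as follows. The function names `estimate_bw_mbps` and `timed_copy` are hypothetical, the copy is a plain `memcpy` rather than the project's copy routine, and MB is taken to mean 2^20 bytes; all of these are assumptions for illustration. The timer uses C11 `timespec_get`, which is wall-clock time, so a real benchmark might prefer a monotonic clock.

```c
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Hypothetical: estimated bandwidth in MB/s (MB = 2^20 bytes). */
static double estimate_bw_mbps(size_t bytes, int iters, double seconds)
{
    if (seconds <= 0.0)
        return 0.0; /* avoid dividing by zero on very short runs */
    return ((double)bytes * (double)iters) / (1024.0 * 1024.0) / seconds;
}

/* Hypothetical: run untimed warmup copies, then time `iters` copies.
 * Returns the elapsed time of the timed copies in seconds. */
static double timed_copy(void *dst, const void *src, size_t bytes,
                         int warmup, int iters)
{
    struct timespec t0, t1;

    for (int i = 0; i < warmup; i++)
        memcpy(dst, src, bytes); /* warmup iterations are not timed */

    timespec_get(&t0, TIME_UTC);
    for (int i = 0; i < iters; i++)
        memcpy(dst, src, bytes);
    timespec_get(&t1, TIME_UTC);

    return (double)(t1.tv_sec - t0.tv_sec) +
           (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
}
```

The elapsed time returned by `timed_copy` can be passed to `estimate_bw_mbps` together with the buffer size and iteration count to print the estimate.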

Strawman design:
- allocate device memory buffer B
- launch CUDA kernel:
  - polling on B[0]
  - writing a zero-copy flag
- CPU:
  - wait for the kernel to really...

enhancement
help wanted