peakperf
Achieve peak performance on x86 CPUs and NVIDIA GPUs
Hello! I noticed the following during build: `./build.sh` ... ```cpp -- The CXX compiler identification is GNU 13.2.1 -- Detecting CXX compiler ABI info -- Detecting CXX compiler ABI info...
This PR augments the CMakeLists.txt to enable detection of CUDA libraries and compiler in locations other than their default installation paths. This is especially beneficial for setups where CUDA is...
Many uarchs (e.g., Kaby Lake) include mostly CPUs that support AVX, but not all of them do (e.g., some Celeron models). peakperf currently assumes that every CPU of such a uarch supports AVX.
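One way to avoid the per-uarch assumption is to query the running CPU directly. The sketch below assumes GCC or Clang on x86. It uses `<cpuid.h>` and inline `xgetbv`. The names `avx_usable` and `cpu_supports_avx` are hypothetical and are not part of peakperf. AVX counts as usable only when three conditions hold: CPUID.1:ECX reports AVX (bit 28), the OS has enabled XSAVE (OSXSAVE, bit 27), and XCR0 shows that the OS saves both XMM and YMM state.

```cpp
#include <cstdint>
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <cpuid.h>
#endif

// Pure decision logic, kept separate so it can be tested with fixed register values.
// AVX is usable only if the CPU reports AVX (CPUID.1:ECX bit 28), the OS uses
// XSAVE (CPUID.1:ECX bit 27, OSXSAVE) and XCR0 bits 1 (XMM) and 2 (YMM) are set.
bool avx_usable(uint32_t cpuid1_ecx, uint64_t xcr0) {
    const bool has_avx     = (cpuid1_ecx >> 28) & 1u;
    const bool has_osxsave = (cpuid1_ecx >> 27) & 1u;
    const bool ymm_enabled = (xcr0 & 0x6u) == 0x6u;
    return has_avx && has_osxsave && ymm_enabled;
}

// Runtime query of the CPU we are running on (GCC/Clang on x86 only).
bool cpu_supports_avx() {
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx)) return false;
    // XGETBV may only be executed when OSXSAVE is set.
    if (!((ecx >> 27) & 1u)) return false;
    uint32_t lo, hi;
    __asm__ volatile("xgetbv" : "=a"(lo), "=d"(hi) : "c"(0));
    return avx_usable(ecx, (static_cast<uint64_t>(hi) << 32) | lo);
#else
    return false;  // Not x86 or unsupported compiler: report no AVX.
#endif
}
```

With this check, a Kaby Lake Celeron without AVX would fall back to a non-AVX kernel instead of crashing with an illegal instruction. GCC and Clang also provide `__builtin_cpu_supports("avx")` as a shorter alternative.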
The table in https://github.com/Dr-Noob/peakperf#62-gpu also needs to be updated with proper information.
Same as tensor cores, but with RT cores. I am not sure whether RT cores would provide more performance than tensor cores, though.
1. Detect the uarch and deduce whether the GPU has tensor cores or not.
2. Run a GEMM (how?) using tensor cores to achieve peak performance in half precision.
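Step 1 can be approximated from the CUDA compute capability, which the CUDA runtime reports in `cudaDeviceProp::major` and `cudaDeviceProp::minor`. This is a minimal heuristic sketch. The function `has_tensor_cores` is hypothetical and is not peakperf's detection code. Tensor cores first appeared with Volta (compute capability 7.0). However, Turing GTX 16xx GPUs report 7.5 but have no tensor cores, so the device name is checked as well. The name check is incomplete: other TU117-based parts without tensor cores, such as some mobile and workstation models, are not covered.

```cpp
#include <string>

// Hypothetical heuristic: decide from the compute capability (major.minor) and
// the device name whether a GPU has tensor cores.
bool has_tensor_cores(int major, int minor, const std::string& name) {
    if (major < 7) return false;  // Pascal and older: no tensor cores
    if (major == 7 && minor == 5 &&
        name.find("GTX 16") != std::string::npos) {
        return false;             // Turing GTX 16xx: no tensor cores
    }
    return true;                  // Volta, Turing RTX, Ampere and newer
}
```

In practice, `major`, `minor` and `name` would come from a `cudaGetDeviceProperties` call on the selected device.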
I've tried running FLOPS on Windows. First, one has to change some `int` and `long` variables to `<stdint.h>` fixed-width types (`int32_t` and `int64_t`). After that, I tried running it and the performance...
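The change to fixed-width types matters because `long` has different sizes across platforms. Windows uses the LLP64 model, where `long` is 32 bits. Linux on x86-64 uses LP64, where `long` is 64 bits. A large operation count stored in `long` can therefore overflow on Windows only. This is a minimal sketch with hypothetical names (`total_flops`, `fits_in_int32`). It assumes that each FMA counts as two floating point operations.

```cpp
#include <cstdint>

// Hypothetical FLOP count for one benchmark run. int64_t is 64 bits on every
// platform, unlike `long` (32 bits on Windows, 64 bits on Linux x86-64).
int64_t total_flops(int64_t fma_per_iter, int64_t iterations) {
    return 2 * fma_per_iter * iterations;  // one FMA = multiply + add = 2 FLOPs
}

// True if the value would survive being stored in a 32-bit integer
// (e.g. a `long` on Windows).
bool fits_in_int32(int64_t v) {
    return v >= INT32_MIN && v <= INT32_MAX;
}
```

For example, 10^6 FMAs per iteration over 10^4 iterations gives 2 × 10^10 FLOPs. That value exceeds `INT32_MAX` (about 2.1 × 10^9), so it would be silently corrupted in a 32-bit `long`.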
Run peakperf in CPU and GPU at the same time: `device == DEVICE_TYPE_HYBRID`
```
Nº  Time(s)  TFLOP/s (CPU + GPU)
1   2.50984  4.300 (500 + 3800)
2   2.50898  4.310 (500...
```
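The hybrid rows appear to report the parenthesized CPU and GPU figures in GFLOP/s and the total in TFLOP/s. This reading is an assumption based on 500 + 3800 = 4300 GFLOP/s = 4.300 TFLOP/s. The sketch below reproduces that arithmetic. It also assumes both devices run fully overlapped, so their throughputs simply add. `hybrid_tflops` and `format_hybrid` are hypothetical helpers, not peakperf code.

```cpp
#include <cstdio>
#include <string>

// Hypothetical: total hybrid throughput in TFLOP/s from per-device GFLOP/s,
// assuming CPU and GPU run concurrently for the whole measured interval.
double hybrid_tflops(double cpu_gflops, double gpu_gflops) {
    return (cpu_gflops + gpu_gflops) / 1000.0;
}

// Formats the "TFLOP/s (CPU + GPU)" column as in the example output.
std::string format_hybrid(double cpu_gflops, double gpu_gflops) {
    char buf[64];
    std::snprintf(buf, sizeof buf, "%.3f (%.0f + %.0f)",
                  hybrid_tflops(cpu_gflops, gpu_gflops), cpu_gflops, gpu_gflops);
    return std::string(buf);
}
```

With the first row's values, `format_hybrid(500.0, 3800.0)` yields `4.300 (500 + 3800)`.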