nvbench
CUDA Kernel Benchmarking Library
Renaming Cupti method as per Issue [#177](https://github.com/NVIDIA/nvbench/issues/177)
This PR adds a best practices guide for NVBench, providing code examples to help users quickly get started and conduct effective performance comparisons in real-world scenarios.
Fixes: #247 Cold and batch measurements can sometimes differ substantially, so we want to show both. An example is kernels using PDL (Programmatic Dependent Launch). Here is a comparison of...
I have a benchmark for which cold and batch measurements are meaningful, so those are both enabled in code. However, sometimes just running the cold benchmarks is enough for a...
@gevtushenko and I have run into this curious case: When benchmarking `cub::DeviceTransform::Fill` to just fill a buffer with values (see [source](https://github.com/NVIDIA/cccl/blob/07f66bc22347b3ccb85d89e4efe81853b405b136/cub/benchmarks/bench/transform/fill.cu)), we sometimes get results like this: ``` ## generate...
A CI job to build the Python extension should be added. To build:

```bash
cd nvbench/python
mkdir -p wheelhouse
pip wheel . \
  --config-settings=cmake.define.CMAKE_CUDA_ARCHITECTURES=all-major \
  --config-settings=cmake.define.CMAKE_CUDA_COMPILER=/usr/local/cuda/bin/nvcc \
  --wheel-dir wheelhouse
```

To...
When comparing benchmark results with `nvbench_compare.py`, only the cold measurements are currently compared and shown. If a JSON file also contains batch measurements, those should be shown as well.
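A minimal sketch of what comparing both measurement kinds could look like. It is not the logic in `nvbench_compare.py`. It assumes each run has already been reduced to a hypothetical flat dict that maps a summary tag to a mean GPU time in seconds, and the two tag names are assumptions made for illustration.

```python
import math

# Assumed tag names for illustration; check them against your own JSON output.
COLD_TAG = "nv/cold/time/gpu/mean"
BATCH_TAG = "nv/batch/time/gpu/mean"


def compare_measurements(ref, cmp, tags=(COLD_TAG, BATCH_TAG)):
    """Compare two runs tag by tag.

    `ref` and `cmp` are hypothetical flat dicts {tag: mean time in seconds}.
    Returns one tuple per tag present in BOTH runs:
    (tag, ref_time, cmp_time, absolute_diff, percent_diff).
    """
    rows = []
    for tag in tags:
        # Skip a measurement kind that either run did not record,
        # e.g. a file that only contains cold measurements.
        if tag not in ref or tag not in cmp:
            continue
        ref_time = ref[tag]
        cmp_time = cmp[tag]
        diff = cmp_time - ref_time
        # Guard against a zero reference time.
        pct = diff / ref_time * 100.0 if ref_time else math.nan
        rows.append((tag, ref_time, cmp_time, diff, pct))
    return rows
```

With this shape, a file that only holds cold measurements yields a single row, and a file that holds both yields a cold row and a batch row.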
Copying and pasting a markdown comparison report to GitHub will render as tables and look like this: | T{ct} | OffsetT{ct} | Elements{io} | Ref Time | Ref Noise |...
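For reference, a small sketch of how rows can be emitted as a pipe table that GitHub renders. The helper name `to_markdown_table` is hypothetical and is not part of NVBench. The header names are the ones visible in the snippet above, and any cell values passed in are the caller's own.

```python
def to_markdown_table(headers, rows):
    """Format headers and rows as a GitHub-style pipe table."""

    def line(cells):
        # Every row starts and ends with a pipe; cells are separated by " | ".
        return "| " + " | ".join(str(cell) for cell in cells) + " |"

    lines = [line(headers), line(["---"] * len(headers))]
    lines.extend(line(row) for row in rows)
    return "\n".join(lines)


if __name__ == "__main__":
    # Placeholder cells only; these are not benchmark results.
    headers = ["T{ct}", "OffsetT{ct}", "Elements{io}", "Ref Time", "Ref Noise"]
    print(to_markdown_table(headers, [["<T>", "<OffsetT>", "<N>", "<time>", "<noise>"]]))
```

The separator row of `---` cells is what makes GitHub treat the first line as a table header.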