nvbench

CUDA Kernel Benchmarking Library

Results 99 nvbench issues

Renaming Cupti method as per Issue [#177](https://github.com/NVIDIA/nvbench/issues/177)

This PR adds a best practices guide for NVBench, providing code examples to help users quickly get started and conduct effective performance comparisons in real-world scenarios.

Fixes: #247 Cold and batch measurements can sometimes differ substantially, so we want to show both. An example is kernels using PDL (Programmatic Dependent Launch). Here is a comparison of...

I have a benchmark for which cold and batch measurements are meaningful, so those are both enabled in code. However, sometimes just running the cold benchmarks is enough for a...

@gevtushenko and I have run into this curious case: When benchmarking `cub::DeviceTransform::Fill` to just fill a buffer with values (see [source](https://github.com/NVIDIA/cccl/blob/07f66bc22347b3ccb85d89e4efe81853b405b136/cub/benchmarks/bench/transform/fill.cu)), we sometimes get results like this: ``` ## generate...

A CI job to build Python extension should be added. To build:

```bash
cd nvbench/python
mkdir -p wheelhouse
pip wheel . \
  --config-settings=cmake.define.CMAKE_CUDA_ARCHITECTURES=all-major \
  --config-settings=cmake.define.CMAKE_CUDA_COMPILER=/usr/local/cuda/bin/nvcc \
  --wheel-dir wheelhouse
```

To...

area: ci

When comparing benchmark results with `nvbench_compare.py`, currently, the cold measurements are compared and shown. If a JSON file also contains batch measurements, those should be shown in addition.
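As a rough illustration of the request above, the sketch below compares both cold and batch times between a reference run and a new run, then renders the rows as a markdown table. It is not taken from `nvbench_compare.py`. The input layout (a dict mapping benchmark name to `{"cold": seconds, "batch": seconds}`), the function names `compare_measurements` and `to_markdown`, and the column headers are all hypothetical. A real script would first have to extract these values from NVBench's JSON output, whose schema is not reproduced here.

```python
# Hypothetical, simplified input: {benchmark name: {"cold": s, "batch": s}}.
# This is NOT NVBench's JSON schema; it only illustrates the comparison logic.

def compare_measurements(ref, new):
    """Return (name, kind, ref_time, new_time, pct_diff) rows for every
    measurement kind that is present in both runs."""
    rows = []
    for name in sorted(ref.keys() & new.keys()):
        for kind in ("cold", "batch"):
            if kind in ref[name] and kind in new[name]:
                r = ref[name][kind]
                n = new[name][kind]
                rows.append((name, kind, r, n, (n - r) / r * 100.0))
    return rows


def to_markdown(rows):
    """Render comparison rows as a GitHub-flavored markdown table."""
    lines = [
        "| Benchmark | Kind | Ref Time | Cmp Time | %Diff |",
        "|---|---|---|---|---|",
    ]
    for name, kind, r, n, pct in rows:
        lines.append(
            f"| {name} | {kind} | {r * 1e6:.3f} us | {n * 1e6:.3f} us | {pct:+.2f}% |"
        )
    return "\n".join(lines)


# Made-up sample values, in seconds. "copy" has no batch measurement.
ref = {"fill": {"cold": 10e-6, "batch": 8e-6}, "copy": {"cold": 20e-6}}
new = {"fill": {"cold": 11e-6, "batch": 8e-6}, "copy": {"cold": 19e-6}}

rows = compare_measurements(ref, new)
print(to_markdown(rows))
```

Benchmarks that lack a batch measurement in either file produce only a cold row, so files recorded without batch measurements compare exactly as before.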

Copying and pasting a markdown comparison report to GitHub will render as tables and look like this: | T{ct} | OffsetT{ct} | Elements{io} | Ref Time | Ref Noise |...