nvbench issues

Get markdown report from JSON

2

When I run an nvbench-based benchmark, like `thrust.bench.transform.basic.base -d 0 --stopping-criterion entropy --json baseline.json`, it prints a nice markdown report at the end, summarizing the run benchmarks, times,...

bernhardmgruber

Skip warning compiler flags checks in CMake when NVBench is consumed as a dependency

When NVBench is consumed as a dependency (for example, from CCCL), CMake still checks for a bunch of warning flags: ``` -- Performing Test NVBench_CXX_FLAG__Wall -- Performing Test NVBench_CXX_FLAG__Wall -...

bernhardmgruber

Set underlying type for enum class exec_tag to uint16_t

1

Set the underlying type for `enum class exec_tag` to `::cuda::std::uint16_t`, rather than the 32-bit default integral type. This change reduces the size of an `exec_tag` instance from 4 bytes to 2 bytes, it...

oleksandr-pavlyk
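
To illustrate the size change described in the `exec_tag` entry above, here is a minimal sketch. The enum names and enumerators are hypothetical stand-ins, not the actual nvbench definitions, and it uses `std::uint16_t` rather than `::cuda::std::uint16_t` so that it compiles with a host-only compiler.

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical stand-ins for illustration; not the actual nvbench definitions.

// A scoped enum with no explicit underlying type defaults to int.
enum class tag_default { none = 0x0, timer = 0x1 };

// An explicit 16-bit underlying type shrinks each instance to two bytes.
enum class tag_small : std::uint16_t { none = 0x0, timer = 0x1 };

static_assert(std::is_same_v<std::underlying_type_t<tag_default>, int>,
              "scoped enums default to int");
static_assert(sizeof(tag_default) == sizeof(int),
              "default-sized tag is as large as int");
static_assert(sizeof(tag_small) == sizeof(std::uint16_t),
              "explicitly sized tag is as large as uint16_t");
```

On platforms where `int` is 32 bits, this corresponds to the 4-byte to 2-byte reduction mentioned in the entry.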

Allow kernel_generator to be stateful

In Python, the kernel generator is a user-defined callable. We need to capture the Python object of that callable in the kernel generator provided for each benchmark. To this end, nvbench::benchmark has been...

oleksandr-pavlyk

Create Python bindings to nvbench

Using the Python API to `nvbench`, one could - use Python as a benchmark driver - analyze benchmark data in Python as they are collected - profile kernels authored in Python, such...

oleksandr-pavlyk

Profile only the kernels involved in the benchmark

NVBench currently allows kernel profiling with external tools with the flag `--profile`. On the other hand, profiling tools collect all activities in the benchmark, including "setup/initialization" kernels that are not...

fbusato

Issue with `devices` flag on multi-GPU system

1

Recently, when benchmarking libcudf on a DGX system, I ran into an issue where the MR set up by libcudf would only be respected by nvbench on GPU0. We observed that...

GregoryKimball

Make bandwidth / size display configurable for either decimal / binary units

Introduce a global configuration object that allows process-wide settings, starting with separate decimal (GB) vs base2 (GiB) options for bandwidth and memory size. Expose programmatic interfaces and CLI options to...

alliepiper
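
The decimal versus binary distinction in the entry above can be sketched as follows. This is only an illustration of the two unit systems: `unit_system` and `format_size` are hypothetical names, not part of nvbench, and the three-decimal format is an arbitrary choice.

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical setting for illustration; not an nvbench type.
enum class unit_system { decimal, binary };

// Format a byte count as GB (10^9 bytes) or GiB (2^30 bytes).
inline std::string format_size(std::uint64_t bytes, unit_system units)
{
  const bool is_decimal = (units == unit_system::decimal);
  const double divisor  = is_decimal ? 1e9 : 1073741824.0; // 10^9 vs 2^30
  const char *suffix    = is_decimal ? "GB" : "GiB";

  char buffer[64];
  std::snprintf(buffer, sizeof(buffer), "%.3f %s",
                static_cast<double>(bytes) / divisor, suffix);
  return std::string(buffer);
}
```

The same byte count reads differently under each system: 2^30 bytes is exactly 1 GiB but roughly 1.074 GB, which is why a single process-wide setting helps keep reports consistent.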

BWUtil is computed based on GPU Time, even when Batch GPU is available

1

Based on the output of nvbench, it looks like the global memory bandwidth and bandwidth utilization are calculated based on GPU Time. Below is an excerpt from a recent benchmark....

ahendriksen
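
To make the concern in the entry above concrete, here is a small sketch of why the choice of time measurement matters. It assumes bandwidth is computed as bytes moved divided by elapsed time, and utilization as that bandwidth divided by the device's peak bandwidth. The helper names are hypothetical and this is not nvbench's actual implementation.

```cpp
// Hypothetical helpers for illustration; not nvbench functions.

// Achieved bandwidth in bytes per second for a given elapsed time.
inline double achieved_bandwidth(double bytes_moved, double seconds)
{
  return bytes_moved / seconds;
}

// Fraction of the device's peak bandwidth that was achieved.
inline double bandwidth_utilization(double achieved, double peak)
{
  return achieved / peak;
}
```

For a fixed byte count, a shorter time measurement yields a proportionally higher bandwidth and utilization, so the reported figures depend directly on which timer feeds the calculation.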

Throughput Failed On Multiple GPUs

2

I ran example [throughput.cu](https://github.com/NVIDIA/nvbench/blob/main/examples/throughput.cu) and it failed on 4XGPU, ```shell Command: 'cudaMemsetAsync(m_l2_buffer, 0, static_cast(m_l2_size), stream)' Run: [5/8] throughput_bench [Device=0] Fail: Unexpected error: /data/github/build/cache/nvbench/b2fc/nvbench/detail/l2flush.cuh:55: Cuda API call returned error: cudaErrorInvalidValue: invalid...

westfly
