nvbench

CUDA Kernel Benchmarking Library


The current implementation computes the throughput statistics in `measure_cold`, which is invoked during `state.exec`. This has the undesirable effect that throughput statistics are not generated when reads/writes are declared after...
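
One way to remove this ordering dependency, sketched here under assumptions: let the measurement step only record raw GPU times, and derive throughput when the summary is requested, after every read/write declaration has been made. The struct and member names below are hypothetical and are not nvbench's internals.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical sketch, not nvbench's actual summary code: the measurement
// loop only records GPU times, and throughput is derived on demand.
struct cold_summary_sketch
{
  std::vector<double> gpu_times_s;  // one entry per measured sample, in seconds
  std::size_t global_read_bytes{0};  // declared bytes read per kernel run
  std::size_t global_write_bytes{0}; // declared bytes written per kernel run

  void record_sample(double seconds)
  {
    assert(seconds >= 0.0);
    gpu_times_s.push_back(seconds);
  }

  // Declarations may arrive before or after the samples are recorded.
  void declare_global_reads(std::size_t bytes) { global_read_bytes += bytes; }
  void declare_global_writes(std::size_t bytes) { global_write_bytes += bytes; }

  // Mean bandwidth in bytes/s from everything declared so far.
  // Returns std::nullopt when there are no samples or no declared traffic.
  std::optional<double> mean_bandwidth() const
  {
    const std::size_t bytes = global_read_bytes + global_write_bytes;
    if (gpu_times_s.empty() || bytes == 0) { return std::nullopt; }
    double total = 0.0;
    for (double t : gpu_times_s) { total += t; }
    const double mean = total / static_cast<double>(gpu_times_s.size());
    if (mean <= 0.0) { return std::nullopt; }
    return static_cast<double>(bytes) / mean;
  }
};
```

Because `mean_bandwidth()` reads the byte counts at call time, declaring reads/writes after the timed section still produces a throughput figure.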

Now the option parser throws an exception if any parameters don't match the corresponding stopping criterion. This PR addresses issue #153.
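
A minimal sketch of this kind of validation, assuming a simple table that maps each stopping-criterion name to the parameter names it accepts. The table type and `validate_stopping_params` are hypothetical illustrations, not nvbench's option-parser code.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <stdexcept>
#include <string>

// Hypothetical registry type: stopping-criterion name -> parameter names it accepts.
using criterion_param_table = std::map<std::string, std::set<std::string>>;

// Throws std::invalid_argument if the criterion is unknown, or if any
// parameter given on the command line is not accepted by that criterion.
void validate_stopping_params(const criterion_param_table &table,
                              const std::string &criterion,
                              const std::set<std::string> &given_params)
{
  const auto it = table.find(criterion);
  if (it == table.end())
  {
    throw std::invalid_argument("unknown stopping criterion: " + criterion);
  }
  for (const std::string &param : given_params)
  {
    if (it->second.count(param) == 0)
    {
      throw std::invalid_argument("parameter '" + param +
                                  "' does not apply to stopping criterion '" +
                                  criterion + "'");
    }
  }
}
```

The table is passed in explicitly to keep the sketch self-contained; a real parser would build it from whichever criteria are registered.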

This PR is replacing the `VAULT_HOST` variable with `AWS_ROLE_ARN`. This is required to use the new token service to get AWS credentials.

nvbench is a great tool for generating profiles for libcudf. I've found that the `--profile` option with `--run-once` is a good starting point, but for many operations we need more...

I was trying to compare benchmark results for A100 PCI and A100 SXM, but nvbench refused with:

```
nvbench_compare.py ./babelstream_fallback_blocks_A100_PCI/ ./babelstream_fallback_blocks_A100_SXM/
['./babelstream_fallback_blocks_A100_PCI/', './babelstream_fallback_blocks_A100_SXM/']
Device sections do not match.
```

I...

CUDA events suffer from low accuracy and include the kernel launch overhead. On the other hand, CUPTI provides a more reliable way to get consistent timing measurement. This request asks...

I have run into this twice now and thought it would be great if an nvbench-based benchmark could create any missing intermediate directories for the output JSON file. Now, with the...
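
A minimal sketch of the requested behavior using C++17 `std::filesystem`: create any missing parent directories, then open the file. The helper name `open_output_creating_dirs` is hypothetical and is not an existing nvbench function.

```cpp
#include <cassert>
#include <filesystem>
#include <fstream>

namespace fs = std::filesystem;

// Hypothetical helper: opens `path` for writing after creating any missing
// parent directories. create_directories() returns false without error if
// they already exist and throws fs::filesystem_error if creation fails.
std::ofstream open_output_creating_dirs(const fs::path &path)
{
  const fs::path parent = path.parent_path();
  if (!parent.empty())
  {
    fs::create_directories(parent);
  }
  return std::ofstream(path);
}
```

A path with no directory component (for example `results.json`) skips directory creation and is opened relative to the working directory.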

I attempted to build nvbench with the following setup and ran into compiler errors:
- nvbench commit: a171514056e5d6a7f52a035dd6c812fa301d4f4f (latest commit to main)
- nvcc: Cuda compilation tools, release 11.5,...

I followed the instructions in the README to build the examples (`cmake -DNVBench_ENABLE_EXAMPLES=ON -DCMAKE_CUDA_ARCHITECTURES=80 .. && make`), but when I run the examples with `./nvbench.example.cpp20.axes` or `./nvbench.example.cpp17.axes`, I get the error...

Existing nvbench allows measuring SOL (speed of light) for memory-bound workloads by providing:

```
state.addGlobalMemoryReads(nbytes)
state.addGlobalMemoryWrites(nbytes)
```

It would be useful to extend this concept to provide flops such that we...
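
A sketch of the arithmetic such an extension would enable, assuming the benchmark declares a FLOP count per run alongside its byte counts. `work_counters_sketch` and the functions below are hypothetical, not nvbench API; the device's peak FLOP/s is an input the caller must supply.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical per-run work counters: the byte counts mirror the existing
// global-memory declarations, and `flops` is the proposed addition.
struct work_counters_sketch
{
  std::size_t bytes_read{0};
  std::size_t bytes_written{0};
  std::size_t flops{0};
};

// Achieved FLOP/s for one run that took `seconds` of GPU time.
double achieved_flops_per_s(const work_counters_sketch &w, double seconds)
{
  assert(seconds > 0.0);
  return static_cast<double>(w.flops) / seconds;
}

// Arithmetic intensity in FLOP per byte of global traffic (roofline x-axis).
double arithmetic_intensity(const work_counters_sketch &w)
{
  const std::size_t bytes = w.bytes_read + w.bytes_written;
  return bytes == 0 ? 0.0
                    : static_cast<double>(w.flops) / static_cast<double>(bytes);
}

// Fraction of compute SOL reached, given the device's peak FLOP/s.
double compute_sol_fraction(const work_counters_sketch &w, double seconds,
                            double peak_flops_per_s)
{
  assert(peak_flops_per_s > 0.0);
  return achieved_flops_per_s(w, seconds) / peak_flops_per_s;
}
```

For SAXPY (`y = a*x + y`) on `n` floats, a benchmark would declare 2n FLOPs, 8n bytes read and 4n bytes written, for an arithmetic intensity of 1/6 FLOP/byte; comparing that against the device's peak FLOP/s to peak bandwidth ratio indicates which SOL is the relevant one.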