Yunsong Wang
IIUC, CUPTI is used only if at least one of these auto-throughput measurements is requested: https://github.com/NVIDIA/nvbench/blob/5d70492714d05f2207e2193be8a8cc0a85eefc76/examples/auto_throughput.cu#L64-L68 We need to explicitly set the `bool`s below to `false` when `--profile` is present...
> I ran the throughput benchmark with `--profile` and did not see any CUPTI calls.

~~By all means, we would expect `is_cupti_required()` returning `false` when `--profile` is present.~~ Looks like...
> They are not being called for me when --profile is enabled.

Yeah, you are right. Although `is_cupti_required()` returns `true`, no CUPTI APIs are called when `--profile` is used.
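
To make the observation above concrete, here is a rough sketch of the behavior as I understand it. This is not nvbench's actual code: `state_sketch`, its flag names, and `will_call_cupti` are hypothetical, and the guard on profile mode is an assumption that simply models what we observed.

```cpp
#include <cassert>

// Hypothetical stand-in for a benchmark state; NOT nvbench's actual class.
struct state_sketch
{
  // Flags a benchmark sets to request CUPTI-backed metrics (illustrative names).
  bool collect_l1_hit_rates    = false;
  bool collect_l2_hit_rates    = false;
  bool collect_dram_throughput = false;

  // True as soon as any CUPTI-backed metric has been requested.
  constexpr bool is_cupti_required() const
  {
    return collect_l1_hit_rates || collect_l2_hit_rates || collect_dram_throughput;
  }
};

// Assumption: profile mode bypasses the CUPTI measurement path entirely,
// so the flags above never need to be reset to false.
constexpr bool will_call_cupti(const state_sketch &s, bool profile_mode)
{
  return s.is_cupti_required() && !profile_mode;
}

// Small self-check: the flag stays true, but profile mode still skips CUPTI.
inline void check_sketch()
{
  state_sketch s;
  s.collect_dram_throughput = true;
  assert(s.is_cupti_required());
  assert(!will_call_cupti(s, true));
  assert(will_call_cupti(s, false));
}
```

If this model is right, the flags can stay as they are and nothing needs to be reset when `--profile` is passed.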
@bwyogatama Can you please update the base branch to `branch-22.12` and resolve conflicts to unblock CI tests?
ok to test
/ok to test
@DanialJavady96 Marking this as ready for review so that it gets proper attention from reviewers.
/ok to test
/ok to test
> Looks good. Shall I run some benchmarks comparing perf on H100?

Please do. I expect a maximum difference of 0.5% to 1%.