Jake Hemstad

Results: 209 comments by Jake Hemstad

> have this fix with current CUB design. I agree, however, I'm not sure about our ability to preserve this functionality into the future. > In this proposal, is there...

> So the tuning process has to be complemented with a new abstraction layer that would iterate over thread block size search space and converge on a tuning based of...

@miscco so what's your current thinking about the steps we'd take to transition RAPIDS libraries? Ultimately we want to get rid of the `device_memory_resource` base class altogether. Likewise transition...

Hey @hibagus, thanks for your interest in NVBench and for reaching out! We'll be happy to help. In your template: ``` template int gemm_cutlass_launch_int(nvbench::state& state) ``` Are the `Gemm, scalePrecision,...

> Currently, my implementation will not use the type sweep on NVBench (i.e., not using NVBENCH_BENCH_TYPES) so the template parameters will be fixed for a particular benchmark invocation. Okay, that...

Ah, this looks to be unique to some of nvbench's internal macro shenanigans underlying `NVBENCH_BENCH`. I don't think using `NVBENCH_TYPE_AXES` when you don't intend to sweep over those parameters is...

@PointKernel is correct. CUPTI collection will only occur when a benchmark explicitly opts in via the `collect_dram_throughput()`, etc. The `--profile` flag should override a benchmark that uses `collect_dram_throughput()`. In fact,...

Hm, it could be that simply linking with `cupti` is enough to cause the incompatibility with GPU metric collection in Nsight.

Seems like we should enable people to choose. Maybe something like this: ``` enum class base { TWO, TEN }; template void add_global_memory_reads(std::size_t count, std::string column_name = {}, base b...

Note that I realized that this isn't a great idea before `<ranges>` is available because things like `std::distance` expect the begin/end to be the same type, whereas [`ranges::distance`](https://en.cppreference.com/w/cpp/iterator/ranges/distance) does not....