David Poliakoff
So, the problem is that if you don't do synchronous timings, you confuse your users. You claim that a dot-product on three elements took a minute, when in reality a...
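A minimal sketch of one way this kind of misattribution can arise. It is not Kokkos or Caliper code: a single-worker Python thread pool stands in for one device stream, and the names `profile`, `big_kernel`, and `tiny_dot` are hypothetical. Without a wait at the end of the first region, the long task's time ends up charged to the tiny task that runs after it.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def profile(fence_each_region):
    """Return {region: seconds} for a long async task followed by a tiny blocking one."""
    timings = {}
    # A single worker runs submitted tasks in order, loosely like one device stream.
    with ThreadPoolExecutor(max_workers=1) as stream:
        # Region 1: a long task, launched asynchronously (submit returns at once).
        start = time.perf_counter()
        big = stream.submit(time.sleep, 0.2)
        if fence_each_region:
            big.result()  # the "fence": wait for the work before stopping the clock
        timings["big_kernel"] = time.perf_counter() - start

        # Region 2: a tiny task whose result the host needs immediately,
        # so it has to wait for everything queued ahead of it.
        start = time.perf_counter()
        stream.submit(sum, [1.0, 2.0, 3.0]).result()
        timings["tiny_dot"] = time.perf_counter() - start
    return timings


if __name__ == "__main__":
    print("no fence:", profile(fence_each_region=False))
    print("fenced:  ", profile(fence_each_region=True))
```

In the unfenced run the three-element sum appears to take about as long as the sleep, while the fenced run charges that time to the region that actually launched it.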
First: hi! Hope you're enjoying AMD. Second: it can _resolve_ it, but the timings won't necessarily be accurate in the sense of "how long does my dot kernel take to...
Ah! That's actually news to me. Cool, in that case I don't see a reason to ever fence. And congrats on the first day, and getting to surprise Boehme, that's...
```bash
Path                                            Host Time  GPU Time  GPU %
cudaFreeHost                                    0.000039
test_two                                        3.329040   1.532707  46.040502
Kokkos::Tools::Experim~~: Tool Requested Fence  1.671173
cudaDeviceSynchronize                           1.026064
cudaLaunchKernel                                0.614629   1.532707  249.371064
cudaFuncSetCacheConfig                          0.000005
cudaFuncGetAttributes                           0.000007
test_one...
```
Actually, here's the cool result (with some refinement):
```bash
(base) [dzpolia@kokkos-dev-2 cuda-build]$ ./core/unit_test/KokkosCore_UnitTest_Develop --kokkos-tools-library=$HOME/src/caliper/nkb/src/libcaliper.so --kokkos-tools-args="cuda-activity-report(profile.kokkos)"
[==========] Running 1 test from 1 test case.
[----------] Global test environment set-up.
[----------] 1...
```
@daboehme, also note the total test runtime. This is a pathological example, but we're cutting the runtime in half on the test.
In my book it is. If you think it's good to go, merge it.
Why turn it off, though? If I'm not doing cuda-activity-report, the answers I get will be garbage, no?
What else works for asynchronous tasks? I basically want to limit it to those, as people _will_ use the custom config case, and not know what they're doing, and ask...