Field G. Van Zee
@devinamatthews Could you give your stamp of approval to this patch? @dnparikh seems to recall there being an issue of 1 VPU vs 2 VPUs, but I don't have any...
> Except for Skylake-X i7 and i9! And Cannon Lake! And Cascade Lake! Ugh. I know nothing, then. Thanks for your comments, Devin.
Sure, though I'm going to need some time to think about this and how it compares and contrasts with my own ideas.
> ii. 1 if `n > n_c`. @devinamatthews Can you explain the thinking behind this?
> and maybe you do still want to parallelize over k for some bizarre situation like m=20, n=5000, k=10000. I think if @rvdg were here, he would likely say that...
I *think* I know how to solve this for generalized problems. It builds upon the idea of carouseling, which @tlrmchlsmth developed and prototyped years ago. The key feature of that...
@stepannassyr Thank you for sharing this gist. I'll review it as I write and prepare my changes for commit.
@stepannassyr Are the `time.h` header (`clock_gettime()` and friends) and the corresponding `-lrt` link option available in your environment? (See the [man page](https://man7.org/linux/man-pages/man2/clock_gettime.2.html) for `clock_gettime()` for more info.)
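One quick way to answer this on a given toolchain is to compile a small probe like the one below. This is only a sketch: `probe_clock_seconds()` is a hypothetical helper written for illustration, not a BLIS function, and the choice of `CLOCK_MONOTONIC` is an assumption about which clock you care about.

```c
/* Ask for the POSIX.1b declarations so clock_gettime() is visible
   even under strict -std=c99 / -std=c11. */
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L
#endif
#include <time.h>

/* Hypothetical probe (not part of BLIS): returns the current monotonic
   time in seconds, or -1.0 if clock_gettime() reports an error. */
static double probe_clock_seconds(void)
{
    struct timespec ts;

    if (clock_gettime(CLOCK_MONOTONIC, &ts) != 0)
        return -1.0;

    return (double)ts.tv_sec + (double)ts.tv_nsec * 1.0e-9;
}
```

If this fails to compile, the header or the declaration is missing. If it compiles but fails to link, try adding `-lrt`; per the man page, that option is only needed with glibc versions before 2.17. A return value of `-1.0` at run time would suggest the clock itself is unsupported.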
@jeffhammond Yes, BLIS defines `bli_clock()` (and `bli_clock_min_diff()`) for timing purposes [1]. `bli_clock()` is used in all of our performance-timing infrastructure. How it is defined differs based on operating system, and...
@stepannassyr I can dummy-out the `bli_clock()` and its friends when our proposed systemless mode is selected at configure-time. However, just be aware that this means you won't be able to...
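For illustration only, a dummied-out clock might look roughly like the sketch below. Every name here is hypothetical: `EXAMPLE_SYSTEMLESS` stands in for whatever macro configure would end up defining, and `example_clock()` stands in for a timing routine. This is not how BLIS actually implements `bli_clock()`.

```c
/* EXAMPLE_SYSTEMLESS is a hypothetical macro, assumed to be defined at
   configure-time when no operating system services are available. */
#ifdef EXAMPLE_SYSTEMLESS

/* Stub: there is no system clock to query, so always report zero. */
static double example_clock(void)
{
    return 0.0;
}

#else

#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L
#endif
#include <time.h>

/* Normal path: read a monotonic clock and convert to seconds. */
static double example_clock(void)
{
    struct timespec ts;

    if (clock_gettime(CLOCK_MONOTONIC, &ts) != 0)
        return 0.0;

    return (double)ts.tv_sec + (double)ts.tv_nsec * 1.0e-9;
}

#endif

/* Seconds elapsed since t_start, where t_start came from example_clock(). */
static double example_elapsed(double t_start)
{
    return example_clock() - t_start;
}
```

With the stub selected, callers still compile and link unchanged, but `example_elapsed()` always evaluates to zero, so any numbers derived from it carry no timing information.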