Yu You

8 comments by Yu You

Thanks. It is Skylake-X with AVX-512. I tried input sizes from `m=n=5000` to `m=n=15000`. I'm on OpenBLAS 0.3.7, which should have the latest improvements? I can create a plot and profile DGEMM.

Good to know, thanks! I'll try 0.3.10 and report back.

Well, it turns out that I was already using 0.3.10. But I have some more observations, shown in the plots below. `dgetrf` tests (left panel) were run with a `5000x5000`...

Yes, I noticed that OpenBLAS `DGETRF` is much faster than the netlib implementation but, as we see here, it is still not as fast as MKL.

Need to update `include/cuda/std/detail/libcxx/include/version` and define `__cpp_lib_span`. Otherwise, this worked for my `mdspan` tests. Thanks!

Similar issue in constructor of `layout_{left|right}` that takes a `layout_stride::mapping`, where `size_t stride = 1;` is compared against the stride of the input mapping, which could be signed.
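To make the concern concrete, here is a minimal sketch of a sign-safe stride comparison. It assumes the issue is the implicit signed-to-unsigned conversion that happens when a `size_t` is compared against a signed stride (which typically shows up as a sign-compare warning). `stride_equals`, `naive_equals`, and `check_stride_equals` are hypothetical names for illustration and are not part of `mdspan`.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Hypothetical helper (not part of mdspan): compares an unsigned expected
// stride with a stride whose type may be signed, without depending on the
// implicit signed-to-unsigned conversion.
template <class Stride>
constexpr bool stride_equals(std::size_t expected, Stride actual) {
    // Compare in an unsigned type wide enough to hold both operands.
    using Common = typename std::common_type<std::size_t, Stride>::type;
    using Wide = typename std::make_unsigned<Common>::type;
    // A negative stride can never equal an unsigned one, so reject it first.
    return !(actual < Stride(0)) &&
           static_cast<Wide>(expected) == static_cast<Wide>(actual);
}

// What the unguarded comparison effectively does: the signed value is
// converted to size_t before comparing, so -1 becomes the largest size_t.
inline bool naive_equals(std::size_t expected, long actual) {
    return expected == static_cast<std::size_t>(actual);
}

// Small self-check that exercises both helpers.
inline void check_stride_equals() {
    const std::size_t max_size = static_cast<std::size_t>(-1);
    assert(stride_equals(std::size_t(1), 1));    // equal, signed input
    assert(stride_equals(std::size_t(4), 4u));   // equal, unsigned input
    assert(!stride_equals(std::size_t(1), -1));  // negative never matches
    assert(!stride_equals(max_size, -1));        // no wrap-around match
    assert(naive_equals(max_size, -1L));         // naive version wraps
}
```

In C++20, `std::cmp_equal` from `<utility>` provides the same sign-safe comparison; the hand-written version is shown only because it also works in earlier standards.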

> Hey @youyu3, sorry for the last-minute change, but based on some internal conversation, I think we can drop the `experimental` namespace for `mdspan`.

Okay. Will do. Then I...

I pulled commits from the `span` PR. There are further changes in those files, I believe.

> Commit history appears to be broken.

I'll try to resolve this locally and...