mdspan icon indicating copy to clipboard operation
mdspan copied to clipboard

Investigate making the implementation simpler, and benchmark the effects

Open mhoemmen opened this issue 3 years ago • 5 comments

@crtrott @dalg24 @nliber

The following discussion

  • https://github.com/NVIDIA/libcudacxx/pull/299#issuecomment-1252094052
  • https://github.com/NVIDIA/libcudacxx/pull/299#issuecomment-1252566731
  • https://github.com/NVIDIA/libcudacxx/pull/299#issuecomment-1252626640

suggests that the implementation is more complex than needed, because it does not fully take advantage of the relaxed requirements of constexpr functions in C++14 and later. @miscco posted a much simpler extents implementation https://godbolt.org/z/T9T4db864 .

If we do decide to simplify the implementation, it would be excellent to run benchmarks, using the method of our 2019 paper https://scholar.google.com/citations?view_op=view_citation&hl=en&user=LjHLr7YAAAAJ&sortby=pubdate&citation_for_view=LjHLr7YAAAAJ:ldfaerwXgEUC .

mhoemmen avatar Sep 20 '22 16:09 mhoemmen

I think the comment about extents could be simpler are fair. Part of the problem was trying to work around a number of issues with empty base optimization in MSVC and Intel compiler. MSVC did get much better though in the last couple years. That said: looking at some assembly it does look like the reference implementation compiles cleaner:

https://godbolt.org/z/Gvsfrn31o

crtrott avatar Sep 20 '22 17:09 crtrott

Yeah that is super interesting to see, I am really curious as to where that difference could come from.

It should boil down to the same thing

miscco avatar Sep 20 '22 19:09 miscco

For other readers: In Christian's example, maintain -O2 (at -O3 all the code goes away), and look carefully at the assembly code generated for line 370 (for(int i=0; i<ext.rank(); i++) size*= ext.extent(i);). With the current implementation, there's no loop; with the suggested implementation, there's a loop.

mhoemmen avatar Sep 20 '22 20:09 mhoemmen

I believe the issue is that i call the function _Dynamic_index rather than memoizing it in an array and indexing that

miscco avatar Sep 20 '22 20:09 miscco

Note turning the loopy function into a static array indeed improved optimizations https://godbolt.org/z/GEx8Ee71j

miscco avatar Sep 21 '22 09:09 miscco