stdBLAS icon indicating copy to clipboard operation
stdBLAS copied to clipboard

Implement P3222 and P3050

Open mhoemmen opened this issue 1 year ago • 5 comments

Implement P3222R0 ("Add transposed special cases for P2642 layouts"). The corresponding paper PR is https://github.com/ORNL/cpp-proposals-pub/issues/448. Add tests for previously supported cases and the new cases.

Implement P3050R2 ("Optimize linalg::conjugated for noncomplex value types") and add tests. That is, fix conjugated for non-arithmetic, non-(custom complex) types. A type T is "custom complex" if conj(T) is ADL-findable.

Each of these papers has a separate CMake option, which is documented in CMakeLists.txt. Both options default to OFF (not implemented) for now.

Fixes https://github.com/kokkos/stdBLAS/issues/267 .

mhoemmen avatar Mar 27 '24 21:03 mhoemmen

Hi @dalg24 ! Thanks for your review!

I would prefer if you did not mix in the refactor/fixes with the implementation of the new feature.

It's actually impossible to pass the repository's automated premerge tests without the fixes, as the build fails.

Each commit is atomic (it builds and passes tests locally) and can be examined separately.

mhoemmen avatar Mar 28 '24 22:03 mhoemmen

Can't you open another PR with the fixes only?

dalg24 avatar Mar 28 '24 22:03 dalg24

Can't you open another PR with the fixes only?

There are lots of fixes. They are separated into different commits.

The current state of the repo is broken; it fails to build.

mhoemmen avatar Mar 29 '24 15:03 mhoemmen

@dalg24 Per your request, I've created PR #269 that only fixes the build and Standard conformance issues, without adding new features.

This PR is rebased atop PR #269, because (as mentioned before) this repository's build is currently broken, so it's impossible to pass check-in tests without the build fixes. Please merge PR #269 first.

mhoemmen avatar Apr 03 '24 15:04 mhoemmen

@crtrott @dalg24 This PR is ready to review. Changes as of today:

  1. Dependency PR #269 has merged.
  2. Added separate CMake options to enable each paper's changes.
  3. CMake options default to OFF (paper's changes not enabled) by default.

mhoemmen avatar Sep 19 '24 22:09 mhoemmen

Other than small comment on scope of #ifdef + fix request in P3050 spec (add return type to deleted conj) looks fine to me.

iburyl avatar Sep 30 '24 16:09 iburyl

@crtrott wrote:

This looks good. I am not 100% sure I like the approach for the conjugated thing but its anyway protected so I am good.

Just to clarify: P3050 is an optimization that simplifies the implementation. It needs some way to tell if a number type is not complex. It can be conservative about that test, because it's only an optimization. Implementations can always optimize internally by specializing on known conjugated_accessor patterns.

This is different than P3371R1. As we discussed yesterday, P3371R1 changes Hermitian rank-1 and rank-k updates to constrain Scalar not to have ADL-findable conj, even if the user's Scalar is a "noncomplex" number type. We talked yesterday about changing this in P3371R2 from a Constraint (the user's code fails to build) to a Precondition that imag-if-needed(alpha) equals Scalar{}.

mhoemmen avatar Sep 30 '24 20:09 mhoemmen