cpp-proposals-pub icon indicating copy to clipboard operation
cpp-proposals-pub copied to clipboard

P1673: Corrections

Open mhoemmen opened this issue 1 year ago • 5 comments

P1673: Corrections

BLAS 3

[linalg.algs.blas3.rankk]

C is the symmetric or Hermitian matrix, not A, so A doesn't have to be square. (C does need to be square, though.)

  • Replace (2.2) with "compatible-static-extents<decltype(C), decltype(C)>(0, 1) is true"

  • Replace (2.3) and (2.4) with possibly-multipliable<decltype(A), decltype(transposed(A)), C>

  • Replace (3.1) with "C.extent(0) equals C.extent(1)"

  • Replace (3.2) and (3.3) with "multipliable(A, transposed(A), C)"

[linalg.algs.blas3.rank2k]

C is the symmetric or Hermitian matrix, not A or B, so A (or B) doesn't have to be square. (C does need to be square, though.) A, B, and C must have the same number of rows. A and B must have the same number of columns (they must both be N x K for some K not necessarily equal to N).

  • Replace (2.2) with "possibly-multipliable<decltype(A), decltype(transposed(B)), decltype(C)> is true and possibly-multipliable<decltype(B), decltype(transposed(A)), decltype(C)> is true"

  • Replace (2.3) with "compatible-static-extents<decltype(C), decltype(C)>(0, 1) is true"

  • Replace (3.1) with "multipliable(A, transposed(B), C) is true and multipliable(transposed(B), A, C) is true"

  • Replace (3.2) with "C.extent(0) equals C.extent(1)"

mhoemmen avatar Jul 16 '24 19:07 mhoemmen

We should probably make that correction against the current draft instead of P1673. https://eel.is/c++draft/linalg#algs.blas3.rankk The numbers are off.

dalg24 avatar Jul 16 '24 23:07 dalg24

@dalg24 https://github.com/kokkos/stdBLAS/issues/274 appears to look at the current draft. I'll revise my comments there.

mhoemmen avatar Jul 16 '24 23:07 mhoemmen

P1673: Corrections (updated)

BLAS 2

[linalg.algs.blas2.gemv]

  • In 3.2, replace "possibly-addable<decltype(x), decltype(y), decltype(z)>" with "possibly-addable<decltype(y), decltype(y), decltype(z)>"

  • In 4.2, replace "addable(x, y, z)" with "addable(y, y, z)"

  • In 5, replace "$O($ x.extent(0) × A.extent(1) $)$" with "$O($ A.extent(0) × x.extent(0) $)$"

[linalg.algs.blas2.symv]

  • In 3.4, replace "possibly-addable<decltype(x), decltype(y), decltype(z)>" with "possibly-addable<decltype(y), decltype(y), decltype(z)>"

  • In 4.3, replace "addable(x, y, z)" with "addable(y, y, z)"

  • In 5, replace "$O($ x.extent(0) × A.extent(1) $)$" with "$O($ A.extent(0) × x.extent(0) $)$" (the original is technically correct, since A is symmetric and therefore square, but changing this would improve consistency with [linalg.algs.blas2.gemv])

[linalg.algs.blas2.hemv]

  • In 3.4, replace "possibly-addable<decltype(x), decltype(y), decltype(z)>" with "possibly-addable<decltype(y), decltype(y), decltype(z)>"

  • In 4.3, replace "addable(x, y, z)" with "addable(y, y, z)"

  • In 5, replace "$O($ x.extent(0) × A.extent(1) $)$" with "$O($ A.extent(0) × x.extent(0) $)$" (the original is technically correct, since A is Hermitian and therefore square, but changing this would improve consistency with [linalg.algs.blas2.gemv])

[linalg.algs.blas2.trmv]

(The extents compatibility conditions are expressed differently than in the above matrix-vector multiply sections, perhaps more for consistency with the TRSV section below. They look correct here.)

  • In 7, replace "$O($ x.extent(0) × A.extent(1) $)$" with "$O($ A.extent(0) × x.extent(0) $)$" (the original is technically correct, since A is square, but changing this would improve consistency with [linalg.algs.blas2.gemv])

  • In 10, replace "$O($ y.extent(0) × A.extent(1) $)$" with "$O($ A.extent(0) × y.extent(0) $)$" (the original is technically correct, since A is square, but changing this would improve consistency with [linalg.algs.blas2.gemv])

  • In 13, replace "$O($ x.extent(0) × A.extent(1) $)$" with "$O($ A.extent(0) × x.extent(0) $)$" (the original is technically correct, since A is square, but changing this would improve consistency with [linalg.algs.blas2.gemv])

Rest of BLAS 2 (is fine)

BLAS 3

OK: [linalg.algs.blas3.xxmm], [linalg.algs.blas3.trmm]

[linalg.algs.blas3.rankk]

(C is the symmetric or Hermitian matrix, not A. C needs to be square, not necessarily A.)

  • Remove 3.2, which currently says "compatible-static-extents<decltype(A), decltype(A)>(0, 1) is true"

  • Remove 4.1, which currently says "A.extent(0) equals A.extent(1)"

  • In the Complexity clause 5, replace "$O( A.extent(0) × A.extent(1) x C.extent(0) $)$" with "$O( C.extent(0) × C.extent(1) x C.extent(0) $)$" (the current clause is correct, but less consistent with GEMM)

[linalg.algs.blas3.rank2k]

(C is the symmetric or Hermitian matrix, not A or B. C needs to be square, not necessarily A or B. A, B, and C must have the same number of rows. A and B must have the same number of columns (they must both be N x K for some K not necessarily equal to N).)

  • In 3.2, replace "possibly-addable<decltype(A), decltype(B), decltype(C)>() is true" with "possibly-multipliable<decltype(A), decltype(transposed(B)), decltype(C)> is true and possibly-multipliable<decltype(B), decltype(transposed(A)), decltype(C)> is true"

  • In 3.3, replace "compatible-static-extents<decltype(A), decltype(A)>(0, 1)" with "compatible-static-extents<decltype(C), decltype(C)>(0, 1)"

  • In 4.1, replace "addable(A, B, C) is true" with "multipliable(A, transposed(B), C) is true and multipliable(transposed(B), A, C) is true"

  • In 4.2, replace "A.extent(0) equals A.extent(1)" with "C.extent(0) equals C.extent(1)"

  • In the Complexity clause 5, replace "$O($ A.extent(0) × A.extent(1) × C.extent(0) $)$" with "$O($ A.extent(0) × A.extent(1) × B.extent(1) $)$"

[linalg.algs.blas3.trsm]

(Nothing is wrong here, but it's nice to make the complexity clauses depend only on input if possible.)

  • In the Complexity clause 6 (for the left solve), replace "$O($ A.extent(0) × X.extent(1) × X.extent(1) $)$" with "$O($ A.extent(0) × B.extent(0) × B.extent(1) $)$"

(Complexity clause 14, for the right solve, is fine.)

[linalg.algs.blas3.inplacetrsm]

  • In the Complexity clause 13 (for the right solve), replace $O($ A.extent(0) × A.extent(1) × B.extent(1) $)$ with "$O($ B.extent(0) × A.extent(0) × A.extent(1) $)$"

mhoemmen avatar Jul 18 '24 17:07 mhoemmen

@mhoemmen a few comments:

[linalg.algs.blas3.rankk]

...

  • In the Complexity clause 5, replace "$O( A.extent(0) × A.extent(1) x C.extent(0) )" with "$O( C.extent(0) × C.extent(1) x C.extent(0) )" (the current clause is correct, but less consistent with GEMM)

The change you propose is wrong as it converts to O(n^3) if we assume A(n,k), C(n,n). I suppose you mean "O(A.extent(0)×A.extent(1)xA.extent(0))" to be consistent with GEMM.

[linalg.algs.blas3.rank2k]

...

  • In 3.2, replace "possibly-addable<decltype(A), decltype(B), decltype(C)>() is true" with "possibly-multipliable<decltype(A), decltype(transposed(B)), decltype(C)> is true and possibly-multipliable<decltype(B), decltype(transposed(A)), decltype(C)> is true"

  • In 4.1, replace "addable(A, B, C) is true" with "multipliable(A, transposed(B), C) is true and multipliable(transposed(B), A, C) is true"

The change proposed for 4.1 is wrong. It should read: In 4.1, replace "addable(A, B, C) is true" with "multipliable(A, transposed(B), C) is true and ~multipliable(transposed(B), A, C)~ multipliable(B, transposed(A), C) is true"

However, the second part of the proposed statements for 3.2 and 4.1 is superfluous as: multipliable(A, transposed(B), C) && C is a square matrix => multipliable(B, transposed(A), C)

  • In the Complexity clause 5, replace "O(A.extent(0)×A.extent(1)×C.extent(0))" with "O(A.extent(0)×A.extent(1)×B.extent(1))"

This proposed change is wrong as it would convert to O(n k^2) if we assume A(n,k), B(n,k) and C(n,n). The old statement is correct, however, it can be made consistent with GEMM using "O(A.extent(0)×A.extent(1)×B.extent(0))"

[linalg.algs.blas3.trsm]

(Nothing is wrong here, but it's nice to make the complexity clauses depend only on input if possible.)

  • In the Complexity clause 6 (for the left solve), replace "O(A.extent(0)×X.extent(1)×X.extent(1))" with "O(A.extent(0)×B.extent(0)×B.extent(1))"

The suggested change is correct. But note that the original statement is wrong as X is not a square matrix and has the same size as B.

Finally one of the errors has not been reported here:

[linalg.algs.blas3.trmm]

  • In the complexity clause 5, replace O(A.extent(0)×A.extent(1)×C.extent(0)) with O(A.extent(0)×A.extent(1)×C.extent(1))
  • Complexity clause 9 is correct, but can be made more consistent with GEMM replacing O(A.extent(0)×A.extent(1)×C.extent(0)) with O(C.extent(0)×C.extent(1)×A.extent(1))

rasolca avatar Jul 19 '24 13:07 rasolca

Oh, I made some typos there! Thanks for reviewing!

mhoemmen avatar Jul 19 '24 14:07 mhoemmen

More bugs, found by my colleague Ilya Burylov

layout_transpose

Private members nested-mapping and extents_ should be in mapping and not in the layout.

conjugated_accessor

Constructor from NestedAccessor appears in the wording (para 3) but is missing from the class synopsis.

[linalg.conj.conjugatedaccessor] explicit(!is_convertible_v<OtherNestedAccessor, NestedAccessor>>) has an extra closing >.

The following function should return NestedAccessor instead.

constexpr const Accessor& nested_accessor() const noexcept;

scaled_accessor

The nested-accessor member of scaled_accessor does not have a trailing underscore. This is inconsistent with other classes in [linalg].

mhoemmen avatar Jul 24 '24 19:07 mhoemmen

More bugs, found by my colleague Ilya Burylov

Specify what happens if hermitian_* algorithms get a diagonal element with nonzero imaginary part

Define that for hermitian_* algorithms, the imaginary elements on the diagonal aren't even accessed (which is what the BLAS Standard says). The reference BLAS implementation of ZHEMV uses dble(A(j,j)), which means that it just takes the real part. We should also do that.

  1. Extend [linalg.general] with diagonal access logic. Something like this: "For accesses of diagonal elements m[i, i], F will use the value real-if-needed(m[i, i]) if the name of F starts with hermitian."

  2. "These functions perform an updating Hermitian matrix-vector product, taking into account the Triangle parameter that applies to the Hermitian matrix A [linalg.general]."

mhoemmen avatar Aug 05 '24 21:08 mhoemmen

These issues have been submitted as the following LWG / wording issues. Thanks all!

  • https://cplusplus.github.io/LWG/issue4136
  • https://cplusplus.github.io/LWG/issue4137
  • https://github.com/cplusplus/draft/issues/7214

mhoemmen avatar Aug 13 '24 04:08 mhoemmen