P1673: Corrections
P1673: Corrections
BLAS 3
[linalg.algs.blas3.rankk]
C is the symmetric or Hermitian matrix, not A, so A doesn't have to be square. (C does need to be square, though.)
-
Replace (2.2) with "
compatible-static-extents<decltype(C), decltype(C)>(0, 1)istrue" -
Replace (2.3) and (2.4) with
possibly-multipliable<decltype(A), decltype(transposed(A)), C> -
Replace (3.1) with "
C.extent(0)equalsC.extent(1)" -
Replace (3.2) and (3.3) with "
multipliable(A, transposed(A), C)"
[linalg.algs.blas3.rank2k]
C is the symmetric or Hermitian matrix, not A or B, so A (or B) doesn't have to be square. (C does need to be square, though.) A, B, and C must have the same number of rows. A and B must have the same number of columns (they must both be N x K for some K not necessarily equal to N).
-
Replace (2.2) with "
possibly-multipliable<decltype(A), decltype(transposed(B)), decltype(C)>istrueandpossibly-multipliable<decltype(B), decltype(transposed(A)), decltype(C)>istrue" -
Replace (2.3) with "
compatible-static-extents<decltype(C), decltype(C)>(0, 1)istrue" -
Replace (3.1) with "
multipliable(A, transposed(B), C)istrueandmultipliable(transposed(B), A, C)istrue" -
Replace (3.2) with "
C.extent(0)equalsC.extent(1)"
We should probably make that correction against the current draft instead of P1673. https://eel.is/c++draft/linalg#algs.blas3.rankk The numbers are off.
@dalg24 https://github.com/kokkos/stdBLAS/issues/274 appears to look at the current draft. I'll revise my comments there.
P1673: Corrections (updated)
BLAS 2
[linalg.algs.blas2.gemv]
-
In 3.2, replace "
possibly-addable<decltype(x), decltype(y), decltype(z)>" with "possibly-addable<decltype(y), decltype(y), decltype(z)>" -
In 4.2, replace "
addable(x, y, z)" with "addable(y, y, z)" -
In 5, replace "$O($
x.extent(0)×A.extent(1)$)$" with "$O($A.extent(0)×x.extent(0)$)$"
[linalg.algs.blas2.symv]
-
In 3.4, replace "
possibly-addable<decltype(x), decltype(y), decltype(z)>" with "possibly-addable<decltype(y), decltype(y), decltype(z)>" -
In 4.3, replace "
addable(x, y, z)" with "addable(y, y, z)" -
In 5, replace "$O($
x.extent(0)×A.extent(1)$)$" with "$O($A.extent(0)×x.extent(0)$)$" (the original is technically correct, since A is symmetric and therefore square, but changing this would improve consistency with [linalg.algs.blas2.gemv])
[linalg.algs.blas2.hemv]
-
In 3.4, replace "
possibly-addable<decltype(x), decltype(y), decltype(z)>" with "possibly-addable<decltype(y), decltype(y), decltype(z)>" -
In 4.3, replace "
addable(x, y, z)" with "addable(y, y, z)" -
In 5, replace "$O($
x.extent(0)×A.extent(1)$)$" with "$O($A.extent(0)×x.extent(0)$)$" (the original is technically correct, since A is Hermitian and therefore square, but changing this would improve consistency with [linalg.algs.blas2.gemv])
[linalg.algs.blas2.trmv]
(The extents compatibility conditions are expressed differently than in the above matrix-vector multiply sections, perhaps more for consistency with the TRSV section below. They look correct here.)
-
In 7, replace "$O($
x.extent(0)×A.extent(1)$)$" with "$O($A.extent(0)×x.extent(0)$)$" (the original is technically correct, since A is square, but changing this would improve consistency with [linalg.algs.blas2.gemv]) -
In 10, replace "$O($
y.extent(0)×A.extent(1)$)$" with "$O($A.extent(0)×y.extent(0)$)$" (the original is technically correct, since A is square, but changing this would improve consistency with [linalg.algs.blas2.gemv]) -
In 13, replace "$O($
x.extent(0)×A.extent(1)$)$" with "$O($A.extent(0)×x.extent(0)$)$" (the original is technically correct, since A is square, but changing this would improve consistency with [linalg.algs.blas2.gemv])
Rest of BLAS 2 (is fine)
BLAS 3
OK: [linalg.algs.blas3.xxmm], [linalg.algs.blas3.trmm]
[linalg.algs.blas3.rankk]
(C is the symmetric or Hermitian matrix, not A. C needs to be square, not necessarily A.)
-
Remove 3.2, which currently says "
compatible-static-extents<decltype(A), decltype(A)>(0, 1)istrue" -
Remove 4.1, which currently says "
A.extent(0)equalsA.extent(1)" -
In the Complexity clause 5, replace "$O(
A.extent(0)×A.extent(1)xC.extent(0)$)$" with "$O(C.extent(0)×C.extent(1)xC.extent(0)$)$" (the current clause is correct, but less consistent with GEMM)
[linalg.algs.blas3.rank2k]
(C is the symmetric or Hermitian matrix, not A or B. C needs to be square, not necessarily A or B. A, B, and C must have the same number of rows. A and B must have the same number of columns (they must both be N x K for some K not necessarily equal to N).)
-
In 3.2, replace "
possibly-addable<decltype(A), decltype(B), decltype(C)>()istrue" with "possibly-multipliable<decltype(A), decltype(transposed(B)), decltype(C)>istrueandpossibly-multipliable<decltype(B), decltype(transposed(A)), decltype(C)>istrue" -
In 3.3, replace "
compatible-static-extents<decltype(A), decltype(A)>(0, 1)" with "compatible-static-extents<decltype(C), decltype(C)>(0, 1)" -
In 4.1, replace "
addable(A, B, C)istrue" with "multipliable(A, transposed(B), C)istrueandmultipliable(transposed(B), A, C)istrue" -
In 4.2, replace "
A.extent(0)equalsA.extent(1)" with "C.extent(0)equalsC.extent(1)" -
In the Complexity clause 5, replace "$O($
A.extent(0)×A.extent(1)×C.extent(0)$)$" with "$O($A.extent(0)×A.extent(1)×B.extent(1)$)$"
[linalg.algs.blas3.trsm]
(Nothing is wrong here, but it's nice to make the complexity clauses depend only on input if possible.)
- In the Complexity clause 6 (for the left solve), replace "$O($
A.extent(0)×X.extent(1)×X.extent(1)$)$" with "$O($A.extent(0)×B.extent(0)×B.extent(1)$)$"
(Complexity clause 14, for the right solve, is fine.)
[linalg.algs.blas3.inplacetrsm]
- In the Complexity clause 13 (for the right solve), replace $O($
A.extent(0)×A.extent(1)×B.extent(1)$)$ with "$O($B.extent(0)×A.extent(0)×A.extent(1)$)$"
@mhoemmen a few comments:
[linalg.algs.blas3.rankk]
...
- In the Complexity clause 5, replace "$O(
A.extent(0)×A.extent(1)xC.extent(0))" with "$O(C.extent(0)×C.extent(1)xC.extent(0))" (the current clause is correct, but less consistent with GEMM)
The change you propose is wrong as it converts to O(n^3) if we assume A(n,k), C(n,n). I suppose you mean "O(A.extent(0)×A.extent(1)xA.extent(0))" to be consistent with GEMM.
[linalg.algs.blas3.rank2k]
...
In 3.2, replace "
possibly-addable<decltype(A), decltype(B), decltype(C)>()istrue" with "possibly-multipliable<decltype(A), decltype(transposed(B)), decltype(C)>istrueandpossibly-multipliable<decltype(B), decltype(transposed(A)), decltype(C)>istrue"In 4.1, replace "
addable(A, B, C)istrue" with "multipliable(A, transposed(B), C)istrueandmultipliable(transposed(B), A, C)istrue"
The change proposed for 4.1 is wrong. It should read:
In 4.1, replace "addable(A, B, C) is true" with "multipliable(A, transposed(B), C) is true and ~multipliable(transposed(B), A, C)~ multipliable(B, transposed(A), C) is true"
However, the second part of the proposed statements for 3.2 and 4.1 is superfluous as:
multipliable(A, transposed(B), C) && C is a square matrix => multipliable(B, transposed(A), C)
- In the Complexity clause 5, replace "O(
A.extent(0)×A.extent(1)×C.extent(0))" with "O(A.extent(0)×A.extent(1)×B.extent(1))"
This proposed change is wrong as it would convert to O(n k^2) if we assume A(n,k), B(n,k) and C(n,n).
The old statement is correct, however, it can be made consistent with GEMM using
"O(A.extent(0)×A.extent(1)×B.extent(0))"
[linalg.algs.blas3.trsm]
(Nothing is wrong here, but it's nice to make the complexity clauses depend only on input if possible.)
- In the Complexity clause 6 (for the left solve), replace "O(
A.extent(0)×X.extent(1)×X.extent(1))" with "O(A.extent(0)×B.extent(0)×B.extent(1))"
The suggested change is correct. But note that the original statement is wrong as X is not a square matrix and has the same size as B.
Finally one of the errors has not been reported here:
[linalg.algs.blas3.trmm]
- In the complexity clause 5, replace
O(A.extent(0)×A.extent(1)×C.extent(0))withO(A.extent(0)×A.extent(1)×C.extent(1)) - Complexity clause 9 is correct, but can be made more consistent with GEMM replacing
O(A.extent(0)×A.extent(1)×C.extent(0))withO(C.extent(0)×C.extent(1)×A.extent(1))
Oh, I made some typos there! Thanks for reviewing!
More bugs, found by my colleague Ilya Burylov
layout_transpose
Private members nested-mapping and extents_ should be in mapping and not in the layout.
conjugated_accessor
Constructor from NestedAccessor appears in the wording (para 3) but is missing from the class synopsis.
[linalg.conj.conjugatedaccessor] explicit(!is_convertible_v<OtherNestedAccessor, NestedAccessor>>) has an extra closing >.
The following function should return NestedAccessor instead.
constexpr const Accessor& nested_accessor() const noexcept;
scaled_accessor
The nested-accessor member of scaled_accessor does not have a trailing underscore. This is inconsistent with other classes in [linalg].
More bugs, found by my colleague Ilya Burylov
Specify what happens if hermitian_* algorithms get a diagonal element with nonzero imaginary part
Define that for hermitian_* algorithms, the imaginary elements on the diagonal aren't even accessed (which is what the BLAS Standard says). The reference BLAS implementation of ZHEMV uses dble(A(j,j)), which means that it just takes the real part. We should also do that.
-
Extend [linalg.general] with diagonal access logic. Something like this: "For accesses of diagonal elements
m[i, i], F will use the valuereal-if-needed(m[i, i])if the name of F starts with hermitian." -
"These functions perform an updating Hermitian matrix-vector product, taking into account the Triangle parameter that applies to the Hermitian matrix A [linalg.general]."
These issues have been submitted as the following LWG / wording issues. Thanks all!
- https://cplusplus.github.io/LWG/issue4136
- https://cplusplus.github.io/LWG/issue4137
- https://github.com/cplusplus/draft/issues/7214