kokkos-kernels
kokkos-kernels copied to clipboard
implement batched serial pbtrs
This PR implements pbtrs function.
Following files are added:
KokkosBatched_Pbtrs_Serial_Impl.hpp: Internal interfacesKokkosBatched_Pbtrs_Serial_Internal.hpp: Implementation detailsKokkosBatched_Pbtrs.hpp: APIsTest_Batched_SerialPbtrs.hpp: Unit tests for that
Detailed description
It solves the equation A * x = b, where A is a real symmetric (complex Hermitian) positive definite band matrix A, where A is stored in a band storage.
Before solving, the Cholesky factorization A = U**H*U or A = L*L**H must be computed by Pbtrf.
Here, the matrix has the following shape.
A:(batch_count, ldab, n)
On entry, the upper or lower triangle of the symmetric band matrix A, stored in the first KD+1 rows of the array. On exit, the triangular factorUorLfrom the Cholesky factorizationA = U**H*UorA = L*L**Hof the band matrixA, in the same storage format asAB:(batch_count, n)
On entry, it contains the n elementnright-hand side vectorb. On exit, the solution vectorsx.
Example of a single batch of matrix A n = 10.
A
4 1 0 0 0 0 0 0 0 0
1 4 1 0 0 0 0 0 0 0
0 1 4 1 0 0 0 0 0 0
0 0 1 4 1 0 0 0 0 0
0 0 0 1 4 1 0 0 0 0
0 0 0 0 1 4 1 0 0 0
0 0 0 0 0 1 4 1 0 0
0 0 0 0 0 0 1 4 1 0
0 0 0 0 0 0 0 1 4 1
0 0 0 0 0 0 0 0 1 4
AB (upper)
0 1 1 1 1 1 1 1 1 1
4 4 4 4 4 4 4 4 4 4
AB (lower)
4 4 4 4 4 4 4 4 4 4
1 1 1 1 1 1 1 1 1 0
Parallelization would be made in the following manner. This is efficient only when
A is given in LayoutLeft for GPUs and LayoutRight for CPUs (parallelized over batch direction).
Kokkos::parallel_for('pbtrs',
Kokkos::RangePolicy<execution_space> policy(0, n),
[=](const int k) {
auto aa = Kokkos::subview(_a, k, Kokkos::ALL(), Kokkos::ALL());
auto bb = Kokkos::subview(_b, k, Kokkos::ALL());
KokkosBatched::SerialPbtrs<AlgoTagType>::invoke(aa, bb);
});
Tests
- Confirm
A * x = bfor analytical case, where
A: [[4, 1, 0],
[1, 4, 1],
[0, 1, 4]]
b: [1, 1, 1]
x: [3/14, 1/7, 3/14]
- Make a real (complex Hermitian) symmetric positive definite banded matrix from random and keep it in a full (called
A) or banded storage (calledAB). Firstly, factorizeABin banded storage to getUorLbypbtrf. Then, solveAB * x = bwithpbtrsto getx, while keeping the originalbinx_ref. Finally confirm thatA * xis equal tob (=x_ref)using GEMV.
Status Flag 'Pre-Test Inspection' - - This Pull Request Requires Inspection... The code must be inspected by a member of the Team before Testing/Merging NO INSPECTION HAS BEEN PERFORMED ON THIS PULL REQUEST! - This PR must be inspected by setting label 'AT: PRE-TEST INSPECTED'.
I'm not familiar with the algorithm, so I may be wrong about what the expected public interface is.
@cwpearson Thank you for your review. As for the namespace, I fully agree with your comments: these functions should be placed under Impl. However, the KokkosBatched implementation details are not placed under Impl (see Trsv for example). If you think it is better, I can move checkPbtrsInput, SerialPbtrsInternalLower and SerialPbtrsInternalUpperdetails under Impl namespace
Yes, please move any internal details introduced in this PR into Impl, including any that I did not call out explicitly. We somehow let the other batched stuff slip by unnoticed and now it's even more work to deprecate it, etc.
I have moved all implementation details under Impl namespace.
@cwpearson Another review is appreciated
@cwpearson Thank you for approval. For some reason, some checks do not start appropriately. Do we need a AT: PRE-TEST INSPECTED label for these checks?
@yasahi-hpc some of our test machines have been down due to maintenance, we will re-run things when they get back online. I will also have a look at the PR : )
@yasahi-hpc some of our test machines have been down due to maintenance, we will re-run things when they get back online. I will also have a look at the PR : )
Sure, I will wait for that. Further reviews are welcome. For the moment, I will continue to improve the implementation of PR #2331
Just a small change if you don't mind and I'll rerun the tests and merge
I have fixed accordingly. If it is fine, please rerun the tests
@lucbv Thank you for your approval. For some reason, AT2 checks do not start appropriately. Do we need some labels for these tests?
I asked the team in charge of our testing infrastructure to look into it. There seems to be a bug in the tester software, I reported it and hopefully it will be fixed quickly.
Sure. Thanks a lot!
For some reason after fixing the conflict in the unit-test header for batched dense the CI workflows are now checking out the right version of Kokkos to build against... well good enough x)