kokkos-kernels icon indicating copy to clipboard operation
kokkos-kernels copied to clipboard

implement batched serial pbtrs

Open yasahi-hpc opened this issue 1 year ago • 11 comments

This PR implements pbtrs function.

Following files are added:

  1. KokkosBatched_Pbtrs_Serial_Impl.hpp: Internal interfaces
  2. KokkosBatched_Pbtrs_Serial_Internal.hpp: Implementation details
  3. KokkosBatched_Pbtrs.hpp: APIs
  4. Test_Batched_SerialPbtrs.hpp: Unit tests for that

Detailed description

It solves the equation A * x = b, where A is a real symmetric (complex Hermitian) positive definite band matrix A, where A is stored in a band storage. Before solving, the Cholesky factorization A = U**H*U or A = L*L**H must be computed by Pbtrf.

Here, the matrix has the following shape.

  • A: (batch_count, ldab, n)
    On entry, the upper or lower triangle of the symmetric band matrix A, stored in the first KD+1 rows of the array. On exit, the triangular factor U or L from the Cholesky factorization A = U**H*U or A = L*L**H of the band matrix A, in the same storage format as A
  • B: (batch_count, n)
    On entry, it contains the n element n right-hand side vector b. On exit, the solution vectors x.

Example of a single batch of matrix A n = 10.

A
4 1 0 0 0 0 0 0 0 0 
1 4 1 0 0 0 0 0 0 0 
0 1 4 1 0 0 0 0 0 0 
0 0 1 4 1 0 0 0 0 0 
0 0 0 1 4 1 0 0 0 0 
0 0 0 0 1 4 1 0 0 0 
0 0 0 0 0 1 4 1 0 0 
0 0 0 0 0 0 1 4 1 0 
0 0 0 0 0 0 0 1 4 1 
0 0 0 0 0 0 0 0 1 4

AB (upper)
0 1 1 1 1 1 1 1 1 1 
4 4 4 4 4 4 4 4 4 4

AB (lower)
4 4 4 4 4 4 4 4 4 4
1 1 1 1 1 1 1 1 1 0

Parallelization would be made in the following manner. This is efficient only when A is given in LayoutLeft for GPUs and LayoutRight for CPUs (parallelized over batch direction).

Kokkos::parallel_for('pbtrs', 
    Kokkos::RangePolicy<execution_space> policy(0, n),
    [=](const int k) {
        auto aa = Kokkos::subview(_a, k, Kokkos::ALL(), Kokkos::ALL());
        auto bb = Kokkos::subview(_b, k, Kokkos::ALL());

        KokkosBatched::SerialPbtrs<AlgoTagType>::invoke(aa, bb);
    });

Tests

  1. Confirm A * x = b for analytical case, where
A: [[4, 1, 0],
    [1, 4, 1],
    [0, 1, 4]]
b: [1, 1, 1]
x: [3/14, 1/7, 3/14]
  1. Make a real (complex Hermitian) symmetric positive definite banded matrix from random and keep it in a full (called A) or banded storage (called AB). Firstly, factorize AB in banded storage to get U or L by pbtrf. Then, solve AB * x = b with pbtrs to get x, while keeping the original b in x_ref. Finally confirm that A * x is equal to b (=x_ref) using GEMV.

yasahi-hpc avatar Sep 10 '24 05:09 yasahi-hpc

Status Flag 'Pre-Test Inspection' - - This Pull Request Requires Inspection... The code must be inspected by a member of the Team before Testing/Merging NO INSPECTION HAS BEEN PERFORMED ON THIS PULL REQUEST! - This PR must be inspected by setting label 'AT: PRE-TEST INSPECTED'.

kokkos-devops-admin avatar Sep 10 '24 05:09 kokkos-devops-admin

I'm not familiar with the algorithm, so I may be wrong about what the expected public interface is.

@cwpearson Thank you for your review. As for the namespace, I fully agree with your comments: these functions should be placed under Impl. However, the KokkosBatched implementation details are not placed under Impl (see Trsv for example). If you think it is better, I can move checkPbtrsInput, SerialPbtrsInternalLower and SerialPbtrsInternalUpperdetails under Impl namespace

yasahi-hpc avatar Sep 18 '24 06:09 yasahi-hpc

Yes, please move any internal details introduced in this PR into Impl, including any that I did not call out explicitly. We somehow let the other batched stuff slip by unnoticed and now it's even more work to deprecate it, etc.

cwpearson avatar Sep 18 '24 21:09 cwpearson

I have moved all implementation details under Impl namespace. @cwpearson Another review is appreciated

yasahi-hpc avatar Sep 19 '24 02:09 yasahi-hpc

@cwpearson Thank you for approval. For some reason, some checks do not start appropriately. Do we need a AT: PRE-TEST INSPECTED label for these checks?

yasahi-hpc avatar Sep 19 '24 23:09 yasahi-hpc

@yasahi-hpc some of our test machines have been down due to maintenance, we will re-run things when they get back online. I will also have a look at the PR : )

lucbv avatar Sep 20 '24 14:09 lucbv

@yasahi-hpc some of our test machines have been down due to maintenance, we will re-run things when they get back online. I will also have a look at the PR : )

Sure, I will wait for that. Further reviews are welcome. For the moment, I will continue to improve the implementation of PR #2331

yasahi-hpc avatar Sep 23 '24 07:09 yasahi-hpc

Just a small change if you don't mind and I'll rerun the tests and merge

I have fixed accordingly. If it is fine, please rerun the tests

yasahi-hpc avatar Oct 05 '24 08:10 yasahi-hpc

@lucbv Thank you for your approval. For some reason, AT2 checks do not start appropriately. Do we need some labels for these tests?

yasahi-hpc avatar Oct 07 '24 16:10 yasahi-hpc

I asked the team in charge of our testing infrastructure to look into it. There seems to be a bug in the tester software, I reported it and hopefully it will be fixed quickly.

lucbv avatar Oct 07 '24 17:10 lucbv

Sure. Thanks a lot!

yasahi-hpc avatar Oct 07 '24 17:10 yasahi-hpc

For some reason after fixing the conflict in the unit-test header for batched dense the CI workflows are now checking out the right version of Kokkos to build against... well good enough x)

lucbv avatar Oct 24 '24 23:10 lucbv