KokkosBatched::SerialSVD::invoke(..) hang in Kokkos v4.6
Hi,
I'm seeing a hang (infinite loop) in SerialSVD in Kokkos v4.6:
TEST(KokkosSerialSVD, does_not_solve2)
{
  Kokkos::View<double[3][6], Kokkos::HostSpace> A(Kokkos::ViewAllocateWithoutInitializing("A"));
  Kokkos::View<double[3][3], Kokkos::HostSpace> U(Kokkos::ViewAllocateWithoutInitializing("U"));
  Kokkos::View<double[6][6], Kokkos::HostSpace> V(Kokkos::ViewAllocateWithoutInitializing("V"));
  Kokkos::View<double[3], Kokkos::HostSpace> S(Kokkos::ViewAllocateWithoutInitializing("S"));
  Kokkos::View<double[30], Kokkos::HostSpace> work(Kokkos::ViewAllocateWithoutInitializing("work"));

  A(0, 0) = -2.3588494081694974e-03;
  A(0, 1) = -2.3602176428346553e-03;
  A(0, 2) = -3.3360574050870077e-03;
  A(0, 3) = -2.3589487578561312e-03;
  A(0, 4) = -3.3359167956075490e-03;
  A(0, 5) = -3.3378517656821728e-03;
  // Note: A(1, 2), A(1, 4) and A(1, 5) are never assigned below; because A is
  // allocated without initializing, their values are indeterminate.
  A(1, 0) = 3.3359168246290603e-03;
  A(1, 1) = 3.3378518006490351e-03;
  A(1, 3) = 3.3360573263032968e-03;
  A(2, 0) = -2.3588494081695022e-03;
  A(2, 1) = -2.3602176428346587e-03;
  A(2, 2) = 3.3360574050869769e-03;
  A(2, 3) = -2.3589487578561286e-03;
  A(2, 4) = 3.3359167956075399e-03;
  A(2, 5) = 3.3378517656821581e-03;

  KokkosBatched::SerialSVD::invoke(KokkosBatched::SVD_USV_Tag{}, A, U, S, V, work, 1e-12);
}
Compiler:
gcc-12.3.0
Thanks, -Alec
Upon further testing, it looks like setting the tolerance to 1e-11 stops the infinite loop. I'm not sure why.
This might be related to an issue filed a couple of months ago, if that helps:
https://github.com/kokkos/kokkos-kernels/issues/2557
add-iteration-limit-to-SVD.patch
@lucbv This is high priority and is impacting our production cases, so I've added the attached patch to our Kokkos Kernels Spack setup for now. Can you prioritize getting a change like this into Kokkos Kernels directly? If there is a way to avoid the infinite loop, we can handle the error on our side by relaxing the tolerance and/or perturbing or shuffling the input order.
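For illustration, the "relax the tolerance on failure" fallback could look roughly like the sketch below. It is a minimal sketch, not part of Kokkos Kernels: `solve_with_relaxed_tolerance` and its parameters are hypothetical names, and the solver is passed in as a generic callable that is assumed to return 0 on success and nonzero on failure.

```cpp
#include <cassert>
#include <functional>

// Hypothetical helper (not a Kokkos Kernels API): calls solve(tol) and, while
// it reports failure (nonzero return), multiplies the tolerance by `relax`
// and tries again. Returns the tolerance that succeeded, or -1.0 if every
// attempt failed.
inline double solve_with_relaxed_tolerance(const std::function<int(double)>& solve,
                                           double tol, double relax, int max_attempts)
{
  assert(tol > 0.0 && relax > 1.0 && max_attempts > 0);
  for (int attempt = 0; attempt < max_attempts; ++attempt) {
    if (solve(tol) == 0) {
      return tol;  // converged with the current tolerance
    }
    tol *= relax;  // loosen the tolerance before the next attempt
  }
  return -1.0;  // no attempt converged
}
```

In our case the callable would wrap the SVD call with an iteration limit and would need to restore the input matrix before each attempt if the solver overwrites it.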
Yes, I will review the patch and turn it into a PR, and I will add a unit test to make sure this does not regress.
Hi, we have another failing case:
TEST(RotateRows, BadSVDDefaultTol)
{
  Kokkos::View<double **, Kokkos::LayoutRight, Kokkos::HostSpace> A("A", 3, 6);
  Kokkos::View<double **, Kokkos::LayoutRight, Kokkos::HostSpace> U("U", 3, 3);
  Kokkos::View<double **, Kokkos::LayoutRight, Kokkos::HostSpace> V("V", 6, 6);
  Kokkos::View<double *, Kokkos::HostSpace> S("S", 3);
  Kokkos::View<double *, Kokkos::HostSpace> work("work", 30);
  Kokkos::View<double **, Kokkos::LayoutRight, Kokkos::HostSpace> A_scratch("A_scratch", 3, 6);

  A(0, 0) = -0.49992589104804802114;
  A(0, 1) = -0.50016956949997615212;
  A(0, 2) = 0.70697176137856687639;
  A(0, 3) = 0.70734658093545688118;
  A(0, 4) = -0.49990454757986246825;
  A(0, 5) = 0.70700198975105144061;
  A(1, 0) = 0.70700197530160280301;
  A(1, 1) = 0.70734658867317878883;
  A(1, 2) = 0.00000000000000052857;
  A(1, 3) = 0.00000000000000411362;
  A(1, 4) = 0.70697179107942709209;
  A(1, 5) = 0.00000000000000392977;
  A(2, 0) = -0.49992589104804807665;
  A(2, 1) = -0.50016956949997559700;
  A(2, 2) = -0.70697176137856798661;
  A(2, 3) = -0.70734658093545643709;
  A(2, 4) = -0.49990454757986169110;
  A(2, 5) = -0.70700198975105188470;

  const double tol = 1e-12;
  const int max_iters = 1000;
  const int svd_err = KokkosBatched::SerialSVD::invoke(
      KokkosBatched::SVD_USV_Tag{}, A, U, S, V, work, tol, max_iters);
  EXPECT_FALSE(svd_err == 0);
}
In this case it exits early after 1000 iterations (otherwise it loops forever). If we drop the tolerance by 4 orders of magnitude and increase max_iters by 4 orders of magnitude, we get the correct U matrix.
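The early-exit behavior described above (a nonzero return once `max_iters` is reached, instead of a hang) follows a generic pattern that can be sketched on a much simpler problem. The code below is not the attached patch and does not touch the SVD. It is a hedged illustration that uses a scalar Heron iteration for a square root as a stand-in, and `capped_sqrt` and its return-code convention are assumptions made for this example.

```cpp
#include <cassert>
#include <cmath>

// Illustrative stand-in for an iterative solver with an iteration cap.
// Returns 0 if successive iterates differ by less than `tol`, or 1 if
// `max_iters` iterations were performed without meeting that criterion.
// `result` always receives the latest estimate.
inline int capped_sqrt(double a, double tol, int max_iters, double& result)
{
  assert(a > 0.0 && max_iters > 0);
  double x = a;
  for (int iter = 0; iter < max_iters; ++iter) {
    const double next = 0.5 * (x + a / x);  // Heron / Newton update
    const double change = std::abs(next - x);
    x = next;
    if (change < tol) {
      result = x;
      return 0;  // converged
    }
  }
  result = x;  // best estimate so far, tolerance not met
  return 1;    // iteration limit hit; without the cap this would never return
}
```

With `tol = 0.0` the test `change < tol` can never be true, so only the cap ends the loop. That mirrors the situation here, where the requested tolerance is apparently tighter than the iteration can reach for these inputs.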