ginkgo icon indicating copy to clipboard operation
ginkgo copied to clipboard

Add the L1 jacobi

Open yhmtsai opened this issue 1 year ago • 19 comments

This add the L1 to Jacobi. The detail is shown in https://epubs.siam.org/doi/abs/10.1137/100798806 original preconditioner is on $M$ (which is diagonal of A for scalar Jacobi or diagonal block of A) L1 smoother is on $M' = M + D^{\mathcal{l}_1}$ whose $D^{\mathcal{l}_1}$ is a diagonal matrix with

$$D_{ii}^{\mathcal{l}1} = \sum{j \in \text{the off-diagonal block of corresponding block of i}}{|A_{ij}|}$$

I am not sure whether the theorems in the paper still work for negative diagonal entries.

yhmtsai avatar Mar 23 '23 10:03 yhmtsai

I'm a bit confused. The paper talks about the L1 Jacobi only in a distributed setting, ie the off-diagonal entries you mentioned are the entries of our non-local matrix in the distributed matrix. But your implementation is for non-distributed matrices. Can you perhaps provide another reference for the non-distributed case?

MarcelKoch avatar Mar 23 '23 10:03 MarcelKoch

Codecov Report

Attention: Patch coverage is 96.71053% with 5 lines in your changes missing coverage. Please review.

Project coverage is 90.82%. Comparing base (bf09bd2) to head (259f96f). Report is 3 commits behind head on develop.

:exclamation: Current head 259f96f differs from pull request most recent head 14d1a98

Please upload reports for the commit 14d1a98 to get more accurate results.

Files Patch % Lines
core/device_hooks/common_kernels.inc.cpp 0.00% 2 Missing :warning:
reference/test/preconditioner/jacobi_kernels.cpp 95.12% 2 Missing :warning:
common/unified/preconditioner/jacobi_kernels.cpp 83.33% 1 Missing :warning:
Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #1310      +/-   ##
===========================================
+ Coverage    90.03%   90.82%   +0.79%     
===========================================
  Files          759      577     -182     
  Lines        61167    48872   -12295     
===========================================
- Hits         55074    44390   -10684     
+ Misses        6093     4482    -1611     

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

codecov[bot] avatar Mar 23 '23 11:03 codecov[bot]

it's only based on the nonoverlapping partition. we can also add the L1 to the Schwarz for the distributed local preconditioner. 6.2 in the paper mentioned the L1 scalar Jacobi

yhmtsai avatar Mar 23 '23 12:03 yhmtsai

I think sec. 6.2 means using the preconditioner from 6. M = M_H + D^l with M_H = D. The D^l is still build from the non-local columns. I'm not saying it still works for non-distributed matrices, just that the paper description applies to distributed matrices. (The L1 smoother for non-distributed matrices would just be the standard Jacobi.)

MarcelKoch avatar Mar 23 '23 14:03 MarcelKoch

If the partition is only designed with the mpi (let me use the mpi here to distinguish), L1 is indeed normal Jacobi i.e using Schwarz with L1 will give this behavior. But the paper from second paragraph of section 5 makes me think it works for any distributed (any block partition). design the partition by the mpi is one example

yhmtsai avatar Mar 23 '23 15:03 yhmtsai

there's one reference from precond. They use L1-scalar Jacobi but they also use the absolute value of diagonal part https://etna.math.kent.edu/vol.55.2022/pp687-705.dir/pp687-705.pdf

yhmtsai avatar Mar 23 '23 15:03 yhmtsai

The L1 smoother from sec. 6 is an improvement on the hybrid smoother from sec 5. The hybrid smoother applies a local smoother only on the local matrix block of a distributed matrix and ignores the non-local part. The L1 smoother than takes the non-local part into account through the D^l matrix. Again, this might also work for non-distributed matrices, but maybe it would not be the L1 smoother people would expect. But from the other paper you linked, people seem to be not very careful about this definition.

MarcelKoch avatar Mar 23 '23 15:03 MarcelKoch

Perhaps naming it 'lumped-Jacobi' would be better suited to differentiate between those two variants. And I guess the L1 from the first paper could be implemented as another distributed preconditioner. Also, the paper you mentioned does a full row sum, so it also includes the diagonal value.

MarcelKoch avatar Mar 23 '23 15:03 MarcelKoch

The first paper considers the diagonal entries are positive, so they are equivalent. Distributed on mpi or distributed on the block to me are the same thing, which are ways to cut the matrices. Applying L1 to LocalPreconditioner and to Schwarz are enough to distinguish. With lumped, we still need to indicate it is L1 in the comments

yhmtsai avatar Mar 23 '23 16:03 yhmtsai

My argument is that what is implemented in this PR is not the L1 from the paper, so we should not call it that. But honestly, I don't have a great overview of how the term L1 smoother is usually interpreted.

MarcelKoch avatar Mar 23 '23 16:03 MarcelKoch

~~And in your interpretation, there are still the diagonal entries missing in the sum. The preconditioner is D + D^l.~~ Sorry, I missed the part that initializes the array with the (block) diagonal.

MarcelKoch avatar Mar 23 '23 16:03 MarcelKoch

It should be theorically L1 smoother. otherwise, I just add some random Jacobi. (I need to use [] not {} to represent row index set because I can not display {} with github latex) scalar Jacobi: $\Omega_k = [k]$ block Jacobi: $\Omega_k = [row \in \text{block}_k]$

yhmtsai avatar Mar 23 '23 17:03 yhmtsai

The L1 smoother (Jacobi or GS) explicitly references distributed matrices. I still think that if we want to follow the naming of the paper, the variant implemented here should not be called L1. Instead, we should add a new distributed preconditioner which consists of a local solver (Jacobi, or in the future perhaps GS) and a diagonal matrix with the row-wise sum of the non-local part.

MarcelKoch avatar Mar 23 '23 17:03 MarcelKoch

Maybe I am misunderstanding something, but in the non-distributed case, this is basically diagonal relaxation with weights equal to row sums, right ? So, I think it definitely makes sense to have as a smoother as an alternative to the scalar Jacobi smoother.

I think the name L1 is well suited because you are taking the L1 norm on the row space vectors ? $|v| = \sum_{n}|v_i|$.

Not sure if there is some mathematical basis for this yet, we could also try seeing if any other p-norms also make sense, Particularly with $p=\infty$, $p=2$.

pratikvn avatar Apr 06 '23 16:04 pratikvn

If we go with @pratikvn suggestion, and base this version on the L1 row norm, then we need to take the absolute value of the diagonal.

MarcelKoch avatar Apr 11 '23 07:04 MarcelKoch

I am not sure whether to take the absolute value of the diagonal value. It might make sense for the scalar version but not for the block version. What's the meeting of the absolute of diagonal block to the diagonal value?

yhmtsai avatar May 25 '23 12:05 yhmtsai

Taking the absolute value is the only way for this preconditioner to make sense. And for blocks, I would just go with taking the absolute value component-wise. I think then it makes the most sense to add up these block-wise. Currently, you seem to only update the diagonal values of the diagonal block, or maybe I'm mistaken.

MarcelKoch avatar May 25 '23 14:05 MarcelKoch

Yes, from the block-version, it only updates the diagonal value. I think l1 is mostly from taking the summation of the absolute value in non-local block's row, not the entire row. (l1 norm of non-local-block rows) L1 does not apply to the local block. From the formula, it does not change the other values in the block. In the smoother paper, they only consider the positive value in diagonal (see 5.5), although they have additional properties more than positive diagonal values. In another paper, they consider the SPD cases, so the diagonal values are also positive. When the diagonal value is positive, it does not change the result no matter whether we take the absolute value of the diagonal value or not. Another treatment is to $A_{ii} + sign(A_{ii}) D^{l_1}$ if -A is SPD for scalar Jacobi,

From the paper or formula perspective, I prefer only adding values on the diagonal value without take absolute on the diagonal value first. For the negative diagonal value, we do not know whether it works or not from the papers currently.

yhmtsai avatar May 30 '23 15:05 yhmtsai