add PD3O with tests
Description
Adding the PD3O algorithm with unit tests.
Implementation based on "A new primal-dual algorithm for minimizing the sum of three functions with a linear operator". With this algorithm we get two additional algorithms for free: PAPC and Davis-Yin (sometimes called PDDY). PDHG is a special case. We can also obtain Condat-Vu and AFBA; see details in the above paper. Finally, all of the above algorithms can be combined with stochastic (variance-reduced) algorithms.
Below are some results, presented two years ago (Workshop on modern image reconstruction algorithms and practices for medical imaging), using the stochastic framework just before the 1/n weight convention was removed.
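For illustration, a minimal usage sketch based on the test code later in this thread; the import path for `PD3O` is assumed to match the other CIL algorithms, and `data` stands for any suitable CIL `ImageData`:

```python
# Sketch only: solve min_x f(x) + g(x) + h(Ax) with f smooth,
# here 0.5*||x - data||^2 + indicator(x >= 0) + alpha*||grad x||_{2,1}.
from cil.optimisation.algorithms import PD3O   # assumed location of the new class
from cil.optimisation.functions import L2NormSquared, MixedL21Norm, IndicatorBox, ZeroFunction
from cil.optimisation.operators import GradientOperator

alpha = 0.1
f = 0.5 * L2NormSquared(b=data)   # smooth term
g = IndicatorBox(lower=0)         # proximable term
h = alpha * MixedL21Norm()        # term composed with the linear operator
K = GradientOperator(data.geometry)

algo = PD3O(f=f, g=g, h=h, operator=K, initial=0 * data)
algo.run(500)

# Special case: with f = ZeroFunction(), PD3O reduces to PDHG.
pdhg_like = PD3O(f=ZeroFunction(), g=g, h=h, operator=K, initial=0 * data)
```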
Related issues/links
Closes #1890
Checklist
- [ ] I have performed a self-review of my code
- [ ] I have added docstrings in line with the guidance in the developer guide
- [ ] I have updated the relevant documentation
- [ ] I have implemented unit tests that cover any new or modified functionality
- [ ] CHANGELOG.md has been updated with any functionality change
- [ ] Request review from all relevant developers
- [ ] Change pull request label to 'Waiting for review'
Contribution Notes
Please read and adhere to the developer guide and local patterns and conventions.
- [ ] The content of this Pull Request (the Contribution) is intentionally submitted for inclusion in CIL (the Work) under the terms and conditions of the Apache-2.0 License
- [ ] I confirm that the contribution does not violate any intellectual property rights of third parties
Thanks @epapoutsellis this looks great! We're not far from releasing 24.1, but we'll be planning the next release in a couple of weeks and we can add this to the workplan.
Thanks @epapoutsellis! It looks good. I added a few todos for myself around testing and documentation. Feel free to answer them if you have time before me but I will aim to look again around the beginning of July.
Hi @epapoutsellis - I am going through this in detail and have a few questions:
- Comparing the code to the paper, I have lost track of the $AA^\top$ term. Can you point me in the right direction in the code?
- How do we want to set the default step sizes? The paper suggests convergence if $\gamma$ and $\delta$ satisfy a condition in terms of $\beta = 1/f.L$ and $\lambda = \delta\gamma$ (see the note after this list), and thus I suggest `gamma = 2./F.L` and `delta = 1./(gamma*norm_op**2)`, but I don't know if you have any experience in practice.
- I presume we raise a warning rather than an error for testing purposes, so we can compare against PDHG?
- Do we want a check-convergence option, like we have for PDHG and FISTA?
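For reference, my reading of the step-size condition, pieced together from $\beta = 1/f.L$, $\lambda = \delta\gamma$ and the defaults tried further down this thread; the exact inequality is my reconstruction and should be checked against the paper:

$$\gamma < 2\beta = \frac{2}{L_f}, \qquad \lambda\,\|AA^\top\| = \delta\gamma\,\|A\|^2 \le 1,$$

where $L_f$ is `f.L`. This would motivate defaults of the form $\gamma \approx 2/L_f$ and $\delta = 1/(\gamma\,\|A\|^2)$, as tried in the experiments below.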
> - How do we want to set the default step sizes? The paper suggests convergence if $\gamma$ and $\delta$ satisfy a condition in terms of $\beta = 1/f.L$ and $\lambda = \delta\gamma$, and thus I suggest `gamma = 2./F.L` and `delta = 1./(gamma*norm_op**2)`, but I don't know if you have any experience in practice.
Actually, playing with this further, I considered:
def test_pd3o_vs_fista(self):
    # Reference solution: FISTA on 0.5*||x - data||^2 + alpha*TV with non-negativity
    alpha = 0.1
    G = alpha * TotalVariation(max_iteration=5, lower=0)
    F = 0.5 * L2NormSquared(b=self.data)
    algo = FISTA(f=F, g=G, initial=0 * self.data)
    algo.run(200)

    # Same problem split for PD3O: smooth f, proximable g, h composed with the gradient
    F1 = 0.5 * L2NormSquared(b=self.data)
    H1 = alpha * MixedL21Norm()
    operator = GradientOperator(self.data.geometry)
    G1 = IndicatorBox(lower=0)
    norm_op = operator.norm()
    # gamma, delta: candidate default step sizes, discussed below
    algo_pd3o = PD3O(f=F1, g=G1, h=H1, operator=operator, initial=0 * self.data,
                     gamma=gamma, delta=delta)
    algo_pd3o.run(500)

    np.testing.assert_allclose(algo.solution.as_array(), algo_pd3o.solution.as_array(), atol=1e-2)
    np.testing.assert_allclose(algo.objective[-1], algo_pd3o.objective[-1], atol=1e-2)
For the proposed default step sizes:
gamma = 2./F.L
delta = 1./(gamma*norm_op**2)
PD3O does not converge to the correct solution; however, with
gamma = 0.99*2./F.L
delta = 1./(gamma*norm_op**2)
it does converge. Any thoughts?
For the step size, it sounds like the "safety factor" of 0.99 is sensible.
For your question 1 on $AA^\top$ having gone missing: maybe I misunderstand, but I read the implemented code as based on equation (4) (the code comment also states this), which is reformulated from equation (3), and the $AA^\top$ appears only in eq. (3).
Not sure if this affects whether $AA^\top$ should be included in the step-size bound as well?
Eq. (4a) contains the gradient of l^*. I don't see that in the code?
> For your question 1 on $AA^\top$ having gone missing: maybe I misunderstand, but I read the implemented code as based on equation (4) (the code comment also states this), which is reformulated from equation (3), and the $AA^\top$ appears only in eq. (3).

Aaah - I was looking at the arXiv version of the paper, which had a different eq. (4). That makes more sense.
> Eq. (4a) contains the gradient of l^*. I don't see that in the code?

The paper considers optimising $f(x) + g(x) + (h \,\square\, l)(Ax)$, whereas we are considering an objective $f(x)+g(x)+h(Ax)$. I don't know enough about infimal convolutions to know which $l$ makes the two equivalent.
For reference, the link to the published version: https://link.springer.com/article/10.1007/s10915-018-0680-3
In the lines just below eq (4) it says l^* is considered equal to zero. So perhaps this implementation assumes l^*=0. @epapoutsellis can you confirm?
Re. question 3, whether to raise a warning: I think it makes sense to allow the algorithm to run with a ZeroFunction being passed. I think this is also consistent with allowing it in PDHG and FISTA? Have you checked whether the algorithm behaves sensibly with a ZeroFunction passed, including converging to the expected result? I think having a warning is okay, to recommend that the user switch to PDHG, but the algorithm should be allowed to run. If it runs and behaves as expected I'd also be okay with omitting the warning.
> For reference, the link to the published version: https://link.springer.com/article/10.1007/s10915-018-0680-3
> In the lines just below eq. (4) it says l^* is considered equal to zero. So perhaps this implementation assumes l^*=0. @epapoutsellis can you confirm?
I think that makes sense: if $l = \iota_{0}$ then $h \square l = h$, $l^* = 0$, and thus $\nabla l^* = 0$.
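Spelling this out, a short check using the definitions of the infimal convolution and the convex conjugate (with $\iota_{0}$ the indicator of $\{0\}$):

$$(h \,\square\, \iota_{0})(x) = \inf_y \big( h(x-y) + \iota_{0}(y) \big) = h(x), \qquad \iota_{0}^*(y) = \sup_x \big( \langle x, y\rangle - \iota_{0}(x) \big) = \langle 0, y\rangle = 0,$$

so $\nabla l^* = 0$ and the corresponding term drops out of eq. (4a).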
> Re. question 3, whether to raise a warning: I think it makes sense to allow the algorithm to run with a ZeroFunction being passed. I think this is also consistent with allowing it in PDHG and FISTA? Have you checked whether the algorithm behaves sensibly with a ZeroFunction passed, including converging to the expected result? I think having a warning is okay, to recommend that the user switch to PDHG, but the algorithm should be allowed to run. If it runs and behaves as expected I'd also be okay with omitting the warning.
Yes, one of the tests compares the behaviour of PDHG and PD3O with a zero function. I think in PDHG we have the strong-convexity additions, so it is probably worth the user using PDHG over PD3O in that case.
OK, great. Then I suggest keeping the warning: allow users to run PD3O with ZeroFunction to deliberately create the special-case algorithm and compare with our PDHG implementation, and continue to have the warning recommend that the user switch to PDHG for efficiency reasons.
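A minimal sketch of the kind of check discussed here; the helper name, message and placement are assumptions for illustration, not the implemented code:

```python
import warnings
from cil.optimisation.functions import ZeroFunction

def _check_smooth_term(f):
    # Sketch: warn, but do not raise, when f is a ZeroFunction,
    # since PD3O then reduces to PDHG and PDHG has safer defaults.
    if isinstance(f, ZeroFunction):
        warnings.warn("f is a ZeroFunction: PD3O reduces to PDHG in this case; "
                      "consider using PDHG instead.", UserWarning)
```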
BTW, a few places appear to use the number 0 instead of the letter O, i.e. "PD30" instead of "PD3O"; please search and replace.
Yes, I used the equations from https://doi.org/10.1007/s10915-018-0680-3.
- No need to cover the inf-conv case for now. Maybe an `InfimalConvolutionFunction` class could be useful in the future and could compute the primal-dual gap for the PD3O case. This is why I included only the primal objective in `update_objective`.
- Better to allow `f=ZeroFunction`, with a warning. In this case, the user can go back to PDHG, which is safer in terms of default step sizes.
- For the step sizes, better to have `0.99*2`, which is close to theory. In practice, it depends on the number of iterations used for the power method and on the projection operator (shape of ig, ag, etc.). "Better" step sizes are in https://arxiv.org/pdf/2201.00139, Table 2. This is also related to the previous point. In theory, you can use any parameter in $(0,\infty)$ when `f=0`. In practice, I noticed that you need to tune this parameter. Another option is to use an `epsilon/2*L2NormSquared()` instead, where `epsilon` is small.
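For illustration, a sketch of that last alternative; the value of `epsilon` is arbitrary here and would need tuning:

```python
from cil.optimisation.functions import L2NormSquared

# Instead of f = ZeroFunction(), use a small smooth quadratic so that f.L is
# finite and a default step-size rule gamma ~ 2/f.L can still be applied.
epsilon = 1e-3                       # assumed small value, needs tuning
f = (epsilon / 2) * L2NormSquared()  # epsilon/2 * ||x||^2, with f.L = epsilon
```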
> - For the step sizes, better to have `0.99*2`, which is close to theory.
Thanks Vaggelis, I think I have done this.
> In theory, you can use any parameter in $(0,\infty)$ when `f=0`. In practice, I noticed that you need to tune this parameter. Another option is to use an `epsilon/2*L2NormSquared()` instead, where `epsilon` is small.
I have set the same default as PDHG when `f=0`, i.e. `1.0/operator.norm()` for both `gamma` and `delta`.
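Putting the two cases together, a sketch of the default step-size logic as I read it from this thread; the function name and structure are placeholders, not necessarily the implemented code:

```python
from cil.optimisation.functions import ZeroFunction

def default_step_sizes(f, operator):
    # Sketch of the defaults discussed above.
    norm = operator.norm()
    if isinstance(f, ZeroFunction):
        # PDHG-like special case: gamma = delta = 1/||A||
        gamma = 1.0 / norm
        delta = 1.0 / norm
    else:
        # Smooth f: gamma just below 2/L, delta so that gamma*delta*||A||^2 <= 1
        gamma = 0.99 * 2.0 / f.L
        delta = 1.0 / (gamma * norm ** 2)
    return gamma, delta
```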
@epapoutsellis @jakobsj - please can I revive this PR. Do you have any more comments?
Jenkins is happy