
LSR1Operator(1) fails the secant equation

Open paraynaud opened this issue 3 years ago • 6 comments

Hello all,

I tested the LSR1 operators myself, and it seems that there is a problem with the LSR1Operator of dimension 1. In my work, I automatically generate several LSR1Operators, and some of them are of dimension 1.

Here is an example to replicate the issue:

using LinearOperators, LinearAlgebra

n = 1
B = LSR1Operator(n)
r = 0.0
for i = 1:100
  global r
  s = rand(n)
  y = rand(n)
  push!(B, s, y)
  r += norm(B * s - y) # secant equation
end
r # 166.39...

If you replace n = 1 by n = 2, then r ≈ 1e-14.

The problem does not occur with LBFGSOperator.

paraynaud avatar Mar 29 '22 19:03 paraynaud

Is every update accepted? https://github.com/JuliaSmoothOptimizers/LinearOperators.jl/blob/main/src/lsr1.jl#L150

dpo avatar Mar 29 '22 19:03 dpo

The conditions for the update are fulfilled in the code above. The next piece of code uses a loop that guarantees that the conditions of LSR1 are satisfied before the push!.

The strange thing is that in some cases the update happens, and in other cases it does not. For example, the pair (s, y)

B = LSR1Operator(1)
s = [0.5809871055619441]
y = [0.8263178027435335]
push!(B, s, y)  #1×1 Matrix{Float64}: 1.4222653047425207

produces an updated 1x1 matrix.
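For reference, applying the classical SR1 update by hand from B = I reproduces that value; this is a minimal check using only the LinearAlgebra stdlib (no LinearOperators.jl):

```julia
# Classical SR1 update: B⁺ = B + (y - Bs)(y - Bs)ᵀ / ((y - Bs)ᵀ s).
# Starting from B = I (the default initial operator), the pair above
# yields the same 1×1 matrix that push! reports.
using LinearAlgebra

B = Matrix{Float64}(I, 1, 1)
s = [0.5809871055619441]
y = [0.8263178027435335]

r = y - B * s
B += (r * r') / dot(r, s)

B[1, 1]      # ≈ 1.4222653, matching the value above
B * s ≈ y    # the secant equation holds
```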

To extend the analysis, I tried the push! in a loop similar to the previous one.

using LinearOperators, LinearAlgebra

n = 1
r = 0.0
cpt = 0 
cpt_approx_null = 0

for i = 1:100
  B = LSR1Operator(n)
  global r
  global cpt
  global cpt_approx_null
  s = rand(n)
  y = rand(n)
  ϵ = eps(eltype(y))  
  ymBs = y - B * s
  well_defined(ymBs, s) = abs(dot(ymBs, s)) ≥ ϵ + ϵ * norm(ymBs) * norm(s, 2)
  sufficient_curvature(s, y) = abs(dot(y, s)) >= ϵ * norm(y, 2) * norm(s, 2)
  scaling_factor(s, y) = dot(y, s) / dot(y, y)
  scaling_condition(s, y) = norm((y - s) / scaling_factor(s, y), 2) >= ϵ * norm(y, 2) * norm(s, 2)
  if well_defined(ymBs, s) && scaling_condition(s, y) && sufficient_curvature(s, y)
    push!(B, s, y)  
    cpt += 1
    ymBs = y - B*s    
    norm_ymBs = norm(ymBs, 2)
    # @show norm_ymBs
    r += norm_ymBs
    if isapprox(norm_ymBs, 0., atol=1e-10)
      cpt_approx_null += 1
    end 
  end 
  # println(Matrix(B))
end
r, cpt, cpt_approx_null # (22.107767681026328, 100, 29)

Normally, every pair (s, y) should update a fresh LSR1 operator, yet only 29 of the 100 updates satisfy the secant equation.

paraynaud avatar Mar 29 '22 22:03 paraynaud

I have new examples of failures for LSR1Operator and an explanation of why they fail.

lsr1 = LSR1Operator(2)
y = [-5., -5.]
s = [-1., -1.]
push!(lsr1, s ,y)
Matrix(lsr1) # identity matrix

The vectors y and s are collinear, so I tried other collinear vectors to verify:

y = [-5., -4.]
s = [-1.25, -1.]
push!(lsr1, s ,y)
Matrix(lsr1) # still the identity

and I got the same result.

The test https://github.com/JuliaSmoothOptimizers/LinearOperators.jl/blob/e663e4774d798e967003106c57acb611cd32ae78/src/lsr1.jl#L150 fails, because the scaling condition https://github.com/JuliaSmoothOptimizers/LinearOperators.jl/blob/e663e4774d798e967003106c57acb611cd32ae78/src/lsr1.jl#L146 is false.

If we suppose s = $\alpha y$ (s, y collinear), the scaling factor ys/yy equals $\alpha$. Then the scaling condition norm(y - s / scaling_factor) will always be 0, since s / scaling_factor equals $\tfrac{\alpha y}{\alpha} = y$. This explains why LSR1Operator(1) fails: s and y are both of size one, hence inevitably collinear.
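This failure is easy to reproduce directly; here is a minimal sketch using only LinearAlgebra, with the scaling factor written out as in lsr1.jl:

```julia
# For collinear s = α*y, the scaling factor σ = (yᵀs)/(yᵀy) equals α,
# so y - s/σ = y - (α*y)/α = 0 and the scaling condition
# ‖y - s/σ‖ ≥ ϵ‖y‖‖s‖ can never hold.
using LinearAlgebra

y = [-5.0, -4.0]
α = 0.25
s = α * y                    # collinear with y

σ = dot(y, s) / dot(y, y)    # equals α
residual = norm(y - s / σ)   # exactly 0 here
ϵ = eps(Float64)

residual ≥ ϵ * norm(y) * norm(s)   # false: the update is rejected
```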

When I saw this, I tried the previous loop test again, and I spotted an error in my numerical test:

scaling_condition(s, y) = norm((y - s) / scaling_factor(s, y), 2) >= ϵ * norm(y, 2) * norm(s, 2)

which should be

scaling_condition(s, y) = norm(y - s / scaling_factor(s, y), 2) >= ϵ * norm(y, 2) * norm(s, 2)

I restarted the same loop with the modified scaling_condition

using LinearOperators, LinearAlgebra

n = 1
r = 0.0
cpt = 0 
cpt_approx_null = 0
for i = 1:100
  B = LSR1Operator(n)
  global r
  global cpt
  global cpt_approx_null
  s = rand(n)
  y = rand(n)
  ϵ = eps(eltype(y))  
  ymBs = y - B * s
  well_defined(ymBs, s) = abs(dot(ymBs, s)) ≥ ϵ + ϵ * norm(ymBs) * norm(s, 2)
  sufficient_curvature(s, y) = abs(dot(y, s)) >= ϵ * norm(y, 2) * norm(s, 2)
  scaling_factor(s, y) = dot(y, s) / dot(y, y)
  scaling_condition(s, y) = norm(y - s / scaling_factor(s, y), 2) >= ϵ * norm(y, 2) * norm(s, 2)
  if well_defined(ymBs, s) && scaling_condition(s, y) && sufficient_curvature(s, y)
    @show s, y # that way you can try the pairs that you get
    push!(B, s, y)  
    cpt += 1
    ymBs = y - B*s    
    norm_ymBs = norm(ymBs, 2)
    r += norm_ymBs
    if isapprox(norm_ymBs, 0., atol=1e-10)
      cpt_approx_null += 1
    end 
  end 
end
r, cpt, cpt_approx_null # (0.0, 25, 25)

Out of 100 random pairs of vectors of size 1, only 25 passed the numerical tests (mainly scaling_condition). One of those was

(s, y) = ([0.49377733218773445], [0.11429787904158029])

This pair passes the scaling_condition with

norm(y - s / scaling_factor(s, y), 2) # 5.551115123125783e-17 a numerical 0

which is greater than

ϵ * norm(y, 2) * norm(s, 2) # 1.2531687196363857e-17

And all the other pairs (s, y) that I tried were similar.

I don't know whether this is the expected behaviour, because collinear pairs do satisfy the well_defined condition https://github.com/JuliaSmoothOptimizers/LinearOperators.jl/blob/e663e4774d798e967003106c57acb611cd32ae78/src/lsr1.jl#L137

I made several tests on my SR1 implementation using collinear pairs s,y, and the resulting matrices satisfied the secant equation and were well-defined.

I'm not sure of the best way to test for collinear vectors, but here is an attempt:

s = [1., 2.]
y = [2., 4.]
myand(a,b) = a && b
s_y = (s ./ y)
collinear = (length(s) == 1) || mapreduce(i-> i == s_y[1], myand, s_y) # true

collinear checks whether s is a vector of size one, or whether the componentwise division s_y = s ./ y yields the same value for every component (i.e., the same value as s_y[1]). collinear could be used to force scaling_condition to true.
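An alternative that avoids the componentwise division (which produces Inf or NaN when y has zero entries) is to test the Cauchy–Schwarz equality case; is_collinear below is a hypothetical helper, not part of LinearOperators.jl:

```julia
# Collinearity via Cauchy–Schwarz: |sᵀy| = ‖s‖‖y‖ exactly when s and y
# are collinear, so compare the two sides with a relative tolerance.
using LinearAlgebra

function is_collinear(s, y; rtol = sqrt(eps(Float64)))
    length(s) == 1 && return true   # size-one vectors are always collinear
    abs(dot(s, y)) ≥ (1 - rtol) * norm(s) * norm(y)
end

is_collinear([1.0, 2.0], [2.0, 4.0])   # true
is_collinear([1.0, 2.0], [2.0, 1.0])   # false
```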

paraynaud avatar Jun 22 '22 21:06 paraynaud

Thank you for this analysis.

I made several tests on my SR1 implementation using collinear pairs s,y, and the resulting matrices satisfied the secant equation and were well-defined.

How does your implementation differ?

dpo avatar Jun 25 '22 12:06 dpo

My SR1 implementation is very simple; it doesn't have as many numerical conditions as the push! of LSR1Operator does. The only numerical test in my SR1 implementation is the same as https://github.com/JuliaSmoothOptimizers/LinearOperators.jl/blob/e663e4774d798e967003106c57acb611cd32ae78/src/lsr1.jl#L150

It does not implement the sufficient_curvature and scaling_condition tests. https://github.com/JuliaSmoothOptimizers/LinearOperators.jl/blob/e663e4774d798e967003106c57acb611cd32ae78/src/lsr1.jl#L143 https://github.com/JuliaSmoothOptimizers/LinearOperators.jl/blob/e663e4774d798e967003106c57acb611cd32ae78/src/lsr1.jl#L146

The PR #235 sets data.scaling of the current iterate to false, which implies that sufficient_curvature and scaling_condition stay true. In addition, it doesn't update scaling_factor.

paraynaud avatar Jun 25 '22 13:06 paraynaud

In fact, I don't remember where scaling_condition came from. Maybe it was an experiment and should be removed. Clearly, it prevents updates when s and y are collinear.

dpo avatar Jun 25 '22 13:06 dpo