Fixed the spectral normalization
Can you give a simple usage example for this, and/or a general idea of how it should be used?
Hi Mike,
This is the regularization technique described in this paper, https://arxiv.org/abs/1705.10941, and it is intended as a drop-in replacement for weight decay. The crucial difference from the popular weight decay is that it regularizes the Lipschitz constant of the final network, which seems to be important, for example, for training GANs: https://openreview.net/forum?id=B1QRgziT-&noteId=SJok1XB-f
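Concretely (this is my summary of the paper, so take the notation with a grain of salt): for each weight matrix W the penalty added to the loss is

R(W) = \frac{\lambda}{2}\,\sigma(W)^2, \qquad \nabla_W R(W) = \lambda\,\sigma(W)\,u_1 v_1^\top,

where \sigma(W) is the largest singular value of W and u_1, v_1 are the corresponding singular vectors. The implementation below estimates them cheaply with one power-iteration step per optimiser call.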
My latest implementation looks as follows:
"""
Spectral norm regularization as proposed in
Spectral Norm Regularization for Improving the Generalizability of Deep Learning, Yuichi Yoshida, Takeru Miyato, 2017
https://arxiv.org/pdf/1705.10941.pdf
"""
function spectral(p::Flux.Optimise.Param, λ::Real)
    # Only 2-D weight matrices have a spectral norm to regularize;
    # biases and higher-order tensors are left alone.
    if ndims(p.x) != 2
        return () -> nothing
    end
    n, m = size(p.x)
    # Running estimates of the leading left/right singular vectors,
    # refined by one power-iteration step per optimiser call.
    u = similar(p.x, n)
    u .= randn(n)
    v = similar(p.x, m)
    v .= randn(m)
    function ()
        u .= p.x * v           # u ← W v, so ‖u‖ ≈ σ when v is normalized
        v .= (u' * p.x)'       # v ← W' u, so ‖v‖ ≈ σ‖u‖
        σ = norm(v) / norm(u)  # estimate of the largest singular value
        v ./= norm(v)
        u ./= norm(u)
        # ∇_W (λ/2)σ(W)² = λ σ u v' for normalized singular vectors u, v
        p.Δ .+= λ * σ * u * v'
        nothing
    end
end
function spectralnorm(A, i = 1000)
    # Estimate the spectral norm (largest singular value) of A
    # by i steps of power iteration.
    n, m = size(A)
    u = similar(A, n)
    u .= randn(n)
    v = similar(A, m)
    v .= randn(m)
    for ii in 1:i
        v ./= norm(v)
        u ./= norm(u)
        u .= A * v       # u ← A v
        v .= (u' * A)'   # v ← A' u
    end
    norm(v) / norm(u)    # ≈ σ₁(A)
end
SpectralADAM(ps, η = 0.001; β1 = 0.9, β2 = 0.999, ϵ = 1e-8, λ = 0) =
    Flux.Optimise.optimiser(ps,
        p -> Flux.Optimise.adam(p; η = η, β1 = β1, β2 = β2, ϵ = ϵ),
        p -> spectral(p, λ),
        p -> Flux.Optimise.descent(p, 1))
# unit test: for a symmetric matrix, the spectral norm equals the
# largest absolute eigenvalue (eig is the Julia 0.6-era API)
# A = randn(5, 5)
# A = A + A'
# maximum(abs.(eig(A)[1])) - spectralnorm(A)   # should be ≈ 0
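To answer the usage question above: a minimal sketch (untested; it assumes the 2017-era Flux optimiser API that the code above targets, and the model, λ, and dummy batch are made up purely for illustration):

using Flux

m = Chain(Dense(10, 5, relu), Dense(5, 2))
loss(x, y) = Flux.mse(m(x), y)

# behaves like ADAM plus the spectral penalty on every 2-D weight
# matrix in the model; λ controls the regularization strength
opt = SpectralADAM(params(m), 0.001; λ = 0.01)

data = [(randn(10, 100), randn(2, 100))]  # one dummy batch
Flux.train!(loss, data, opt)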
That said, I confess that the results I am getting are very weird. I thought it would be good to have it in Flux, at least for the sake of completeness.
Best wishes, Tomas
@pevnak are you still interested in pursuing this? I see the authors released a follow-up paper at https://arxiv.org/abs/1802.05957.
Bump on this @pevnak. If this is too far in the rearview mirror, I'd suggest we open an issue and close this PR. That way it's clear what work is up for grabs.
This type of normalization doesn't seem to be used in current practice; I don't think it's worth opening an issue unless someone is interested in it.