Flux.jl Some small printing upgrades

This is intended for https://github.com/FluxML/Fluxperimental.jl/pull/20, but is probably a good idea anyway. The motivating example is something like this (bigger arrays would print pages of numbers):

julia> struct Tmp2; x; y; end; Flux.@functor Tmp2

julia> Chain(Tmp2([Dense(2,3), randn(3,4)'], (x=1:3, y=Dense(3,4), z=rand(3))))
Chain(
  Tmp2(
    Array(
      Dense(2 => 3),                    # 9 parameters
      [0.351978391016603 0.6408681372462821 -1.326533184688648; 0.09481930831795712 1.430103476272605 0.7250467613675332; 2.03372151428719 -0.015879812799495713 1.9499692162118236; -1.6346846180722918 -0.8364610153059454 -1.2907265737483433],  # 12 parameters
    ),
    NamedTuple(
      1:3,                              # 3 parameters
      Dense(3 => 4),                    # 16 parameters
      [0.9666158193429335, 0.01613900990539574, 0.0205920186127464],  # 3 parameters
    ),
  ),
)                   # Total: 7 arrays, 43 parameters, 644 bytes.

Notice that Array() and NamedTuple() aren't actually valid syntax. Notice also that whole arrays are printed when they aren't inside a layer in the way the printing code expects. After this PR:

julia> Chain(Tmp2([Dense(2,3), randn(3,4)'], (x=1:3, y=Dense(3,4), z=rand(3))))
Chain(
  Tmp2(
    [
      Dense(2 => 3),                    # 9 parameters
      4×3 Adjoint,                      # 12 parameters
    ],
    (;
      x = 3-element UnitRange,          # 3 parameters
      y = Dense(3 => 4),                # 16 parameters
      z = 3-element Array,              # 3 parameters
    ),
  ),
)                   # Total: 7 arrays, 43 parameters, 644 bytes.

Oct 13 '23 12:10 mcabbott

Is having the summary instead of the actual contents important for @compact? If not, my preference would be to keep the more accurate printing but drop that part.

Oct 14 '23 22:10 ToucheSir

The idea is that printing the values of a huge matrix isn't helpful, but showing the size is. This is what the present printing of e.g. Dense achieves.

But @compact often seems to store the arrays themselves, without a layer around them.

(It's not exactly summary, as that gets quite long for e.g. CuArrays.)
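To make the difference from summary concrete, here is a minimal sketch of a size-plus-type-name description next to Base's summary. The helper name brief is hypothetical and is not the code in this PR; it only imitates the "4×3 Adjoint" style shown in the output above.

```julia
using LinearAlgebra  # provides the Adjoint wrapper type

# Hypothetical helper (an assumption for illustration, not Flux's implementation):
# describe an array by its size and the bare name of its type.
function brief(x::AbstractArray)
    dims = ndims(x) == 1 ? string(length(x), "-element") : join(size(x), "×")
    return string(dims, " ", nameof(typeof(x)))
end

A = randn(3, 4)'        # a lazy Adjoint wrapper around a 3×4 Matrix

println(brief(A))       # size and type name only
println(summary(A))     # Base's summary also describes the wrapped array
```

For wrapper types the summary string grows with every layer of wrapping, which is the concern with GPU arrays mentioned above; the short form stays the same length.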

Oct 14 '23 22:10 mcabbott

Revisiting this, it would be nice to retain some more info such as eltype if the contents aren't printed. Though it's far from perfect, could we avoid a lot of complexity by setting the :limit and :compact IOContext properties when showing these leaf nodes?
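As a rough illustration of what those two properties do, the sketch below uses only Base (no Flux) and compares the default 2-argument show of a large matrix with the same call under :limit and :compact. The variable names are arbitrary.

```julia
x = rand(100, 100)

# Default: every one of the 10_000 elements is printed at full precision.
full = sprint(show, x)

# :limit => true elides the middle of large arrays with "…";
# :compact => true shortens how each number is printed.
short = sprint(show, x; context = (:limit => true, :compact => true))

println(length(full), " characters without the properties")
println(length(short), " characters with them")
println(short)
```

Note that this output still does not mention the array's type or eltype, so it addresses the length problem but not the wish to see more type information.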

Nov 10 '23 05:11 ToucheSir

Latest commit keeps eltype as suggested (or really, the first type parameter):

julia> struct Tmp2; x; y; end; Flux.@functor Tmp2

julia> Chain(Tmp2([Dense(2,3), randn(3,4)'], (x=1:3, y=Dense(3,4), z=rand(3))))
Chain(
  Tmp2(
    [
      Dense(2 => 3),                    # 9 parameters
      4×3 Adjoint{Float64,...},         # 12 parameters
    ],
    (;
      x = 3-element UnitRange{Int64},   # 3 parameters
      y = Dense(3 => 4),                # 16 parameters
      z = 3-element Vector{Float64},    # 3 parameters
    ),
  ),
)                   # Total: 7 arrays, 43 parameters, 780 bytes.

Note also that no model using existing layers is affected by this. The goal is mainly to make @compact layers print more like the existing ones, by describing their parameter arrays, not printing the actual numbers.

Nov 28 '23 14:11 mcabbott

We were discussing this during our call this week. The consensus was that using IOContext(..., :limit => true) is preferable to a custom summary for arrays. This would keep things succinct while still matching Base standards/expectations.
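A minimal sketch of how that could look inside a container's show method, assuming a made-up struct Holder rather than any real Flux type: the outer structure is printed normally, and only the leaf values are shown through a wrapped IOContext.

```julia
# Hypothetical container, used only for illustration; not a Flux layer.
struct Holder
    weight
    bias
end

function Base.show(io::IO, ::MIME"text/plain", h::Holder)
    # Wrap the stream once; leaf values shown through it are truncated.
    leaf_io = IOContext(io, :limit => true, :compact => true)
    println(io, "Holder(")
    for name in fieldnames(Holder)
        print(io, "  ", name, " = ")
        show(leaf_io, getfield(h, name))
        println(io, ",")
    end
    print(io, ")")
end

h = Holder(randn(50, 50), zeros(50))
text = sprint(show, MIME("text/plain"), h)
println(text)
```

The appeal of this design is that it needs no array-specific formatting code: whatever Base does for limited printing is what the user sees.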

There are times when the type (e.g. Adjoint) is desired. This case might be better addressed by offering something like Flax's tabulate.

Dec 01 '23 17:12 darsnack
