Flux.jl icon indicating copy to clipboard operation
Flux.jl copied to clipboard

Enable other reactant tests

Open wsmoses opened this issue 8 months ago • 7 comments

wsmoses avatar Apr 19 '25 21:04 wsmoses

@mcabbott @ToucheSir a lot of these remaining errors seem to be bugs in the testing infra?

Enzyme grad check ConvTranspose: Error During Test at /home/runner/work/Flux.jl/Flux.jl/test/ext_reactant/reactant.jl:46
  Got exception outside of a @test
  all arguments must have at least the same length of the firs one
  Stacktrace:
    [1] error(s::String)
      @ Base ./error.jl:35
    [2] _map(::Function, ::Tuple{KeyPath{Tuple{Int64, Symbol, Int64}}, KeyPath{Tuple{Int64, Symbol, Int64}}}, ::Tuple{Int64, Int64}, ::Vararg{Any})
      @ Functors ~/.julia/packages/Functors/LbNAu/src/walks.jl:2
    [3] (::Functors.StructuralWalkWithPath)(recurse::Function, kp::KeyPath{Tuple{Int64, Symbol}}, x::Tuple{Int64, Int64}, ys::Nothing)
      @ Functors ~/.julia/packages/Functors/LbNAu/src/walks.jl:108
    [4] (::Functors.ExcludeWalkWithKeyPath{Functors.StructuralWalkWithPath, var"#4#5"{Float64, Float64}, typeof(Functors.isleaf)})(recurse::Function, kp::KeyPath{Tuple{Int64, Symbol}}, x::Tuple{Int64, Int64}, ys::Nothing)
      @ Functors ~/.julia/packages/Functors/LbNAu/src/walks.jl:135
    [5] (::Functors.CachedWalkWithPath{Functors.ExcludeWalkWithKeyPath{Functors.StructuralWalkWithPath, var"#4#5"{Float64, Float64}, typeof(Functors.isleaf)}, Functors.NoKeyword, Functors.WalkCache{Any, Any, Functors.ExcludeWalkWithKeyPath{Functors.StructuralWalkWithPath, var"#4#5"{Float64, Float64}, typeof(Functors.isleaf)}, IdDict{Any, Any}}})(recurse::Function, kp::KeyPath{Tuple{Int64, Symbol}}, x::Tuple{Int64, Int64}, ys::Nothing)
      @ Functors ~/.julia/packages/Functors/LbNAu/src/walks.jl:199
    [6] (::Functors.var"#recurse#27"{Functors.CachedWalkWithPath{Functors.ExcludeWalkWithKeyPath{Functors.StructuralWalkWithPath, var"#4#5"{Float64, Float64}, typeof(Functors.isleaf)}, Functors.NoKeyword, Functors.WalkCache{Any, Any, Functors.ExcludeWalkWithKeyPath{Functors.StructuralWalkWithPath, var"#4#5"{Float64, Float64}, typeof(Functors.isleaf)}, IdDict{Any, Any}}}})(::KeyPath{Tuple{Int64, Symbol}}, ::Vararg{Any})
      @ Functors ~/.julia/packages/Functors/LbNAu/src/walks.jl:54

Do you have any idea what's happening (or could help make an isolated example of what's going awry)

wsmoses avatar May 01 '25 22:05 wsmoses

From reading the log... error is from this utility function:

https://github.com/FluxML/Flux.jl/blob/0e36af98f6fc5b7f3c95fe819a02172cfaaaf777/test/test_utils.jl#L38-L44

Seems to be called with 1st argument Zygote gradient, and 2nd argument Enzyme gradient:

https://github.com/FluxML/Flux.jl/blob/0e36af98f6fc5b7f3c95fe819a02172cfaaaf777/test/test_utils.jl#L109

And Zygote will often truncate branches which are like (nothing, nothing). I guess that's what's happening?

Signature in another error is other order... but it looks like Enzyme has left Tuple{Int64, Int64} in the gradient, where Zygote has nothing. Then in the next recursion inwards, you could get map(..., x::Tuple{Int64, Int64}, ys::Nothing) as above, I think?

map(f::Function, 
   t1::NTuple{6, KeyPath{Tuple{Int64, Symbol}}}, # who knows why, ignore
   t2::Tuple{Vector{Float32}, Tuple{Int64, Int64}, NTuple{4, Int64}, Tuple{Int64, Int64}, Tuple{Int64, Int64}, Int64},  # presumably Enzyme
   ts::Tuple{Vector{Float32}, Vararg{Nothing, 5}},  # presumably Zygote
)

mcabbott avatar May 06 '25 16:05 mcabbott

yeah Enzyme keeps the primal and shadow structures the same, so it sounds like the test works -- but the test function itself is busted here?

wsmoses avatar May 13 '25 03:05 wsmoses

Yes, that's right. Sorry I stopped before the conclusion!

Somehow the function for doing approximate equality on nested structs needs to be made smart enough to handle the difference between Zygote & Enzyme conventions.

mcabbott avatar May 13 '25 03:05 mcabbott

@mcabbott do you know what's needed to move the needle here?

wsmoses avatar Jul 14 '25 04:07 wsmoses

bumping again here @mcabbott @CarloLucibello

wsmoses avatar Oct 24 '25 13:10 wsmoses

Sorry I don't have bandwidth to think about this.

If anyone else does, trying to simplify gradient tests would be very welcome. Zygote may be on the way out, so far there appears to be nobody trying to make it run on 1.12. Possibly working on tests and on removing this would best be done together.

mcabbott avatar Oct 24 '25 13:10 mcabbott