
More parameters for `AutoEnzyme`

Open gdalle opened this issue 5 months ago • 19 comments

I think it would make sense to add more parameters to AutoEnzyme, like:

  • deferred = true / false to use autodiff_deferred instead of autodiff
  • chunksize to imitate the AutoForwardDiff API and allow selecting a chunk size for Jacobians and Hessians

@wsmoses @vchuravy what do you think?

Related:

  • https://github.com/JuliaDiff/DifferentiationInterface.jl/issues/819#issuecomment-3070516150

gdalle avatar Jul 23 '25 05:07 gdalle

deferred should not be needed. Enzyme now handles this automatically, so autodiff_deferred is only required when calling Enzyme directly inside GPU kernels.

vchuravy avatar Jul 23 '25 08:07 vchuravy

gentle bump on this @gdalle since it came up in the linked thread. would be great to have

wsmoses avatar Aug 15 '25 15:08 wsmoses

I would love some input on the options we'd like there. ForwardDiff has two:

  • chunksize=nothing for automatic selection, upper-bounded by 12 for bandwidth reasons
  • chunksize::Integer for fixed manual selection

Does Enzyme need a third option which always takes the batch size to be the length of the vector (aka computing the whole Jacobian in a single batch for example)?

gdalle avatar Aug 15 '25 15:08 gdalle

Related:

  • https://github.com/EnzymeAD/Enzyme.jl/issues/1542
  • https://github.com/EnzymeAD/Enzyme.jl/pull/1545

gdalle avatar Aug 15 '25 15:08 gdalle

yeah the "same size as vector" does have performance benefits, so it would be nice to represent

wsmoses avatar Aug 15 '25 15:08 wsmoses

I was thinking we could use chunksize=Inf for that one. Sure, Inf is technically a Float but typemax(Int) is much less readable and Inf gets the point across rather well. How does that sound?
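To make the three options concrete, here is a minimal sketch of how a backend might turn a chunksize setting into an actual batch size for an input of length n. The names resolve_batch_size, num_batches and AUTO_CHUNK_CAP are hypothetical, and the automatic rule is deliberately simplified to "input length, capped at 12"; real backends may pick differently.

```julia
# Hypothetical helper, not part of ADTypes, ForwardDiff or Enzyme.
# Cap for automatic selection, taken from the bound of 12 mentioned in this thread.
const AUTO_CHUNK_CAP = 12

# Turn a chunksize setting into a concrete batch size for an input of length n.
function resolve_batch_size(chunksize, n::Integer)
    if isnothing(chunksize)
        return min(n, AUTO_CHUNK_CAP)   # automatic selection, capped (simplified rule)
    elseif chunksize isa Integer && chunksize > 0
        return min(n, chunksize)        # fixed manual selection
    elseif chunksize === Inf
        return n                        # whole input in a single batch
    else
        throw(ArgumentError("chunksize must be nothing, a positive integer, or Inf"))
    end
end

# Number of batches needed to cover all n input directions (assumes n >= 1).
num_batches(chunksize, n::Integer) = cld(n, resolve_batch_size(chunksize, n))
```

With this reading, chunksize=Inf always yields a single batch, while chunksize=4 on a length-10 input needs three passes.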

gdalle avatar Aug 15 '25 15:08 gdalle

I guess the question becomes what is the meaning if someone gives a float != Inf. Ideally we would restrict the type to only contain valid inputs.

Maybe a new type class?

wsmoses avatar Aug 15 '25 15:08 wsmoses

We can check type parameters at construction with an inner constructor:

struct AutoEnzyme{M,A,chunksize}
    mode::M
    function AutoEnzyme{M,A,chunksize}(mode::M) where {M,A,chunksize}
        # chunksize must be a positive integer, Inf (full batch) or nothing (automatic)
        if chunksize isa Integer
            @assert chunksize > 0
        elseif chunksize isa AbstractFloat
            @assert chunksize == Inf
        else
            @assert isnothing(chunksize)
        end
        return new{M,A,chunksize}(mode)
    end
end
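For anyone who wants to try the idea in a REPL, here is a self-contained variant. ChunkedBackend and chunksize_of are hypothetical stand-ins (not the real AutoEnzyme), and the sketch throws an ArgumentError instead of using @assert, on the assumption that input validation should not depend on assertions being enabled.

```julia
# Stand-in type for illustration only; the name and layout are hypothetical.
struct ChunkedBackend{M,chunksize}
    mode::M
    function ChunkedBackend{M,chunksize}(mode::M) where {M,chunksize}
        # Accept nothing (automatic), a positive integer (fixed) or Inf (full batch).
        valid = isnothing(chunksize) ||
            (chunksize isa Integer && chunksize > 0) ||
            chunksize === Inf
        valid || throw(ArgumentError("invalid chunksize: $chunksize"))
        return new{M,chunksize}(mode)
    end
end

# Outer constructor: lifts the keyword value into the type domain.
ChunkedBackend(mode::M; chunksize=nothing) where {M} = ChunkedBackend{M,chunksize}(mode)

# Read the chunk setting back from the type parameter.
chunksize_of(::ChunkedBackend{M,C}) where {M,C} = C
```

Here ChunkedBackend(:forward; chunksize=Inf) constructs fine, while chunksize=2.5 or chunksize=-1 is rejected at construction time.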

gdalle avatar Aug 15 '25 15:08 gdalle

Is that cleaner than say AutoEnzyme(ForwardMode, FullBatch), where FullBatch is a new struct? (I'm not sure, genuinely asking this as a design question)

wsmoses avatar Aug 15 '25 16:08 wsmoses

I'd say that with ForwardDiff, ADTypes has already started a tradition of using built-in language values to represent specific batching behaviors. For instance, chunksize=nothing really means chunksize=AutoChunk(), in the same way that chunksize=Inf would mean chunksize=FullChunk(). Unfortunately, we can't change the ForwardDiff API without a breaking release, so I would suggest we continue down that path for the time being, but keep this change in mind for a future ADTypes v2.0?
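To illustrate the correspondence, here is a rough sketch of how the built-in values could map onto dedicated types. AutoChunk and FullChunk are the names floated in this thread; AbstractChunking, FixedChunk and to_chunking are additional hypothetical names, and none of this is existing ADTypes or EnzymeCore API as far as this sketch assumes.

```julia
# Hypothetical types for illustration only.
abstract type AbstractChunking end
struct AutoChunk <: AbstractChunking end   # backend picks the batch size
struct FullChunk <: AbstractChunking end   # one batch covering the whole input
struct FixedChunk <: AbstractChunking      # user-chosen batch size
    n::Int
    function FixedChunk(n::Integer)
        n > 0 || throw(ArgumentError("chunk size must be positive"))
        return new(n)
    end
end

# Translate the built-in values into the dedicated types via dispatch.
to_chunking(::Nothing) = AutoChunk()
to_chunking(n::Integer) = FixedChunk(n)
to_chunking(x::AbstractFloat) =
    x == Inf ? FullChunk() : throw(ArgumentError("the only accepted float is Inf"))
to_chunking(c::AbstractChunking) = c
```

A conversion layer like this would let the keyword API keep accepting nothing, integers and Inf while backends dispatch on the dedicated types internally.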

gdalle avatar Aug 15 '25 16:08 gdalle

I mean we could equally put FullChunk/etc in EnzymeCore proper [like mode]

wsmoses avatar Aug 15 '25 16:08 wsmoses

Down the road we'll need the same chunking options for Mooncake too (https://github.com/chalk-lab/Mooncake.jl/discussions/533), and we already have them for ForwardDiff. I think it would be nice for users if these three were handled in the same way?

gdalle avatar Aug 15 '25 16:08 gdalle

perhaps, though per https://github.com/SciML/ADTypes.jl/issues/123 users already need to import EnzymeCore.Forward anyways atm

wsmoses avatar Aug 15 '25 16:08 wsmoses

I'm not sure how to get around that one, because Enzyme's mode objects are rather sophisticated. For Mooncake I'll probably do AutoMooncake(; mode=ADTypes.ForwardMode()) and call it a day. Maybe we could add such a shortcut for Enzyme as well, but it wouldn't incorporate things like runtime activity, holomorphy, etc.

gdalle avatar Aug 15 '25 16:08 gdalle

honestly beyond mode, and batch, I think the only one that users really care about is runtime activity (x/ref https://github.com/SciML/ADTypes.jl/issues/85#issuecomment-2374897752 / https://github.com/SciML/ADTypes.jl/issues/113 ), so if one needs more than that they can use mode, if not ADTypes can construct the mode itself

wsmoses avatar Aug 15 '25 16:08 wsmoses

Would a constructor like this one look good to you?

AutoEnzyme(;
    mode::Union{Enzyme.Mode,ADTypes.AbstractMode},
    function_annotation=nothing,
    runtime_activity=false,
    chunksize::Union{Nothing,Int,Float64}=nothing,
)

When ADTypes.ForwardMode() or ADTypes.ReverseMode() is provided, then the custom runtime_activity kwarg is used to construct the Enzyme object. The remaining question would be: what if an Enzyme mode is passed with runtime activity set to true, but the kwarg is set to false?

gdalle avatar Aug 15 '25 16:08 gdalle

maybe have runtime_activity::Union{Nothing,Bool}=nothing and if non-nothing it overrides the mode?
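A tiny sketch of that override rule, using a toy stand-in for the mode object. ToyMode and resolve_mode are hypothetical; real Enzyme modes carry more settings than the two fields shown here.

```julia
# Toy stand-in for an Enzyme mode object (illustration only).
struct ToyMode
    direction::Symbol        # :forward or :reverse
    runtime_activity::Bool
end

# runtime_activity = nothing keeps whatever the mode already says;
# true or false overrides the mode's own setting.
function resolve_mode(mode::ToyMode, runtime_activity::Union{Nothing,Bool}=nothing)
    isnothing(runtime_activity) && return mode
    return ToyMode(mode.direction, runtime_activity)
end
```

With nothing as the default, passing a mode that has runtime activity enabled and no kwarg leaves it enabled, so the conflicting case only arises when the user explicitly asks for it.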

wsmoses avatar Aug 15 '25 16:08 wsmoses

I'd be okay with that. PR incoming in the next few days

gdalle avatar Aug 15 '25 16:08 gdalle

Following up on this, should we add deferred to AutoEnzyme or not? Otherwise #124 is ready, except for the corresponding implementation in Enzyme (https://github.com/EnzymeAD/Enzyme.jl/pull/2659)

gdalle avatar Nov 13 '25 06:11 gdalle