Bijectors.jl

Implementing a custom bijector is a hassle: solve by adding a macro?

Open • torfjelde opened this issue 5 years ago • 3 comments

Currently there are a couple of annoyances when implementing a new Bijector:

  1. It is difficult to share implementations between bijectors, mostly because call methods, i.e. (b::Bijector)(x::T), cannot be defined for abstract types on Julia <1.3. This means that we have to implement batch computations on a case-by-case basis, which is both annoying and sometimes difficult to do in an AD-friendly and type-stable way (we have a bunch of mapvcat and eachcolmaphcat methods to do this, which is an unnecessary complication for a newcomer).
     • When we started out with the redesign, we also considered using a transform(b, x) method as the "evaluation" method, since this would allow more generic implementations for batching, etc. But we decided against it, as it also felt clunky.
  2. forward(b::Bijector, x) is supposed to let the user share computation between the evaluation, i.e. (b::MyBijector)(x), and logabsdetjac(b, x). Buuut it's annoying to first have to implement (b::MyBijector)(x) and logabsdetjac(b, x), which are mandatory, and then go through these methods to figure out what is shared and copy-paste certain parts into a forward method, etc.

Since we're in Julia, my first idea is of course to throw a macro at the problem! I'm thinking we introduce transform but make it super-easy for the user to define everything in one go, i.e. something along the lines of:

struct MyBijector <: Bijector{0} end

@bijector function forward(b::MyBijector, x::Real)
    # Shared computation
    z = exp(x)
    
    rv = begin
        # `(b::Bijector)(x)` goes here
        z
    end
    logabsdetjac = begin
        # `logabsdetjac(b, x)` goes here
        log(z)
    end
end

which is then transformed into something along the lines of

quote
    function (Bijectors).transform(b::MyBijector, x::Real)
        z = exp(x)
        return z
    end
    (b::MyBijector)(x) = (Bijectors).transform(b::MyBijector, x::Real)
    function (Bijectors).logabsdetjac(b::MyBijector, x::Real)
        z = exp(x)
        return log(z)
    end
    function (Bijectors).forward(b::MyBijector, x::Real)
        z = exp(x)
        rv = z
        logabsdetjac = log(z)
        return (rv = rv, logabsdetjac = logabsdetjac)
    end
end

Then the only thing that is left for the user to implement is the inverse evaluation.
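
For the MyBijector example above (y = exp(x)), the hand-written inverse could look roughly like the sketch below. It is self-contained and does not use Bijectors.jl: the Inverse wrapper, the inverse helper and the plain transform/logabsdetjac functions are hypothetical stand-ins for illustration. The inverse log-Jacobian follows from log|d log(y)/dy| = -log(y).

```julia
# Hypothetical, self-contained stand-ins (not the Bijectors.jl API).
struct MyBijector end            # y = exp(x)

struct Inverse{B}                # marks the inverse of a bijector
    orig::B
end
inverse(b::MyBijector) = Inverse(b)

# Forward direction (what the macro would generate).
transform(::MyBijector, x::Real) = exp(x)
logabsdetjac(::MyBijector, x::Real) = x                  # log|d exp(x)/dx| = x

# Inverse direction: the part the user still writes by hand.
transform(::Inverse{MyBijector}, y::Real) = log(y)
logabsdetjac(::Inverse{MyBijector}, y::Real) = -log(y)   # log|d log(y)/dy| = -log(y)
```

Round-tripping transform(inverse(b), transform(b, x)) should give back x up to floating-point error.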

Also, I do have a somewhat "dirty" implementation ready (the output above was generated with it and then cleaned up using MacroTools.prettify): https://gist.github.com/torfjelde/8675bba686afdf693476ae1c70f516d3.
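
To make the proposal concrete, here is one minimal way such a macro could be written with Base Julia only. It is not the implementation from the gist. It assumes that the shared statements come first, that the two results are given as rv = begin ... end and logabsdetjac = begin ... end, that the signature is plain and the first argument is named (e.g. b::MyBijector), and it defines plain transform, logabsdetjac and forward functions in the calling module instead of extending Bijectors.jl.

```julia
# Hypothetical sketch of the proposed @bijector macro (Base Julia only).
macro bijector(fdef)
    fdef isa Expr && fdef.head === :function ||
        error("@bijector expects `function forward(b, x) ... end`")
    sig, body = fdef.args
    args = sig.args[2:end]          # e.g. [:(b::MyBijector), :(x::Real)]
    T = args[1].args[2]             # bijector type; assumes a named first argument
    shared, rv_ex, lj_ex = Any[], nothing, nothing
    for ex in body.args             # split the body into shared code and the two results
        if ex isa Expr && ex.head === :(=) && ex.args[1] === :rv
            rv_ex = ex.args[2]
        elseif ex isa Expr && ex.head === :(=) && ex.args[1] === :logabsdetjac
            lj_ex = ex.args[2]
        else
            push!(shared, ex)
        end
    end
    (rv_ex === nothing || lj_ex === nothing) &&
        error("@bijector needs both `rv = ...` and `logabsdetjac = ...`")
    return esc(quote
        function transform($(args...))      # shared code + rv only
            $(shared...)
            $rv_ex
        end
        (b::$T)(x) = transform(b, x)        # callable on the concrete type
        function logabsdetjac($(args...))   # shared code + log-Jacobian only
            $(shared...)
            $lj_ex
        end
        function forward($(args...))        # shared code computed once for both
            $(shared...)
            rv = $rv_ex
            logabsdetjac = $lj_ex
            return (rv = rv, logabsdetjac = logabsdetjac)
        end
    end)
end

struct MyBijector end   # stand-in for `struct MyBijector <: Bijector{0} end`

@bijector function forward(b::MyBijector, x::Real)
    z = exp(x)          # shared computation
    rv = begin
        z               # what (b::MyBijector)(x) returns
    end
    logabsdetjac = begin
        log(z)          # log|d exp(x)/dx| = x
    end
end
```

After this, MyBijector()(1.0), logabsdetjac(MyBijector(), 1.0) and forward(MyBijector(), 1.0) are all defined, and only forward evaluates exp once for both outputs.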

This would then allow us to easily transition to transform, which keeps us compatible with Julia <1.3 while still allowing more generic methods, e.g. transform(b::Bijector{0}, x::AbstractVector) = b.(x). It would also make it super-easy to share computation in forward. Finally, as a next step, we could start thinking about adding complementary in-place methods, e.g. transform!(b::Bijector, x, out), logabsdetjac!(b::Bijector, x, out), etc.
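
A minimal sketch of this idea, assuming a hypothetical abstract type ScalarBijector as a stand-in for Bijector{0}: each concrete bijector defines only its scalar transform and logabsdetjac plus a one-line callable (which is allowed on Julia <1.3 because it targets a concrete type), while batching and the in-place variants are written once, on transform, for all subtypes.

```julia
# Hypothetical stand-in for Bijector{0}: bijectors acting on scalars.
abstract type ScalarBijector end

struct ExpBijector <: ScalarBijector end
transform(::ExpBijector, x::Real) = exp(x)
logabsdetjac(::ExpBijector, x::Real) = x

# Only the callable is defined per concrete type (fine on Julia < 1.3).
(b::ExpBijector)(x) = transform(b, x)

# Batching is written once for every ScalarBijector.
transform(b::ScalarBijector, x::AbstractVector) = transform.(Ref(b), x)
logabsdetjac(b::ScalarBijector, x::AbstractVector) = logabsdetjac.(Ref(b), x)

# In-place variants, using the argument order suggested above.
function transform!(b::ScalarBijector, x::AbstractVector, out::AbstractVector)
    out .= transform.(Ref(b), x)      # fused broadcast, no temporary array
    return out
end
function logabsdetjac!(b::ScalarBijector, x::AbstractVector, out::AbstractVector)
    out .= logabsdetjac.(Ref(b), x)
    return out
end
```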

The only question is: are we overcomplicating things here? Is there an easier way of achieving what we want?

torfjelde, Sep 12 '20 10:09

IMO a macro seems too complicated and leads to un-Julian syntax. I also don't think it is necessarily easier for users to figure out how to write the macro than to just implement the three functions, as is currently done. I'm not sure if forward is part of the API or just an implementation detail, and how often it happens that . In any case, I think maybe the following could work:

  • Define all methods that apply bijectors to multiple inputs (i.e., vectors, matrices, etc.) generically for all bijectors since we do not support Julia < 1.3 anymore.
  • Use something like @thunk in ChainRulesCore (see https://www.juliadiff.org/ChainRulesCore.jl/dev/writing_good_rules.html#Use-Thunks-appropriately and https://www.juliadiff.org/ChainRulesCore.jl/dev/api.html#ChainRulesCore.Thunk) to implement the following default:
(f::Bijector)(x) = unthunk(forward(f, x).rv)
logabsdetjac(f::Bijector, x) = unthunk(forward(f, x).logabsdetjac)

(BTW I'm not sure about these names, maybe just make it a tuple or use something other than rv?)

  • Let users (mainly) implement
function Bijectors.forward(f::MyCoolBijector, x)
	....
	return (rv = ..., logabsdetjac = ...)
end

possibly using @thunk

  • Add implementations of a reduced logabsdetjac version for and loglikelihood_with_trans (see https://github.com/TuringLang/Bijectors.jl/issues/120) that avoid the intermediate allocations of arrays in logabsdetjac
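
The thunk-based defaults proposed above can be sketched without dependencies. Here a hypothetical LazyValue type and force function stand in for ChainRulesCore's Thunk and unthunk (a simplified swap, not ChainRulesCore's implementation), and a counter shows that calling the bijector never evaluates the lazy log-Jacobian term. Each LazyValue wraps a closure, which is relevant to the performance concern raised later in the thread.

```julia
# Hypothetical stand-ins for ChainRulesCore.Thunk / unthunk.
struct LazyValue{F}
    f::F                          # zero-argument closure
end
force(x) = x                      # plain values pass through unchanged
force(t::LazyValue) = t.f()       # lazy values are computed on demand

abstract type AbstractBijector end   # hypothetical stand-in for Bijector

# Generic defaults (call methods on abstract types need Julia >= 1.3).
(b::AbstractBijector)(x) = force(forward(b, x).rv)
logabsdetjac(b::AbstractBijector, x) = force(forward(b, x).logabsdetjac)

const LOG_TERM_CALLS = Ref(0)     # counts evaluations of the lazy term

struct ExpLazy <: AbstractBijector end   # y = exp(x)

# The only method a user writes.
function forward(::ExpLazy, x::Real)
    z = exp(x)                    # shared work, done eagerly
    ladj = LazyValue(() -> (LOG_TERM_CALLS[] += 1; log(z)))
    return (rv = z, logabsdetjac = ladj)
end
```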

devmotion, Sep 12 '20 11:09

> IMO a macro seems too complicated and leads to un-Julian syntax.

But the "un-Julian syntax" is mostly due to the fact that we're dropping the return statement, right? If so, we could just make the user add it manually, i.e. return (rv = rv, logabsdetjac = logabsdetjac), or make it optional. Other than that, there's not much un-Julian about the macro, IMO.

> I also don't think it is necessarily easier for users to figure out how to write the macro than to just implement the three functions, as is currently done.

That's true, but the goal here isn't to make it easier to understand; it's to make it easier to go from "I want this bijector" to "I have this bijector". With a macro, we could keep the amount of work for the user minimal while also getting the most efficient implementation of all the necessary functions. E.g. RadialLayer and PlanarLayer would require waaay less code, in addition to being more efficient than the current implementation (the fact that these unnecessarily compute logabsdetjac in the (b::Bijector)(x) method kind of proves the point that people can't be bothered to implement all the different methods, haha).

> I'm not sure if forward is part of the API or just an implementation detail, and how often it happens that .

It's part of the API :+1: And there are cases where it's definitely worth it, e.g. RadialLayer for high-dimensional input. It's not particularly useful for stuff like Exp, but once you start working with 500-dimensional normalizing flows this becomes very important.
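
As an illustration of why sharing matters, consider a radial flow in the style of Rezende & Mohamed (2015): f(z) = z + β h(r) (z - z0) with r = ‖z - z0‖ and h(r) = 1/(α + r), whose log-Jacobian is (d - 1) log(1 + βh) + log(1 + βh + β h′(r) r). The sketch below is hypothetical and self-contained (RadialSketch and radial_forward are not Bijectors.jl names, and this is not the RadialLayer implementation). Its forward computes Δ, r and h once and reuses them for both outputs, whereas separate evaluation and logabsdetjac methods would each redo that O(d) work.

```julia
using LinearAlgebra

# Hypothetical radial flow: f(z) = z + β h(r) (z - z0), r = ‖z - z0‖, h(r) = 1 / (α + r).
struct RadialSketch{T<:Real}
    α::T
    β::T
    z0::Vector{T}
end

function radial_forward(b::RadialSketch, z::AbstractVector)
    Δ = z .- b.z0               # shared
    r = norm(Δ)                 # shared
    h = 1 / (b.α + r)           # shared
    βh = b.β * h
    rv = z .+ βh .* Δ           # transformed point
    dh = -h^2                   # h'(r) = -1 / (α + r)^2
    d = length(z)
    logabsdetjac = (d - 1) * log(1 + βh) + log(1 + βh + b.β * dh * r)
    return (rv = rv, logabsdetjac = logabsdetjac)
end
```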

> Define all methods that apply bijectors to multiple inputs (i.e., vectors, matrices, etc.) generically for all bijectors since we do not support Julia < 1.3 anymore.

I'm potentially in favour of this. But it's worth noting that Bijectors.jl still works on Julia <1.3; it's just that we don't test it properly and certain AD backends don't work. This change would completely break Bijectors.jl on Julia <1.3.
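
If generic call methods were added while keeping Julia <1.3 working, one option is to gate them on the Julia version with @static and VERSION. This is a sketch under that assumption, with hypothetical names (AbstractBij, ExpB) rather than Bijectors.jl types.

```julia
abstract type AbstractBij end          # hypothetical stand-in for Bijector
struct ExpB <: AbstractBij end
transform(::ExpB, x::Real) = exp(x)

@static if VERSION >= v"1.3"
    # Julia >= 1.3: one generic call method for every subtype.
    (b::AbstractBij)(x) = transform(b, x)
else
    # Older Julia cannot add methods to abstract types, so each
    # concrete bijector needs its own call method.
    (b::ExpB)(x) = transform(b, x)
end
```

The cost of the fallback branch is exactly the per-type boilerplate the issue complains about, so this only softens the break rather than removing it.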

> Use something like @thunk in ChainRulesCore

You're thinking along the lines of

function _forward(b, x)
    rv = @thunk ...
    logabsdetjac = @thunk ...
    return (rv = rv, logabsdetjac = logabsdetjac)
end

# unthunk each field: unthunk on the NamedTuple itself would return the thunks unchanged
forward(b, x) = map(unthunk, _forward(b, x))
(b::Bijector)(x) = unthunk(_forward(b, x).rv)
logabsdetjac(b, x) = unthunk(_forward(b, x).logabsdetjac)

right?

I'd argue that this is both a) more complicated for the user to understand, and b) way worse for performance, as closures have comparatively significant overhead.

> Add implementations of a reduced logabsdetjac version for and loglikelihood_with_trans (see #120) that avoid the intermediate allocations of arrays in logabsdetjac

I'm in favour of the suggestion, but it seems like a slightly different issue, no?
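
A minimal sketch of the reduced version, assuming a hypothetical scalar bijector ExpSketch and a hypothetical function name logabsdetjac_sum (the thread does not fix a name): instead of materializing the vector of per-element terms and then summing it, the reduction is done on the fly.

```julia
# Hypothetical scalar bijector: y = exp(x), so log|dy/dx| = x.
struct ExpSketch end
logabsdetjac(::ExpSketch, x::Real) = x

# Batched version: one term per element, allocates a vector.
logabsdetjac(b::ExpSketch, x::AbstractVector) = logabsdetjac.(Ref(b), x)

# Hypothetical reduced version: sums the terms without an intermediate array.
logabsdetjac_sum(b::ExpSketch, x::AbstractVector) = sum(xi -> logabsdetjac(b, xi), x)
```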

torfjelde, Sep 12 '20 11:09

> (BTW I'm not sure about these names, maybe just make it a tuple or use something other than rv?)

Agreed, but that's also a separate issue. We discussed renaming rv to res (https://github.com/TuringLang/Bijectors.jl/issues/41), but the issue sort of lost its momentum and we ended up stuck with rv. And I'm not certain about just making it a tuple, since a NamedTuple lets you access the fields both by indexing and via .varname.
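
For reference, this is the access pattern a NamedTuple return value allows; a plain tuple would only support the positional and destructuring forms. The values in res are made up for illustration.

```julia
res = (rv = 2.0, logabsdetjac = 0.5)   # hypothetical forward output

res.rv == res[1]                       # field access and positional indexing agree
res.logabsdetjac == res[2]
y, ladj = res                          # destructuring works as for a plain tuple
```

Renaming rv to res would only change the field name; indexing and destructuring keep working unchanged.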

torfjelde, Sep 12 '20 12:09