Flux Compatibility
You may have noticed that I recently overhauled Flux's AD to emphasise a more functional API. My not-so-secret goal here is that Flux's current AD will be forwards-compatible with the better infrastructure that we are building; it should be easy to swap Capstan into an existing model without any code changes.
For the most part, this isn't Capstan's problem. But we do need Capstan to be able to strip away Flux's tracking before applying its own. Capstan already runs checks whenever a new value is introduced to the program (for the wrt! API), so this should be as simple as applying a function like unwrap just after that check:
# Fallback: values without tracking pass through unchanged
unwrap(x) = x
# In Flux
unwrap(x::TrackedArray) = x.data
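For anyone who wants to try the dispatch pattern without loading Flux, here is a rough, self-contained sketch. `ToyTracked` and `introduce` are hypothetical names made up for illustration: `ToyTracked` stands in for Flux's `TrackedArray`, and `introduce` stands in for whatever point Capstan uses to check newly introduced values. Neither is part of either package's API.

```julia
# Hypothetical stand-in for Flux's TrackedArray. The real type carries
# more state than the wrapped data; only the `data` field matters here.
struct ToyTracked{A<:AbstractArray}
    data::A
end

# Generic fallback: values that carry no tracking pass through unchanged.
unwrap(x) = x

# Specialised method: strip the wrapper and return the underlying array.
unwrap(x::ToyTracked) = x.data

# Hypothetical hook for the point where a new value enters the program.
# Assumption: the check only needs to call `unwrap` once per value.
introduce(x) = unwrap(x)

plain = [1.0, 2.0, 3.0]
tracked = ToyTracked(plain)

# Both calls hand back the very same plain array.
introduce(plain) === plain
introduce(tracked) === plain
```

Because the fallback method is the identity, Capstan would not need to know about Flux's types at all; Flux would only have to add its own method.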
We can use this issue to discuss any other compatibility issues that might come up; it might be useful to start having some Flux examples, and I'm happy to help get that set up.
It looks like with Capstan one has to write the backpropagation for a custom layer by hand, whereas with Flux one doesn't. Am I right? If so, I think it would be quite tricky to write complex layers for experiments.
No, Capstan will be able to support custom layers just like Flux's current AD. Otherwise, it would not really be automatic differentiation :)
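To make "automatic" concrete, here is a toy sketch in which a custom layer defines only its forward pass and still gets a derivative. It uses a minimal forward-mode dual number chosen for brevity; this is not how Capstan or Flux implement differentiation, and `Dual`, `Scale`, and `derivative` are hypothetical names for illustration only.

```julia
# Toy dual number: a value together with its derivative.
struct Dual
    val::Float64
    der::Float64
end

# Differentiation rules for the few primitives the layer below uses.
Base.:+(a::Dual, b::Dual) = Dual(a.val + b.val, a.der + b.der)
Base.:+(a::Dual, b::Real) = Dual(a.val + b, a.der)
Base.:+(a::Real, b::Dual) = Dual(a + b.val, b.der)
Base.:*(a::Dual, b::Dual) = Dual(a.val * b.val, a.der * b.val + a.val * b.der)
Base.:*(a::Dual, b::Real) = Dual(a.val * b, a.der * b)
Base.:*(a::Real, b::Dual) = Dual(a * b.val, a * b.der)

function Base.tanh(a::Dual)
    t = tanh(a.val)
    return Dual(t, (1 - t^2) * a.der)  # d/dx tanh(x) = 1 - tanh(x)^2
end

# Hypothetical custom layer: only the forward pass is written.
struct Scale
    w::Float64
    b::Float64
end
(l::Scale)(x) = tanh(l.w * x + l.b)

# Seed the input with derivative 1 and read the derivative off the output.
derivative(f, x::Real) = f(Dual(x, 1.0)).der

layer = Scale(2.0, 0.5)
grad = derivative(layer, 0.3)

# Hand-derived value for comparison: w * (1 - tanh(w*x + b)^2).
expected = 2.0 * (1 - tanh(2.0 * 0.3 + 0.5)^2)
grad ≈ expected
```

The layer author never writes a derivative for `Scale`; rules exist only for the primitives it is built from, which is the property the reply above refers to.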