Brian Chen
But wait, there's more! https://github.com/JuliaAI/StatisticalTraits.jl exists as well. We should try to get folks from each org on an ML call at some point to work all this out...
I don't even think Flux uses `normalise` from MLUtils (it still has its own copy, which should itself be moved to NNlib), so there may not be much to fixup...
The Flux function ought to be renamed to something, anything else. The existing name is just confusing. We can probably discuss that over in NNlib though.
I think my worries about a `cond`-like construct are twofold: 1. To avoid the `within_gradient` problem, it seems like you'd need specialized `cond`s for every kind of branch that might...
> 1. Wouldn't it be just `cond(somecondition && othercondition, () -> ..., () -> ...)`? Also, if you worry about compilation time due to multiple specializations, I believe `@nospecialize` should...
> Do you assume some other setup?

My understanding of `cond(flag, true_fn, false_fn)` is that it obeys the following truth table:

| differentiating? | flag | branch taken |
|---|---|---|
| T | T | `true_fn` |
| T | F | `false_fn` |
| F | T | `false_fn` |
| F | F | `false_fn` |

In other...
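To make that truth table concrete, here is a minimal sketch in plain Julia, with the "differentiating?" bit made explicit as an argument. In a real implementation that bit would come from the AD system (e.g. via an `rrule`), not from the caller; the name `branch` and its signature are purely illustrative, not an existing Flux or NNlib API.

```julia
# Illustrative model of the truth table: `true_fn` runs only when we
# are both differentiating AND `flag` is set; otherwise `false_fn`.
branch(differentiating::Bool, flag::Bool, true_fn, false_fn) =
    (differentiating && flag) ? true_fn() : false_fn()

branch(true,  true,  () -> :true_fn, () -> :false_fn)  # => :true_fn
branch(true,  false, () -> :true_fn, () -> :false_fn)  # => :false_fn
branch(false, true,  () -> :true_fn, () -> :false_fn)  # => :false_fn
branch(false, false, () -> :true_fn, () -> :false_fn)  # => :false_fn
```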
I'm not sure I understand how this tracing works then. To get a value for `active`, you ultimately need to call something which returns `true` when not differentiating and `false`...
> That's the point - `active` and differentiation are **independent**.
> _Usually_, people set `active = true` while training and differentiate code while training, but strictly speaking nobody forbids you...
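A minimal sketch of that independence, assuming a toy dropout-like layer (this is not Flux's actual `Dropout`; the type and field names are hypothetical). The layer consults only its own `active` flag, so its behaviour is the same whether or not the call happens to be differentiated.

```julia
# Toy layer: behaviour is controlled by an explicit `active` field,
# never by asking "am I being differentiated?".
mutable struct ToyDropout
    p::Float64
    active::Bool
end

function (d::ToyDropout)(x)
    d.active || return x               # active = false: identity
    mask = rand(length(x)) .>= d.p     # active = true: drop entries
    return x .* mask ./ (1 - d.p)
end

d = ToyDropout(0.5, false)
d(ones(4))  # identity here, whether or not we differentiate the call
```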
As a start, can we completely ignore the `*2onnx` functionality? Ideally there could also be some automation of the generation of the `onnx2*` functions as well. Instead of using Flux...
The lack of clarity on how this layer should behave suggests to me that we might not want it as a built-in. Now, the good news is that Flux doesn't...