Kyle Daruwalla


There are two pieces here: (1) not updating non-trainable parameters, and (2) not computing gradients for non-trainable parameters. For (1), Optimisers.jl uses Functors.jl to walk over the structure and the...
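For reference, a minimal sketch of how that walk interacts with `trainable` (the `Scale` layer and its fields are my own example, not from the thread):

```julia
using Functors, Optimisers

struct Scale
    w
    b
end
@functor Scale

# Restrict the trainable fields: `b` is still walked by Functors.jl,
# but Optimisers.jl builds no update state for it, so it is never updated.
Optimisers.trainable(s::Scale) = (; w = s.w)

m = Scale(rand(3), rand(3))
state = Optimisers.setup(Optimisers.Descent(0.1), m)  # only `w` gets optimiser state
```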

> blessing and curse of Zygote supporting differentiation of arbitrary structs

Right, PyTorch autograd's `requires_grad` does (2), but it also prevents PyTorch's layers from being as flexible as ours. I...

Did you mean for your PyTorch code to be `y = loss(net(x))`? Regardless, I can see why Zygote might OOM here and PyTorch would not.

This is a useful Zygote issue, but just to address the larger issue: could you try replacing your `DenseNet` implementation with `SkipConnection`s? This would avoid the `xs = (xs..., x_new)`...
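Something along these lines (the layer shapes are placeholders, not your model's) would express the concatenation with `SkipConnection` instead of a growing tuple:

```julia
using Flux

# `SkipConnection(layer, connection)` computes `connection(layer(x), x)`,
# so channel-wise concatenation replaces the `xs = (xs..., x_new)` pattern.
dense_layer(cin, k) = Conv((3, 3), cin => k, relu; pad = 1)
block(cin, k) = SkipConnection(dense_layer(cin, k), (out, x) -> cat(out, x; dims = 3))

m = Chain(block(16, 12), block(28, 12))  # channels: 16 → 28 → 40
y = m(rand(Float32, 32, 32, 16, 1))      # output is 32×32×40×1
```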

You could use [this](https://github.com/darsnack/Metalhead.jl/blob/darsnack/vision-refactor/src/densenet.jl) as a reference. It does get a bit more complicated, though it looks like you won't need a bunch of the extra code in there like...

> The trouble with the concatenation formulation is that it produces a big tensor at each bottleneck. This means the memory needed to record a forward pass is quadratic in...
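To make that arithmetic concrete (the growth rate and depth below are my own illustrative numbers, not the original poster's):

```julia
# With growth rate k, block i concatenates k new channels onto its input,
# so the tensor recorded for block i holds roughly i*k channels.
k, L = 12, 40
channels_per_block = (1:L) .* k
total = sum(channels_per_block)  # k * L * (L + 1) / 2 — quadratic in the depth L
```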

Another coarse-grained data point w.r.t. the timeline of these regressions: gradient tests [were disabled](https://github.com/FluxML/Metalhead.jl/pull/70#pullrequestreview-718321736) on the initial refactor of Metalhead.jl due to the long compile times. That was as...

Anything we can do here?[^1] This affects nearly every model in Metalhead.jl significantly... which basically means a horrible TTFG (time-to-first-gradient) experience for anyone doing very basic ML stuff in Flux. I was...

No need to self-assign the issue. Just submit a PR when ready. By "conform," I mean defining new `getobs`/`nobs` methods on the tokenizer type that call the underlying splitting methods...
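Roughly like this (the `Tokenizer` type and its field are hypothetical; this assumes the LearnBase.jl spelling of the interface, which MLUtils.jl later renamed from `nobs` to `numobs`):

```julia
import LearnBase: getobs, nobs

# Hypothetical tokenizer wrapping a corpus of raw sentences.
struct Tokenizer
    sentences::Vector{String}
end

nobs(t::Tokenizer) = length(t.sentences)
getobs(t::Tokenizer, i) = split(t.sentences[i])  # delegate to the underlying splitting method
```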

No issues here with the proposed API. Typically, in FastAI, we have a "batch of images" or a "batch of tabular entries." Similarly, here we have a "batch of sequences."...