Kyle Daruwalla


There are two pieces here: (1) not updating non-trainable parameters, and (2) not computing gradients for non-trainable parameters. For (1), Optimisers.jl uses Functors.jl to walk over the structure and the...
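For reference, a minimal sketch of how that walk interacts with `trainable` (the `Scale` layer and its fields are my own example, not from the thread):

```julia
using Functors, Optimisers

struct Scale
    w
    b
end
@functor Scale

# Restrict the trainable fields: `b` is still walked by Functors.jl,
# but Optimisers.jl builds no update state for it, so it is never updated.
Optimisers.trainable(s::Scale) = (; w = s.w)

m = Scale(rand(3), rand(3))
state = Optimisers.setup(Optimisers.Descent(0.1), m)  # only `w` gets optimiser state
```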

> blessing and curse of Zygote supporting differentiation of arbitrary structs

Right, PyTorch autograd's `requires_grad` does (2), but it also prevents PyTorch's layers from being as flexible as ours. I...

Did you mean for your PyTorch code to be `y = loss(net(x))`? Regardless, I can see why Zygote might OOM here and PyTorch would not.

This is a useful Zygote issue, but just to address the larger issue: could you try replacing your `DenseNet` implementation with `SkipConnection`s? This would avoid the `xs = (xs..., x_new)`...
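Something along these lines (the layer shapes are placeholders, not your model's) would express the concatenation with `SkipConnection` instead of a growing tuple:

```julia
using Flux

# `SkipConnection(layer, connection)` computes `connection(layer(x), x)`,
# so channel-wise concatenation replaces the `xs = (xs..., x_new)` pattern.
dense_layer(cin, k) = Conv((3, 3), cin => k, relu; pad = 1)
block(cin, k) = SkipConnection(dense_layer(cin, k), (out, x) -> cat(out, x; dims = 3))

m = Chain(block(16, 12), block(28, 12))  # channels: 16 → 28 → 40
y = m(rand(Float32, 32, 32, 16, 1))      # output is 32×32×40×1
```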

You could use [this](https://github.com/darsnack/Metalhead.jl/blob/darsnack/vision-refactor/src/densenet.jl) as a reference. It does get a bit more complicated, though it looks like you won't need a bunch of the extra code in there like...

> The trouble with the concatenation formulation is that it produces a big tensor at each bottleneck. This means the memory needed to record a forward pass is quadratic in...
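To make that arithmetic concrete (the growth rate and depth below are my own illustrative numbers, not the original poster's):

```julia
# With growth rate k, block i concatenates k new channels onto its input,
# so the tensor recorded for block i holds roughly i*k channels.
k, L = 12, 40
channels_per_block = (1:L) .* k
total = sum(channels_per_block)  # k * L * (L + 1) / 2 — quadratic in the depth L
```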

Another coarse-grained data point w.r.t. the timeline of these regressions: gradient tests [were disabled](https://github.com/FluxML/Metalhead.jl/pull/70#pullrequestreview-718321736) on the initial refactor of Metalhead.jl due to the long compile times. That was as...

Anything we can do here?[^1] This affects nearly every model in Metalhead.jl significantly... which basically means a horrible TTFG (time-to-first-gradient) experience for anyone doing very basic ML stuff in Flux. I was...

No need to self-assign the issue. Just submit a PR when ready. By "conform," I mean defining new `getobs`/`nobs` methods on the tokenizer type that call the underlying splitting methods...
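Roughly like this (the `Tokenizer` type and its field are hypothetical; this assumes the LearnBase.jl spelling of the interface, which MLUtils.jl later renamed from `nobs` to `numobs`):

```julia
import LearnBase: getobs, nobs

# Hypothetical tokenizer wrapping a corpus of raw sentences.
struct Tokenizer
    sentences::Vector{String}
end

nobs(t::Tokenizer) = length(t.sentences)
getobs(t::Tokenizer, i) = split(t.sentences[i])  # delegate to the underlying splitting method
```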

No issues here with the proposed API. Typically, in FastAI, we have a "batch of images" or a "batch of tabular entries." Similarly, here we have a "batch of sequences."...