Kyle Daruwalla
Could probably use the existing `truncated_normal` with the standard deviation set according to the fan in.
In [`src/utils.jl`](https://github.com/FluxML/Flux.jl/blob/master/src/utils.jl) we already have a `Flux.truncated_normal` initializer function that accepts a custom standard deviation. A PR would add a new initializer, `Flux.lecun_normal`, that just calls the existing `truncated_normal` with...
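For illustration, here is a rough sketch of what such an initializer could look like. The name `lecun_normal_sketch` and the fan-in computation (taking the last dimension as the fan-in, i.e. the `Dense`-weight case) are assumptions for this example, not the final implementation; a real PR would reuse Flux's internal fan computation so convolutional weights are handled too.

```julia
using Flux

# Hypothetical sketch: a LeCun-normal initializer built on the existing
# Flux.truncated_normal, with the standard deviation set to sqrt(1 / fan_in).
# Treating the last dimension as the fan-in covers the Dense-weight case
# (weights are out × in); this is a simplification for illustration.
lecun_normal_sketch(dims::Integer...) =
    Flux.truncated_normal(dims...; std = sqrt(1 / dims[end]))

# Usage: a 64 × 128 weight matrix, so fan_in = 128 and std ≈ 0.088
W = lecun_normal_sketch(64, 128)
```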
@chiral-carbon Please go ahead and open a PR if you are willing to tackle this.
We were discussing this during our call this week. The consensus was that using `IOContext(..., :limit => true)` is preferable to a custom `summary` for arrays. This would keep things...
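As a quick illustration of the idea (plain Julia, not anything Flux-specific), passing `:limit => true` through an `IOContext` makes Base's own array printing abbreviate large arrays, which is the behavior a custom `summary` would otherwise have to reimplement:

```julia
x = randn(1000, 1000)

# Without the context this would print the full matrix; with :limit => true,
# Base's show abbreviates it with ellipses, just like the REPL does.
show(IOContext(stdout, :limit => true), MIME"text/plain"(), x)
```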
Do you mean CUDA as in Nvidia's CUDA libraries, or CUDA.jl, the Julia package? I make this distinction because I believe CUDA.jl will only download Nvidia's library when the very...
> Something may have recently changed on the CUDA.jl end?

I see this just installing it without Flux. You may want to bring it up on their side. Based on...
@ndgnuh It seems you can prevent downloading artifacts by defining a file called `LocalPreferences.toml` in the root directory of your project:

```toml
[CUDA_Runtime_jll]
version = "local"
```

You could...
There is no code directly in Flux.jl implementing BPTT. This is "just" calling `gradient` over the whole loop like Mike did in his first code snippet.
Yes, the AD backend, Zygote, will handle gradient accumulation through the loop. See [this comment](https://github.com/FluxML/FluxTraining.jl/issues/144#issuecomment-1449078297) for how you can implement many-to-many or many-to-one models. Also check...
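To make that concrete, here is a minimal sketch using the stateful recurrent API (the layer sizes, data, and loss are placeholders, not taken from the thread): the recurrent layer is applied step by step inside the loss, and a single `gradient` call over that loop is what gives you BPTT, with Zygote accumulating the per-step contributions.

```julia
using Flux

m  = Flux.RNN(2 => 3)                        # stateful recurrent layer
xs = [rand(Float32, 2) for _ in 1:5]         # toy input sequence of length 5
ys = [rand(Float32, 3) for _ in 1:5]         # matching targets

Flux.reset!(m)                               # start from the initial hidden state

# One gradient call over the whole time loop: Zygote differentiates through
# the loop and accumulates the gradient contribution from every step.
grads = Flux.gradient(m) do model
    sum(Flux.mse(model(x), y) for (x, y) in zip(xs, ys))
end
```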
> we don't currently have nice helper functionality for that

The issue to track for the helper is https://github.com/FluxML/Optimisers.jl/pull/57. Until then, here is a snippet that should do that same...