Kyle Daruwalla
Could probably use the existing `truncated_normal` with the standard deviation set according to the fan in.
In [`src/utils.jl`](https://github.com/FluxML/Flux.jl/blob/master/src/utils.jl) we already have a `Flux.truncated_normal` initializer function that accepts a custom standard deviation. A PR would add a new initializer, `Flux.lecun_normal`, that just calls the existing `truncated_normal` with...
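For illustration, here is a rough sketch of what such an initializer could look like. The name `lecun_normal_sketch` and the fan-in computation (taking the last dimension as the fan-in, i.e. the `Dense`-weight case) are assumptions for this example, not the final implementation; a real PR would reuse Flux's internal fan computation so convolutional weights are handled too.

```julia
using Flux

# Hypothetical sketch: a LeCun-normal initializer built on the existing
# Flux.truncated_normal, with the standard deviation set to sqrt(1 / fan_in).
# Treating the last dimension as the fan-in covers the Dense-weight case
# (weights are out × in); this is a simplification for illustration.
lecun_normal_sketch(dims::Integer...) =
    Flux.truncated_normal(dims...; std = sqrt(1 / dims[end]))

# Usage: a 64 × 128 weight matrix, so fan_in = 128 and std ≈ 0.088
W = lecun_normal_sketch(64, 128)
```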
@chiral-carbon Please go ahead and open a PR if you are willing to tackle this.
We were discussing this during our call this week. The consensus was that using `IOContext(..., :limit => true)` is preferable to a custom `summary` for arrays. This would keep things...
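As a quick illustration of the idea (plain Julia, not anything Flux-specific), passing `:limit => true` through an `IOContext` makes Base's own array printing abbreviate large arrays, which is the behavior a custom `summary` would otherwise have to reimplement:

```julia
x = randn(1000, 1000)

# Without the context this would print the full matrix; with :limit => true,
# Base's show abbreviates it with ellipses, just like the REPL does.
show(IOContext(stdout, :limit => true), MIME"text/plain"(), x)
```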
Do you mean CUDA as in Nvidia's CUDA libraries, or CUDA.jl, the Julia package? I make this distinction because I believe CUDA.jl will only download Nvidia's library when the very...
> Something may have recently changed on the CUDA.jl end?

I see this just installing it without Flux. You may want to bring it up on their side. Based on...
@ndgnuh It seems you can prevent downloading artifacts by defining a file called `LocalPreferences.toml` in the root directory of your project:

```toml
[CUDA_Runtime_jll]
version = "local"
```

You could...
There is no code directly in Flux.jl implementing BPTT. This is "just" calling `gradient` over the whole loop like Mike did in his first code snippet.
Yes, the AD backend, Zygote, will handle gradient accumulation through the loop. See [this comment](https://github.com/FluxML/FluxTraining.jl/issues/144#issuecomment-1449078297) for how you can implement many-to-many or many-to-one models. Also check...
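To make that concrete, here is a minimal sketch using the stateful recurrent API (the layer sizes, data, and loss are placeholders, not taken from the thread): the recurrent layer is applied step by step inside the loss, and a single `gradient` call over that loop is what gives you BPTT, with Zygote accumulating the per-step contributions.

```julia
using Flux

m  = Flux.RNN(2 => 3)                        # stateful recurrent layer
xs = [rand(Float32, 2) for _ in 1:5]         # toy input sequence of length 5
ys = [rand(Float32, 3) for _ in 1:5]         # matching targets

Flux.reset!(m)                               # start from the initial hidden state

# One gradient call over the whole time loop: Zygote differentiates through
# the loop and accumulates the gradient contribution from every step.
grads = Flux.gradient(m) do model
    sum(Flux.mse(model(x), y) for (x, y) in zip(xs, ys))
end
```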
> we don't currently have nice helper functionality for that

The issue to track for the helper is https://github.com/FluxML/Optimisers.jl/pull/57. Until then, here is a snippet that should do that same...