Lux.jl
Improve `Julia & Lux for the uninitiated`
Hi, congrats on a very interesting package, I look forward to trying it out! I'm going through the docs and noticed some typos, and I also have a few small suggested improvements. I couldn't easily identify the original files to open a PR, so here they are:
In http://lux.csail.mit.edu/dev/examples/generated/beginner/Basics/main/:
- we don't enfore it -> we don't enforce it
- We relu on the Julia StdLib -> We rely on the Julia StdLib
- we create an PRNG and seed it -> we create a PRNG (pseudorandom number generator) and seed (initialize) it (see the PRNG sketch after this list)
- we should use Lux.replicate on PRNG before using them -> we should use Lux.replicate on PRNGs before using them
- provides an uniform API -> provides a uniform API
- Note that AD.gradient will only work for scalar valued outputs -> Note that AD.gradient will only work for scalar-valued outputs. (Period at the end; see the AD.gradient sketch after this list.)
- to demonstrate Lux let us use the Dense layer. -> to demonstrate Lux, let's use the Dense layer. (Equivalent to PyTorch's `nn.Linear`.)
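For the two PRNG items, here is a minimal sketch of the pattern being described, assuming only the standard-library Random module and Lux's documented `Lux.replicate`:

```julia
using Lux, Random

# Create a PRNG (pseudorandom number generator) and seed (initialize) it
rng = Random.default_rng()
Random.seed!(rng, 0)

# Lux.replicate copies a PRNG; drawing from the copy leaves the
# original generator's state untouched
rng2 = Lux.replicate(rng)
x = rand(rng2, 2)   # rng itself has not advanced
```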
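And for the AD.gradient item, a sketch of the scalar-output restriction. This assumes AbstractDifferentiation.jl (imported as `AD`) with a Zygote backend; treat it as illustrative rather than the tutorial's exact code:

```julia
import AbstractDifferentiation as AD
using Zygote

f(x) = sum(abs2, x)   # scalar-valued output: gradient is well defined
g(x) = abs2.(x)       # vector-valued output: AD.gradient will not work

backend = AD.ZygoteBackend()
x = rand(3)

dx, = AD.gradient(backend, f, x)   # returns a tuple, one entry per argument
# AD.gradient(backend, g, x)       # errors: output is not a scalar
```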
On the same page, I recommend adding a line to make the following a bit more "user-friendly", e.g. for PyTorch users curious about Julia + Lux:
- Under `∇f(x) = x`, add underneath: "∇" can be typed as `\del<tab>` in the Julia REPL or in a Julia-compatible editor. You can press ? in the REPL to enter Julia *help* mode and then paste the ∇ to find out how to type any Unicode character in Julia. (A REPL sketch follows this list.)
- For updating our parameters let's use [Optimisers.jl](https://github.com/FluxML/Optimisers.jl) -> To update our parameters, let's use an SGD (Stochastic Gradient Descent) optimiser from [Optimisers.jl](https://github.com/FluxML/Optimisers.jl) with the learning rate set to 0.01 (see the training sketch after this list):
- Initialize the initial state of the optimiser -> Set up the initial state of the optimiser:
- Define the loss function -> Define the loss function:
- `println("Loss Value with ground true W & b: ", mse(W, b, x_samples, y_samples))` -> `println("Loss value evaluated with true parameters (weights and biases): ", mse(W, b, x_samples, y_samples))`
- `# Perform parameter update` -> `# Update the model's parameters:`
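To illustrate the ∇ suggestion above, here is roughly what the added note would show in the REPL (output abbreviated, so treat it as illustrative):

```julia
julia> ∇f(x) = x    # ∇ is typed as \del<tab>
∇f (generic function with 1 method)

help?> ∇            # press ? to enter help mode, then paste the ∇
"∇" can be typed by \del<tab>
```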
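And here is how the Optimisers.jl snippets above fit together, as a hedged sketch. The `W`, `b`, `x_samples`, `y_samples`, and `mse` names come from the quoted tutorial; the data values are made up for illustration, and the optimiser calls assume Optimisers.jl's documented `Descent`/`setup`/`update` API:

```julia
using Optimisers, Zygote

# Hypothetical stand-ins for the tutorial's ground-truth parameters and data
W, b = rand(2, 2), rand(2)
x_samples = rand(2, 10)
y_samples = W * x_samples .+ b .+ 0.01 .* randn(2, 10)

# Define the loss function:
mse(W, b, x, y) = sum(abs2, W * x .+ b .- y) / size(x, 2)

println("Loss value evaluated with true parameters (weights and biases): ",
        mse(W, b, x_samples, y_samples))

# SGD (Stochastic Gradient Descent) with the learning rate set to 0.01
opt = Optimisers.Descent(0.01)

# Set up the initial state of the optimiser:
ps = (W = rand(2, 2), b = rand(2))   # parameters to be learned
st_opt = Optimisers.setup(opt, ps)

for epoch in 1:100
    gs = Zygote.gradient(p -> mse(p.W, p.b, x_samples, y_samples), ps)[1]
    # Update the model's parameters:
    global st_opt, ps = Optimisers.update(st_opt, ps, gs)
end
```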
IMHO the Jacobian-Vector Product and the Vector-Jacobian Product sections are technical details that are unlikely to be of interest to most people first looking at the docs. I recommend moving those sections to the bottom of that page, or at least prefacing them with a "side note:" so people can skip them.
Thanks for the pointers. The file is here: https://github.com/avik-pal/Lux.jl/tree/main/examples/Basics. (At some point I will get around to writing up how the docs are built, to help contributors.)
I agree with all but one change. The `∇f(x)` needs to be updated to `df(x)` instead, in line with the style guide for Lux.