Corey Lowman
This will likely require its own device function in the future, as I suspect it will be implemented similarly to conv_forward, where it's just looping over the inputs. Can this...
similar to pytorch's distributions module https://pytorch.org/docs/stable/distributions.html The main use for this is reinforcement learning, I think, for sampling from output distributions. A first pass should include something like a base `Distribution` trait...
Maybe add inference to each of the scripts as well, to make it clear that there are both forward & forward_mut?
They should have the following behavior: 1. `Module::forward()` does nothing 2. `Module::forward_mut()` calls dropout (i.e. if there is a tape, it will modify the input, otherwise it will do nothing)
This can be done with rayon. Parallelizing forward is relatively straightforward, as you can just `result.par_iter_mut().zip(...).for_each()` since filters & bias are ref borrowed. Parallelizing backward is less straightforward, but easiest...
This came up in #67 and #107, and is also related to #101. It'd be nice for tensors in modules to have names that match how they are accessed. E.g....
Should have tuple inputs and outputs, with hidden state.
Similar to how we can create a cpu tensor from a Vec, it'd be nice to be able to pass in a separately allocated CudaSlice. The new method of this...
Let's discuss how operator fusion might work in dfdx. I suspect it will require a lot of work. On the cuda side of things it will at least require JIT compiling...
With the addition of the `AMP` dtype, we also need to add gradient scaling, which is commonly used with AMP training. I think the frontend interface could look something like: ```rust...