Corey Lowman

Results: 78 issues by Corey Lowman

There are a number of tensor_ops that try to detail the equivalent operation in pytorch in their docstring. Make all of these docstrings follow a consistent format to reduce confusion. _Originally...
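
For illustration, a consistent convention could pin the pytorch equivalent to a dedicated line in every docstring; the function below is a hypothetical stand-in operating on a plain `Vec`, not one of the actual tensor_ops:

```rust
/// Square roots each element of `t`.
///
/// Pytorch equivalent: `torch.sqrt(t)`
fn sqrt_elements(t: Vec<f32>) -> Vec<f32> {
    t.into_iter().map(f32::sqrt).collect()
}

fn main() {
    assert_eq!(sqrt_elements(vec![1.0, 4.0, 9.0]), vec![1.0, 2.0, 3.0]);
}
```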

Contrived example:

```rust
fn main() {
    let mut model: (Linear, Linear) = Default::default();
    let y = model.1.forward(Tensor1D::zeros().traced());
    let gradients = y.square().mean().backward();
    let mut opt: Sgd = Default::default();
    opt.update(&mut model, gradients);
    // ...
}
```

I think this could be done by implementing `Linear` as a tuple instead: `(MatMul, Add)`, and then if someone doesn't want bias, they can just use `MatMul` instead (see the sketch below). However, this would...
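
A minimal sketch of the tuple idea, using stand-in unit structs rather than the actual dfdx modules:

```rust
// Stand-ins for the real modules; deriving Default means tuples of them
// get a Default impl too.
#[derive(Default)]
struct MatMul<const I: usize, const O: usize>;

#[derive(Default)]
struct Add<const O: usize>;

// Linear with bias is just the composition of the two primitives.
type Linear<const I: usize, const O: usize> = (MatMul<I, O>, Add<O>);

fn main() {
    let _with_bias: Linear<3, 2> = Default::default();
    // Dropping the bias means using the MatMul on its own.
    let _no_bias: MatMul<3, 2> = Default::default();
}
```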

refactor

_Originally posted by @coreylowman in https://github.com/coreylowman/dfdx/pull/94#discussion_r925515750_

There's a lot of work to be done here. Very rough list of todos:

- Devices
  - [ ] Add `Cuda` device that wraps cudarc::CudaDevice and an rng
  - [...
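
A rough sketch of the device shape described by the first todo, with stand-in types where the real `cudarc::CudaDevice` and rng would go:

```rust
use std::sync::Arc;

// Stand-ins only; the real version would hold cudarc::CudaDevice and a real rng.
struct CudaDevice { ordinal: usize }
struct SmallRng { seed: u64 }

// The `Cuda` device: a shared handle to the GPU plus its own rng.
#[allow(dead_code)]
struct Cuda {
    dev: Arc<CudaDevice>,
    rng: SmallRng,
}

fn main() {
    let _cuda = Cuda {
        dev: Arc::new(CudaDevice { ordinal: 0 }),
        rng: SmallRng { seed: 0 },
    };
}
```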

new feature
refactor
gpu

A common feature when building training frameworks is saving the optimizer state along with the network state. There's already a way to convert a `D::Vec` into a rust...
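
As a sketch of what saving both states together might look like (all names hypothetical; the source doesn't specify a format):

```rust
use std::collections::HashMap;

// Hypothetical checkpoint layout: both states stored side by side as
// flattened buffers, keyed by parameter name so they can be re-associated
// on load.
struct Checkpoint {
    model_params: HashMap<String, Vec<f32>>,
    // e.g. Adam's per-parameter moment buffers, flattened the same way
    optimizer_state: HashMap<String, Vec<f32>>,
}

fn main() {
    let mut ckpt = Checkpoint {
        model_params: HashMap::new(),
        optimizer_state: HashMap::new(),
    };
    ckpt.model_params.insert("linear.weight".into(), vec![0.0; 6]);
    ckpt.optimizer_state.insert("linear.weight.m".into(), vec![0.0; 6]);
}
```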

new feature

- [ ] Sgd `weight_decay: Option`
- [ ] RMSprop `weight_decay: Option`
- [ ] Adam/AdamW: `weight_decay: Option`, where `enum AdamDecay { Vanilla(f32), AdamW(f32) }`. (maybe use different terms for...
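
The Vanilla/AdamW distinction is about where decay enters the update. A minimal per-parameter sketch (bias correction omitted; `adam_step` and its signature are illustrative, not the dfdx API):

```rust
enum AdamDecay {
    Vanilla(f32), // classic L2: folded into the gradient
    AdamW(f32),   // decoupled: applied directly to the weight
}

fn adam_step(
    w: &mut f32, grad: f32, m: &mut f32, v: &mut f32,
    lr: f32, beta1: f32, beta2: f32, eps: f32,
    decay: &Option<AdamDecay>,
) {
    let mut g = grad;
    if let Some(AdamDecay::Vanilla(wd)) = decay {
        g += wd * *w; // L2 decay enters before the moment updates
    }
    *m = beta1 * *m + (1.0 - beta1) * g;
    *v = beta2 * *v + (1.0 - beta2) * g * g;
    *w -= lr * *m / (v.sqrt() + eps);
    if let Some(AdamDecay::AdamW(wd)) = decay {
        *w -= lr * wd * *w; // decoupled decay bypasses the moments entirely
    }
}

fn main() {
    let (mut w, mut m, mut v) = (1.0_f32, 0.0, 0.0);
    adam_step(&mut w, 0.1, &mut m, &mut v, 1e-3, 0.9, 0.999, 1e-8,
              &Some(AdamDecay::AdamW(0.01)));
    println!("updated weight: {w}");
}
```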

Mean & variance should only be updated if the tape parameter actually owns the tape, i.e. `if TAPE::OWNS_TAPE`. The paper is: https://arxiv.org/abs/1502.03167 More details can be found in the pytorch documentation:...
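
A minimal sketch of the gating, with a hypothetical `Tape` trait standing in for dfdx's (the real tensor types are omitted):

```rust
trait Tape { const OWNS_TAPE: bool; }
struct OwnedTape; // training: gradients are being recorded
struct NoneTape;  // inference: no gradients
impl Tape for OwnedTape { const OWNS_TAPE: bool = true; }
impl Tape for NoneTape { const OWNS_TAPE: bool = false; }

struct BatchNorm { running_mean: f32, running_var: f32, momentum: f32 }

impl BatchNorm {
    fn update_stats<T: Tape>(&mut self, batch_mean: f32, batch_var: f32) {
        // Only forward passes that own the tape (training mode) should
        // touch the running statistics.
        if T::OWNS_TAPE {
            self.running_mean += self.momentum * (batch_mean - self.running_mean);
            self.running_var += self.momentum * (batch_var - self.running_var);
        }
    }
}

fn main() {
    let mut bn = BatchNorm { running_mean: 0.0, running_var: 1.0, momentum: 0.1 };
    bn.update_stats::<OwnedTape>(0.5, 2.0); // updates the running stats
    bn.update_stats::<NoneTape>(9.9, 9.9);  // no-op
    println!("{} {}", bn.running_mean, bn.running_var);
}
```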

Needed for #34. In multi-head attention, you concatenate the outputs of all the single attention heads. Note that this may require nightly, similar to #1, because we...
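
A toy sketch of the nightly requirement, using plain arrays in place of dfdx tensors: the concatenated output's length is the const expression `A + B`, which is exactly what `feature(generic_const_exprs)` permits:

```rust
#![feature(generic_const_exprs)]
#![allow(incomplete_features)]

// The output length is an arithmetic expression over const generics.
fn concat<const A: usize, const B: usize>(x: [f32; A], y: [f32; B]) -> [f32; A + B]
where
    [f32; A + B]: Sized,
{
    let mut out = [0.0; A + B];
    out[..A].copy_from_slice(&x);
    out[A..].copy_from_slice(&y);
    out
}

fn main() {
    // e.g. concatenating two attention-head outputs of widths 2 and 3
    let merged = concat([1.0; 2], [2.0; 3]);
    assert_eq!(merged.len(), 5);
}
```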

requires-nightly
feature(generic_const_exprs)