dfdx
Deep learning in Rust, with shape-checked tensors and neural networks
- [ ] Linear batched forward (matmul & broadcast add) (see the sketch after this list)
- [ ] Backprop algorithm
- [ ] Optimizer updates
- [ ] Forward with tape & without tape
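As a rough sketch of what the first item involves, here is a plain-Rust batched linear forward: `y = x * w + b`, with the bias broadcast-added to every row of the batch. The function name and generics are illustrative, not dfdx's actual API.

```rust
// Hypothetical batched linear forward; B = batch size, I = input
// features, O = output features.
fn linear_forward<const B: usize, const I: usize, const O: usize>(
    x: &[[f32; I]; B], // batch of B inputs with I features each
    w: &[[f32; O]; I], // weight matrix
    b: &[f32; O],      // bias, broadcast over the batch dimension
) -> [[f32; O]; B] {
    let mut y = [[0.0f32; O]; B];
    for n in 0..B {
        for o in 0..O {
            let mut acc = b[o]; // broadcast add
            for i in 0..I {
                acc += x[n][i] * w[i][o]; // matmul accumulation
            }
            y[n][o] = acc;
        }
    }
    y
}

fn main() {
    let y = linear_forward(&[[1.0, 2.0]], &[[0.5], [0.5]], &[1.0]);
    assert_eq!(y, [[2.5]]); // 1*0.5 + 2*0.5 + 1 = 2.5
}
```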
E.g.:

- Const generic sizes (see the sketch after this list)
- Clean/safe/hackable implementation
- Gradients not stored directly on parameters, which makes updating & writing optimizers much cleaner
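To make the const-generic point concrete, here is a minimal stand-alone sketch. The `Tensor2D` type below is a stand-in, not necessarily dfdx's real definition; the idea is that shapes live in the type, so mismatched matmul operands fail to compile rather than panic at runtime.

```rust
// Stand-in tensor type with compile-time shape.
struct Tensor2D<const M: usize, const N: usize> {
    data: [[f32; N]; M],
}

// Matmul is only defined when the inner dimensions agree, so a shape
// mismatch is a type error instead of a runtime check.
fn matmul<const M: usize, const K: usize, const N: usize>(
    a: &Tensor2D<M, K>,
    b: &Tensor2D<K, N>,
) -> Tensor2D<M, N> {
    let mut out = Tensor2D { data: [[0.0; N]; M] };
    for i in 0..M {
        for j in 0..N {
            for k in 0..K {
                out.data[i][j] += a.data[i][k] * b.data[k][j];
            }
        }
    }
    out
}

fn main() {
    let a = Tensor2D::<2, 3> { data: [[1.0; 3]; 2] };
    let b = Tensor2D::<3, 4> { data: [[1.0; 4]; 3] };
    let c = matmul(&a, &b); // Tensor2D<2, 4>
    assert_eq!(c.data[0][0], 3.0);
    // matmul(&b, &a) would not type-check: inner dims 4 != 2.
}
```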
- [x] #158
- [ ] #157
- [ ] #150
- [ ] #149
- [x] #147
- [x] #146
- [ ] #121
- [x] #108
- [...
Should CUDA kernels be JIT-compiled at runtime, or somehow compiled when the program is built? Best case, we can support both of these easily via a feature flag or...
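A hedged sketch of what the feature-flag option could look like. The feature name `cuda-jit`, the `Kernel` type, and both function bodies are placeholders, since the issue only floats the idea:

```rust
// Stand-in type; a real version would wrap a CUDA module handle.
struct Kernel(Vec<u8>);

#[cfg(feature = "cuda-jit")]
fn load_kernel(src: &str) -> Kernel {
    // JIT path: hand the .cu source to a runtime compiler such as
    // NVRTC (stubbed out here).
    Kernel(src.as_bytes().to_vec())
}

#[cfg(not(feature = "cuda-jit"))]
fn load_kernel(_src: &str) -> Kernel {
    // AOT path: load PTX that build.rs compiled ahead of time
    // (stubbed out here).
    Kernel(Vec::new())
}

fn main() {
    let _k = load_kernel("extern \"C\" __global__ void fill() {}");
}
```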
Related to #9. Since CUDA operations can all fail, all the tensor operations/device-level ops will need to return a `Result`. I think this can be done in a non-breaking...
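One non-breaking approach (a sketch, not necessarily the plan in the issue): add fallible `try_` variants that return `Result`, and keep the existing infallible names as wrappers that unwrap. All names below are illustrative.

```rust
#[derive(Debug)]
#[allow(dead_code)]
enum DeviceError {
    Cuda(String),
}

struct Tensor(Vec<f32>);

impl Tensor {
    // New fallible entry point: a CUDA kernel launch could fail here,
    // while the CPU path cannot.
    fn try_relu(&self) -> Result<Tensor, DeviceError> {
        Ok(Tensor(self.0.iter().map(|x| x.max(0.0)).collect()))
    }

    // Existing infallible signature stays intact for current callers.
    fn relu(&self) -> Tensor {
        self.try_relu().expect("device op failed")
    }
}

fn main() {
    let t = Tensor(vec![-1.0, 2.0]);
    assert_eq!(t.relu().0, vec![0.0, 2.0]);
}
```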
This would likely use something like rayon under the hood to run operations inside a thread pool (sketched below).
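A minimal sketch of what that could look like with rayon's thread pool; dfdx does not currently do this, and `add` is just a stand-in element-wise op. Requires `rayon` in Cargo.toml.

```rust
use rayon::prelude::*;

// Element-wise add, parallelized across the pool's worker threads.
fn add(a: &[f32], b: &[f32]) -> Vec<f32> {
    a.par_iter()
        .zip(b.par_iter())
        .map(|(x, y)| x + y)
        .collect()
}

fn main() {
    // Pin the pool size instead of using rayon's global pool.
    let pool = rayon::ThreadPoolBuilder::new()
        .num_threads(4)
        .build()
        .unwrap();
    let out = pool.install(|| add(&[1.0, 2.0], &[3.0, 4.0]));
    assert_eq!(out, vec![4.0, 6.0]);
}
```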
Related to #9. This would mainly be used for element-wise operations, where the existing Rust closures are specified at the element level. This method would be preferred over specifying both...
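A sketch of an element-level API under that reading: the caller passes the scalar function (and, here, its derivative) and the library owns the iteration, so it could later swap in a thread pool or a GPU kernel without touching callers. `map_with_derivative` is a hypothetical name.

```rust
// Apply a scalar function and its derivative to every element.
fn map_with_derivative<F, D>(input: &[f32], f: F, df: D) -> (Vec<f32>, Vec<f32>)
where
    F: Fn(f32) -> f32,
    D: Fn(f32) -> f32,
{
    let out = input.iter().map(|&x| f(x)).collect();
    let grads = input.iter().map(|&x| df(x)).collect();
    (out, grads)
}

fn main() {
    // square: f(x) = x^2, f'(x) = 2x
    let (y, dy) = map_with_derivative(&[1.0, 2.0, 3.0], |x| x * x, |x| 2.0 * x);
    assert_eq!(y, vec![1.0, 4.0, 9.0]);
    assert_eq!(dy, vec![2.0, 4.0, 6.0]);
}
```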
Apparently, the channels-last image format improves efficiency a lot (see the recent PyTorch blog post https://pytorch.org/blog/tensor-memory-format-matters/). Need to investigate:

1. How much this helps convolutions
2. How confusing this would be for...
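To make the two layouts concrete, here are the flat-index formulas for an `N x C x H x W` tensor under each format. This is just a sketch; the real work would be threading the chosen layout through the conv kernels.

```rust
// Flat index of element (n, c, h, w); ch/ht/wd are the C/H/W dims.
// Channels-first: all pixels of one channel are contiguous.
fn index_nchw(n: usize, c: usize, h: usize, w: usize, ch: usize, ht: usize, wd: usize) -> usize {
    ((n * ch + c) * ht + h) * wd + w
}

// Channels-last: all channels of one pixel are contiguous, which the
// linked post reports as more cache-friendly for many conv kernels.
fn index_nhwc(n: usize, c: usize, h: usize, w: usize, ch: usize, ht: usize, wd: usize) -> usize {
    ((n * ht + h) * wd + w) * ch + c
}

fn main() {
    // Element (n=0, c=1, h=2, w=3) in a 1 x 3 x 4 x 5 tensor.
    assert_eq!(index_nchw(0, 1, 2, 3, 3, 4, 5), 33);
    assert_eq!(index_nhwc(0, 1, 2, 3, 3, 4, 5), 40);
}
```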
First used in #182. I think `Index`, `Recurse`, and `Broadcast` could be used directly for it.
- Remove certain things from the prelude
- Make internal crate files not use the blanket import (this makes it hard, when browsing from non-editors, to see where things come from); the two styles are sketched below
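A small illustration of the difference, with made-up module names:

```rust
// Hypothetical modules, just to make the two styles concrete.
mod shapes {
    pub struct Rank2;
}
mod prelude {
    pub use crate::shapes::*;
}

// The blanket style this note argues against for internal files:
// nothing on this line says where `Rank2` is defined.
#[allow(unused_imports)]
use crate::prelude::*;

// The explicit style: the origin is visible without an editor's
// go-to-definition.
use crate::shapes::Rank2;

fn main() {
    let _t = Rank2;
}
```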