Corey Lowman issues

Results: 78 issues of Corey Lowman

Ability to one_hot_encode 2d arrays/vecs of usizes

Currently only 1d arrays/vecs are supported.

new feature
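To make the 2d one-hot request concrete, here is a minimal sketch in plain Rust with nested `Vec`s rather than dfdx tensors. The function name `one_hot_encode_2d` and its signature are hypothetical and chosen for illustration only; they are not dfdx's API.

```rust
/// One-hot encode a 2d collection of class indices.
/// Hypothetical helper, not part of dfdx: the output is indexed as
/// [row][column][class] and holds 1.0 at the label's index, 0.0 elsewhere.
fn one_hot_encode_2d(labels: &[Vec<usize>], num_classes: usize) -> Vec<Vec<Vec<f32>>> {
    labels
        .iter()
        .map(|row| {
            row.iter()
                .map(|&class| {
                    // Fail loudly instead of silently dropping a bad label.
                    assert!(class < num_classes, "label {class} out of range");
                    let mut encoded = vec![0.0_f32; num_classes];
                    encoded[class] = 1.0;
                    encoded
                })
                .collect()
        })
        .collect()
}
```

Calling `one_hot_encode_2d(&[vec![0, 2], vec![1, 1]], 3)` yields a 2 x 2 x 3 structure, so the 2d case only adds one trailing class dimension to the input shape.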

Optional calculation of input gradients for Convolutions

PyTorch doesn't actually calculate the input gradient of Conv2d unless `requires_grad == True`, whereas dfdx currently always updates the input gradient. Skipping it is actually a giant speed boost, if you comment out the...

ideation

Keep shared broadcasted dimensions with binary operations

This is an optimization that further minimizes the amount of data needed for certain cases of binary operations. For example if you add two tensors that both have the 0th...

Add `tensor_ops::softplus` and `nn::SoftPlus`

See the PyTorch docs: https://pytorch.org/docs/stable/generated/torch.nn.Softplus.html#torch.nn.Softplus
Will need to figure out what the derivative of this function is, but that's probably online somewhere. See https://github.com/coreylowman/dfdx/pull/397 for all the pieces you'll need to...
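On the derivative question: softplus is `(1 / beta) * ln(1 + exp(beta * x))`, and differentiating gives `sigmoid(beta * x)`. The sketch below shows both on plain `f64` scalars. The function names are hypothetical, and this is not how dfdx structures its kernels; it only illustrates the math.

```rust
/// Softplus with sharpness `beta`: (1 / beta) * ln(1 + exp(beta * x)).
/// Hypothetical scalar helper, not dfdx's implementation.
fn softplus(x: f64, beta: f64) -> f64 {
    let z = beta * x;
    // ln(1 + e^z) = max(z, 0) + ln(1 + e^-|z|), which avoids overflow for large z.
    (z.max(0.0) + (-z.abs()).exp().ln_1p()) / beta
}

/// Derivative of softplus with respect to x: sigmoid(beta * x).
fn softplus_grad(x: f64, beta: f64) -> f64 {
    1.0 / (1.0 + (-beta * x).exp())
}
```

A central finite difference, `(softplus(x + h, beta) - softplus(x - h, beta)) / (2.0 * h)`, is a quick way to sanity-check the gradient against the forward function.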

Add `tensor_ops::hardswish` and `nn::Hardswish`

See the PyTorch docs: https://pytorch.org/docs/stable/generated/torch.nn.Hardswish.html#torch.nn.Hardswish
Will need to figure out what the derivative of this function is, but that's probably online somewhere. See https://github.com/coreylowman/dfdx/pull/397 for all the pieces you'll need to...
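On the derivative question: hardswish is `x * relu6(x + 3) / 6`, so it is 0 below -3, `x` above 3, and `x * (x + 3) / 6` in between. Its derivative is therefore 0, 1, and `(2x + 3) / 6` on those three ranges. The sketch below uses plain `f64` scalars with hypothetical function names. The function has kinks at exactly -3 and 3, and assigning the middle-branch value there is an assumption of this sketch, not a claim about what PyTorch or dfdx does.

```rust
/// Hardswish: x * relu6(x + 3) / 6.
/// Hypothetical scalar helper, not dfdx's implementation.
fn hardswish(x: f64) -> f64 {
    x * (x + 3.0).clamp(0.0, 6.0) / 6.0
}

/// Piecewise derivative of hardswish with respect to x.
/// At the kinks x = -3 and x = 3 this sketch uses the middle branch (an assumption).
fn hardswish_grad(x: f64) -> f64 {
    if x < -3.0 {
        0.0
    } else if x > 3.0 {
        1.0
    } else {
        (2.0 * x + 3.0) / 6.0
    }
}
```

Away from the two kinks, the gradient can be checked against a central finite difference of `hardswish`.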

Update AddInto to work with a single tape on the first input

See the thread in https://github.com/coreylowman/dfdx/pull/256#discussion_r1008886075
Currently AddInto only properly records gradients if all inputs have a tape on them. However, now that add will merge tapes, it would be nice if...

Add CUDA 12.4 (#212)

TODO:
- [ ] Generate bindings for cudnn
- [ ] Generate bindings for nccl

Use pinned memory for doing transfers

Is there a way to avoid the thread lock in cuda driver?

> Okay, after too much time investigating. It seems the Cuda Driver (not NCCL) is using a global MUTEX which makes multithread/multigpu quite useless.
> https://forums.developer.nvidia.com/t/cuda-wont-concurrently-run-kernels-on-multiple-devices-from-within-same-process/240388
> https://forums.developer.nvidia.com/t/multithreaded-tensorrt-performance-drops-dramatically/184882/8
> https://forums.developer.nvidia.com/t/cuda-introduces-heavy-locks/61357
>
> Basically...

Possible to use #![feature(abi_ptx)]?

Using https://doc.rust-lang.org/unstable-book/language-features/abi-ptx.html, you can compile Rust functions to PTX on a nightly compiler. How can we use this with cudarc?