Corey Lowman

78 issues by Corey Lowman

Often, it is recommended to loop over items in a CUDA kernel like:
```c++
for (unsigned int i = tid; i < n; i += blockDim.x * gridDim.x) {
......
```

gpu
optimization
expert
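The loop in this issue is the common grid-stride pattern: each thread starts at its own index `tid` and advances by the total number of threads in the grid, `blockDim.x * gridDim.x`. As a CPU-side illustration only, assuming `tid` is the thread's global index (`blockIdx.x * blockDim.x + threadIdx.x`), the Rust sketch below enumerates the indices each simulated thread would visit and checks that every element is covered exactly once. `grid_stride_indices` is a hypothetical helper, not part of the project.

```rust
/// Simulates which element indices one CUDA thread visits in a grid-stride
/// loop: start at the thread's global id, then advance by the total number
/// of threads in the grid (blockDim.x * gridDim.x).
fn grid_stride_indices(tid: usize, block_dim: usize, grid_dim: usize, n: usize) -> Vec<usize> {
    let stride = block_dim * grid_dim;
    (tid..n).step_by(stride).collect()
}

fn main() {
    // Hypothetical launch: 2 blocks of 4 threads covering 19 elements.
    let (block_dim, grid_dim, n) = (4, 2, 19);
    let total_threads = block_dim * grid_dim;

    // Count how many times each element is touched across all threads.
    let mut visits = vec![0u32; n];
    for tid in 0..total_threads {
        for i in grid_stride_indices(tid, block_dim, grid_dim, n) {
            visits[i] += 1;
        }
    }
    // Every element is processed exactly once, even though n is not a
    // multiple of the number of threads.
    assert!(visits.iter().all(|&v| v == 1));
    println!("thread 3 visits {:?}", grid_stride_indices(3, block_dim, grid_dim, n));
}
```

Because the stride equals the number of launched threads, the same kernel works for any `n` and any grid size; a thread whose `tid` is already past `n` simply does no work.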

With #767 now added, we can write integration tests against more complex image networks! This new test should mirror the existing resnet18 integration test, but with a MobileNet structure.

test
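For context on what a MobileNet structure would exercise beyond resnet18: MobileNet is built mainly from depthwise separable convolutions, that is, a per-channel k x k depthwise convolution followed by a 1 x 1 pointwise convolution. The plain-Rust arithmetic sketch below (hypothetical function names, not the project's API, biases ignored) compares weight counts for one layer against a standard convolution.

```rust
/// Weight count of a standard k x k convolution (bias omitted).
fn standard_conv_params(k: usize, c_in: usize, c_out: usize) -> usize {
    k * k * c_in * c_out
}

/// Weight count of a depthwise separable convolution: a k x k depthwise
/// conv with one filter per input channel, followed by a 1 x 1 pointwise
/// conv that mixes channels (bias omitted).
fn depthwise_separable_params(k: usize, c_in: usize, c_out: usize) -> usize {
    let depthwise = k * k * c_in;
    let pointwise = c_in * c_out;
    depthwise + pointwise
}

fn main() {
    // Illustrative layer size: 3x3 kernel, 64 -> 128 channels.
    let (k, c_in, c_out) = (3, 64, 128);
    println!("standard:            {}", standard_conv_params(k, c_in, c_out));
    println!("depthwise separable: {}", depthwise_separable_params(k, c_in, c_out));
}
```

For this layer size that is 73,728 versus 8,768 weights, which is why such a test would lean on grouped/depthwise and 1 x 1 convolutions rather than the dense 3 x 3 convolutions resnet18 uses.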

Related to #696

new feature

Related to #672. Tensors could be backed by allocations that have more space than necessary. BTreeMap already has a method to return keys within a certain range that would make...

optimization
expert
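A minimal sketch of how `BTreeMap::range` could serve this, assuming a cache of freed buffers keyed by length: `range(len..)` walks keys in ascending order, so its first entry is the smallest cached buffer that is still large enough. `BufferCache`, `insert`, and `take` are hypothetical names, and plain `Vec<f32>` buffers stand in for device allocations.

```rust
use std::collections::BTreeMap;

/// Hypothetical cache of freed buffers, keyed by length (in elements).
/// Several buffers of the same length can share one key.
struct BufferCache {
    free: BTreeMap<usize, Vec<Vec<f32>>>,
}

impl BufferCache {
    fn new() -> Self {
        Self { free: BTreeMap::new() }
    }

    /// Return a buffer to the cache for later reuse.
    fn insert(&mut self, buf: Vec<f32>) {
        self.free.entry(buf.len()).or_default().push(buf);
    }

    /// Take the smallest cached buffer with length >= `len`, if any.
    fn take(&mut self, len: usize) -> Option<Vec<f32>> {
        // Keys come back in ascending order, so the first one is the tightest fit.
        let (&key, _) = self.free.range(len..).next()?;
        let bucket = self.free.get_mut(&key)?;
        let buf = bucket.pop();
        if bucket.is_empty() {
            self.free.remove(&key);
        }
        buf
    }
}

fn main() {
    let mut cache = BufferCache::new();
    cache.insert(vec![0.0; 64]);
    cache.insert(vec![0.0; 256]);
    // A 100-element request reuses the 256-element buffer instead of allocating.
    let buf = cache.take(100).unwrap();
    println!("reused buffer of len {}", buf.len());
}
```

The trade-off is that an oversized buffer is handed out, so the tensor would need to track its logical length separately from the allocation's length.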

Currently we run the lhs & rhs kernels on separate streams, but due to kernel occupancy, they can't actually run in parallel: ![image](https://user-images.githubusercontent.com/7787278/231512269-83ad03f5-66f7-493c-ac4a-37621e2e7d65.png) Investigate ways to make these run in...

gpu
optimization
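One way to reason about "kernel occupancy" here is a simplified resource count: a GPU can only hold a limited number of resident thread blocks at once, so if the first kernel's grid already fills every slot, at least some blocks of the second kernel must wait for the first kernel's blocks to retire, regardless of streams. The sketch below encodes that model; `DeviceLimits` and its numbers are hypothetical, and it assumes every block of either kernel consumes the same per-SM resources, which real kernels (with differing register and shared-memory usage) generally do not.

```rust
/// Hypothetical, simplified device model: how many thread blocks can be
/// resident at once. Real limits depend on the GPU and on each kernel's
/// register and shared-memory usage, which this sketch ignores.
struct DeviceLimits {
    num_sms: usize,
    max_resident_blocks_per_sm: usize,
}

impl DeviceLimits {
    fn resident_block_slots(&self) -> usize {
        self.num_sms * self.max_resident_blocks_per_sm
    }

    /// Under this model, two kernels on different streams can only run
    /// fully side by side if both grids fit on the device at once;
    /// otherwise at least some of the second kernel's blocks must wait
    /// for the first kernel's blocks to retire.
    fn can_fully_overlap(&self, grid_a: usize, grid_b: usize) -> bool {
        grid_a + grid_b <= self.resident_block_slots()
    }
}

fn main() {
    // Illustrative numbers only, not taken from any specific GPU.
    let dev = DeviceLimits { num_sms: 80, max_resident_blocks_per_sm: 2 };
    println!("small grids overlap: {}", dev.can_fully_overlap(64, 64));
    println!("large grids overlap: {}", dev.can_fully_overlap(1024, 1024));
}
```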

I think it might be possible to do inference forwards without any allocations. This would require the following: 1. Need to be able to compute the max output size of...

ideation
optimization
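A minimal sketch of one way allocation-free inference could work, assuming a simple sequential network: compute the largest activation size once, allocate two buffers of that size up front, and ping-pong between them layer by layer. `Layer`, `max_buffer_len`, and `run` are hypothetical names rather than project APIs, and the toy `forward` stands in for real layer math.

```rust
/// Hypothetical sequential layer with a fixed output length.
struct Layer {
    out_len: usize,
}

impl Layer {
    /// Toy computation standing in for real layer math:
    /// output[i] = sum(input) + i. Writes into a caller-provided slice.
    fn forward(&self, input: &[f32], output: &mut [f32]) {
        let total: f32 = input.iter().sum();
        for (i, o) in output.iter_mut().enumerate() {
            *o = total + i as f32;
        }
    }
}

/// Largest buffer any step needs: the input or any layer's output.
fn max_buffer_len(layers: &[Layer], input_len: usize) -> usize {
    layers.iter().fold(input_len, |m, l| m.max(l.out_len))
}

/// Runs every layer using only the two preallocated buffers, swapping
/// source and destination after each layer. No heap allocation happens here.
fn run<'a>(layers: &[Layer], input: &[f32], a: &'a mut [f32], b: &'a mut [f32]) -> &'a [f32] {
    let need = max_buffer_len(layers, input.len());
    assert!(a.len() >= need && b.len() >= need, "buffers too small");
    let (mut src, mut dst) = (a, b);
    src[..input.len()].copy_from_slice(input);
    let mut len = input.len();
    for layer in layers {
        layer.forward(&src[..len], &mut dst[..layer.out_len]);
        len = layer.out_len;
        std::mem::swap(&mut src, &mut dst);
    }
    // After the last swap, `src` holds the final layer's output.
    let out: &'a [f32] = src;
    &out[..len]
}

fn main() {
    let layers = [Layer { out_len: 4 }, Layer { out_len: 2 }];
    let input = [1.0f32, 2.0];
    // Allocate once up front; `run` can then be called repeatedly.
    let cap = max_buffer_len(&layers, input.len());
    let (mut a, mut b) = (vec![0.0f32; cap], vec![0.0f32; cap]);
    println!("{:?}", run(&layers, &input, &mut a, &mut b));
}
```

Two buffers suffice for a purely sequential chain; networks with skip connections (like resnet) would need extra buffers for activations that stay live across layers.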

Currently you can only concatenate two tensors at a time, and the two tensors may have different shapes. When all the tensors in the concat operation have the same shape, we should also...

expert
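To illustrate why many same-shaped tensors are an easy case, here is a sketch that concatenates any number of tensors along the first axis in a single pass, assuming contiguous row-major storage: the result is just the buffers appended in order, provided the trailing dimensions agree (always true when every shape is identical). `Tensor` and `concat_axis0` are hypothetical, and the choice of axis 0 is an assumption; this says nothing about how the project's own concat is implemented.

```rust
/// Hypothetical minimal tensor: a shape plus row-major (contiguous) data.
#[derive(Debug, PartialEq)]
struct Tensor {
    shape: Vec<usize>,
    data: Vec<f32>,
}

/// Concatenates any number of tensors along axis 0 in one pass.
/// With row-major storage this is just appending each buffer, as long as
/// every tensor has the same trailing dimensions. Returns None for an
/// empty list, a rank-0 tensor, or mismatched trailing dimensions.
fn concat_axis0(tensors: &[Tensor]) -> Option<Tensor> {
    let (_, tail) = tensors.first()?.shape.split_first()?;
    let mut rows = 0;
    let mut data = Vec::new();
    for t in tensors {
        let (&n, t_tail) = t.shape.split_first()?;
        if t_tail != tail {
            return None;
        }
        rows += n;
        data.extend_from_slice(&t.data);
    }
    let mut shape = vec![rows];
    shape.extend_from_slice(tail);
    Some(Tensor { shape, data })
}

fn main() {
    let a = Tensor { shape: vec![1, 3], data: vec![1.0, 2.0, 3.0] };
    let b = Tensor { shape: vec![1, 3], data: vec![4.0, 5.0, 6.0] };
    let c = Tensor { shape: vec![1, 3], data: vec![7.0, 8.0, 9.0] };
    let out = concat_axis0(&[a, b, c]).unwrap();
    println!("{:?} {:?}", out.shape, out.data);
}
```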

If the tensor has a NoneTape, it can only accept 1 closure, but if the tensor has an OwnedTape, it should accept both the f closure and the derivative closure...

new feature
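One possible way to express this at the type level is a trait parameterized by the tape type: a bare closure implements it for the no-tape case, and an `(f, df)` tuple implements it for the owned-tape case, so passing the wrong form is a compile error. The sketch below uses hypothetical stand-ins named after the issue's `NoneTape` and `OwnedTape`; they are not the project's actual definitions, and `MapArgs` and `Scalar` are invented for illustration.

```rust
/// Hypothetical stand-ins for the tape types (not the real definitions).
struct NoneTape;
struct OwnedTape;

/// The closure arguments a tape type `T` requires for an elementwise map.
trait MapArgs<T> {
    /// Returns f(x) and, when a derivative closure was supplied, df(x).
    fn eval(&self, x: f32) -> (f32, Option<f32>);
}

// With NoneTape nothing is recorded, so a single forward closure suffices.
impl<F: Fn(f32) -> f32> MapArgs<NoneTape> for F {
    fn eval(&self, x: f32) -> (f32, Option<f32>) {
        (self(x), None)
    }
}

// With OwnedTape the backward pass needs the derivative, so a pair
// (f, df) is required.
impl<F: Fn(f32) -> f32, DF: Fn(f32) -> f32> MapArgs<OwnedTape> for (F, DF) {
    fn eval(&self, x: f32) -> (f32, Option<f32>) {
        ((self.0)(x), Some((self.1)(x)))
    }
}

/// Hypothetical scalar "tensor" tagged with a tape type.
struct Scalar<T> {
    value: f32,
    _tape: T,
}

impl<T> Scalar<T> {
    /// Accepts whichever closure form the tape type `T` requires.
    fn map<A: MapArgs<T>>(self, args: A) -> (f32, Option<f32>) {
        args.eval(self.value)
    }
}

fn main() {
    let x = Scalar { value: 2.0, _tape: NoneTape };
    println!("{:?}", x.map(|v: f32| v * v));
    let y = Scalar { value: 2.0, _tape: OwnedTape };
    println!("{:?}", y.map((|v: f32| v * v, |v: f32| 2.0 * v)));
    // `y.map(|v: f32| v * v)` would not compile: OwnedTape needs both closures.
}
```

The closures need explicit `f32` parameter annotations because the compiler cannot infer a closure's signature through the custom `MapArgs` bound.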

Often when neural networks output labels, we want to get the top K labels. This will be useful for validation & deployment. UX should look something like:
```rust
let...
```
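The UX snippet above is truncated, but the underlying operation on a single row of scores can be sketched independently of any tensor API: sort the label indices by descending score and keep the first K. `top_k` is a hypothetical helper, not the project's API.

```rust
/// Returns the indices of the `k` largest scores, highest score first.
/// Ties keep their original order because `sort_by` is stable.
fn top_k(scores: &[f32], k: usize) -> Vec<usize> {
    let mut idx: Vec<usize> = (0..scores.len()).collect();
    // Sort indices by descending score. `total_cmp` is a total order on f32,
    // so unlike `partial_cmp(..).unwrap()` it cannot panic on NaN.
    idx.sort_by(|&a, &b| scores[b].total_cmp(&scores[a]));
    idx.truncate(k);
    idx
}

fn main() {
    let scores = [0.1, 0.7, 0.05, 0.15];
    println!("{:?}", top_k(&scores, 2));
}
```

A full sort keeps the sketch simple; for large label sets, `select_nth_unstable_by` could partition out the top K in linear time before sorting just those K.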

As brought up in a Reddit thread, an OpenCL device would be useful to support folks with AMD GPUs. Here are roughly the tasks that need to happen: 1. [ ]...

new feature
help wanted
gpu
expert