luminal

Deep learning at the speed of light.

Results 35 luminal issues

Currently we use cuBLAS through the cudarc library for matmuls, which is very good for generic matmul performance. cuDNN has many more NN-specific ops that can be leveraged specifically...

Currently the graph selection API makes it difficult to write selectors for complex patterns like RoPE: https://github.com/jafioti/luminal/blob/cb07523f02845e5a78b49f4c1fbf3f0705709ea9/crates/luminal_metal/src/unary.rs#L1278 Selectors should be built similarly to how primgraphs are already built, with a...

It should be possible to convert ONNX graphs to luminal primgraphs. The only tricky part is getting GraphTensors out once the graph is converted, for inputting data and getting output. This...

good first issue

There are three main goals for 0.3: 1) SOTA performance on Metal and near-SOTA on CUDA and CPU for transformers 2) Wide range of models (Whisper, Diffusion models, Yolo, etc.) 3)...

I believe test_max fails on Metal because of this. Looking at the generated kernel, it doesn't seem correct: ``` #include <metal_stdlib> using namespace metal; kernel void mkernel(device float* input1 [[buffer(1)]], device...

Currently tests are hand-written and have poor coverage. We want to generate tests that cover the entire codebase with many permutations. Should this be done in a macro?

testing

Lots of unit tests are written for luminal_cuda and luminal_metal, but they aren't run by CI. We need to figure out how to requisition a GPU machine to run them.

Are there any plans to support 3D convolutions? They are extremely important in the medical imaging community.

- Remove sqrt by doing x.pow(0.5). This requires selectors to reference the same node multiple times.
- Also remove mul: mul is just a.log2().add(b.log2()).exp2(), and div is just a.log2().sub(b.log2()).exp2()
-...

advanced
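
The rewrites proposed in the op-reduction issue above can be checked numerically on plain scalars. The following is a minimal sketch using `f32` from the Rust standard library, not luminal's tensor API; the function names are hypothetical. One caveat it makes visible: the log/exp forms of mul and div only hold for strictly positive inputs, since `log2` of a negative number is NaN, so a real rewrite would presumably need separate sign handling.

```rust
/// Multiplication expressed as exp2(log2(a) + log2(b)).
/// Hypothetical helper for illustration; valid only for a > 0 and b > 0.
fn mul_via_log(a: f32, b: f32) -> f32 {
    (a.log2() + b.log2()).exp2()
}

/// Division expressed as exp2(log2(a) - log2(b)).
/// Hypothetical helper for illustration; valid only for a > 0 and b > 0.
fn div_via_log(a: f32, b: f32) -> f32 {
    (a.log2() - b.log2()).exp2()
}

/// Square root expressed as a power, mirroring x.pow(0.5) in the issue.
fn sqrt_via_pow(x: f32) -> f32 {
    x.powf(0.5)
}
```

Calling `mul_via_log(3.0, 4.0)` returns a value within floating-point rounding of 12, while `mul_via_log(-2.0, 3.0)` returns NaN.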

It should be possible to write an autograd compiler that runs on a primgraph, and derives a backward graph and attaches it to the end of the main graph. With...

advanced
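
To make the autograd idea above concrete, here is a rough sketch of reverse-mode differentiation that appends gradient nodes to the end of the same graph. Everything in it is an assumption made for illustration: `Op`, `Graph`, `append_backward`, and `eval` are hypothetical and are not luminal's primgraph types, and only scalar `Add` and `Mul` are covered. It relies on nodes being stored in topological order, so walking the forward nodes in reverse visits every consumer before its operands.

```rust
/// Hypothetical scalar op set; operands are indices of earlier nodes.
#[derive(Clone, Copy, Debug)]
enum Op {
    Input,
    Const(f32),
    Add(usize, usize),
    Mul(usize, usize),
}

#[derive(Default)]
struct Graph {
    nodes: Vec<Op>,
}

impl Graph {
    /// Append a node and return its index.
    fn push(&mut self, op: Op) -> usize {
        self.nodes.push(op);
        self.nodes.len() - 1
    }

    /// Append nodes computing d(output)/d(node) for each forward node.
    /// Returns, per forward node, the index of its gradient node (if any).
    fn append_backward(&mut self, output: usize) -> Vec<Option<usize>> {
        let n_forward = output + 1;
        let mut grads: Vec<Option<usize>> = vec![None; n_forward];
        // Seed: d(output)/d(output) = 1.
        grads[output] = Some(self.push(Op::Const(1.0)));
        for id in (0..n_forward).rev() {
            let Some(g) = grads[id] else { continue };
            let op = self.nodes[id];
            match op {
                Op::Add(a, b) => {
                    // The gradient flows unchanged to both operands.
                    self.accumulate(&mut grads, a, g);
                    self.accumulate(&mut grads, b, g);
                }
                Op::Mul(a, b) => {
                    // d(a*b)/da = b and d(a*b)/db = a, scaled by the upstream gradient.
                    let ga = self.push(Op::Mul(g, b));
                    self.accumulate(&mut grads, a, ga);
                    let gb = self.push(Op::Mul(g, a));
                    self.accumulate(&mut grads, b, gb);
                }
                Op::Input | Op::Const(_) => {}
            }
        }
        grads
    }

    /// Sum gradient contributions when a node feeds several consumers.
    fn accumulate(&mut self, grads: &mut [Option<usize>], node: usize, contrib: usize) {
        grads[node] = Some(match grads[node] {
            Some(prev) => self.push(Op::Add(prev, contrib)),
            None => contrib,
        });
    }

    /// Evaluate every node in order; `inputs` are consumed by Input nodes in order.
    fn eval(&self, inputs: &[f32]) -> Vec<f32> {
        let mut vals = Vec::with_capacity(self.nodes.len());
        let mut next_input = 0;
        for op in &self.nodes {
            let v = match *op {
                Op::Input => {
                    let v = inputs[next_input];
                    next_input += 1;
                    v
                }
                Op::Const(c) => c,
                Op::Add(a, b) => vals[a] + vals[b],
                Op::Mul(a, b) => vals[a] * vals[b],
            };
            vals.push(v);
        }
        vals
    }
}
```

For example, building f = x*y + x, calling `append_backward` on the output node, and evaluating at x = 3, y = 4 gives gradient nodes whose values are y + 1 = 5 for x and x = 3 for y. Because the backward pass is just more nodes in the same graph, in principle it could be optimized by the same compilers as the forward pass.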