luminal issues

Whisper

Whisper large-v3 should be able to run in real time on an M1 Pro.

jafioti

enhancement

Benchmarking

1

We should have a GitHub Action that benchmarks performance and compares the results against PyTorch numbers to catch any regressions.

jafioti

testing
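The comparison step such a job would run could look like the Rust sketch below. It assumes the job has already collected per-model latencies in milliseconds for luminal and for PyTorch. `find_regressions` and its `tolerance` threshold are hypothetical names, not part of luminal or GitHub Actions.

```rust
use std::collections::HashMap;

/// Hypothetical CI comparison step: returns every model whose luminal latency
/// exceeds the PyTorch reference by more than `tolerance` (0.05 = 5%).
pub fn find_regressions(
    luminal_ms: &HashMap<String, f64>,
    pytorch_ms: &HashMap<String, f64>,
    tolerance: f64,
) -> Vec<String> {
    let mut regressed: Vec<String> = luminal_ms
        .iter()
        .filter_map(|(model, &ours)| {
            // Models without a PyTorch reference number are skipped.
            let &reference = pytorch_ms.get(model)?;
            if ours > reference * (1.0 + tolerance) {
                Some(model.clone())
            } else {
                None
            }
        })
        .collect();
    // HashMap iteration order is unspecified; sort so CI output is stable.
    regressed.sort();
    regressed
}
```

A workflow could then fail whenever the returned list is non-empty.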

Quantization

1

- [x] 8-bit
- [ ] 4-bit
- [ ] 2-bit? (Check whether accuracy drops)

jafioti

testing
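One generic way to check "does accuracy drop?" at each bit width is to quantize a tensor, dequantize it, and measure the reconstruction error. The Rust sketch below uses symmetric per-tensor quantization; this is an assumed scheme for illustration, not luminal's actual quantization format, and `quantize`, `dequantize`, and `max_abs_error` are hypothetical helpers.

```rust
/// Symmetric per-tensor quantization to `bits` bits (2..=8).
/// Values are stored one per i8 for simplicity; a real 4- or 2-bit format
/// would pack several values into each byte.
pub fn quantize(values: &[f32], bits: u32) -> (Vec<i8>, f32) {
    assert!((2..=8).contains(&bits), "bits must be in 2..=8");
    // Largest representable magnitude: 127 for 8-bit, 7 for 4-bit, 1 for 2-bit.
    let qmax = ((1i32 << (bits - 1)) - 1) as f32;
    let max_abs = values.iter().fold(0.0f32, |m, v| m.max(v.abs()));
    // Avoid dividing by zero for an all-zero tensor.
    let scale = if max_abs == 0.0 { 1.0 } else { max_abs / qmax };
    let q = values
        .iter()
        .map(|&v| (v / scale).round().clamp(-qmax, qmax) as i8)
        .collect();
    (q, scale)
}

/// Maps quantized values back to f32.
pub fn dequantize(q: &[i8], scale: f32) -> Vec<f32> {
    q.iter().map(|&x| x as f32 * scale).collect()
}

/// Largest absolute reconstruction error: a quick proxy for accuracy loss.
pub fn max_abs_error(original: &[f32], restored: &[f32]) -> f32 {
    original
        .iter()
        .zip(restored)
        .fold(0.0, |m, (a, b)| m.max((a - b).abs()))
}
```

Running the same tensor through 8, 4, and 2 bits shows how quickly the error grows as the bit width shrinks; end-to-end model accuracy would still need to be measured separately.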

Aggressive elementwise fusion

1

Currently, elementwise fusion is very conservative about what it fuses. It could be much more aggressive by:
- Fusing constants into kernels
- Fusing across shape changes and...

jafioti
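To make the two ideas in this issue concrete, here is a Rust sketch of what fusion means at the loop level. It does not use luminal's compiler or IR; `unfused`, `fused`, and `fused_transpose_relu` are hypothetical hand-written functions showing the transformation a fusion pass would perform automatically.

```rust
/// Unfused: each elementwise op is its own "kernel" writing an intermediate buffer.
pub fn unfused(x: &[f32]) -> Vec<f32> {
    let a: Vec<f32> = x.iter().map(|v| v * 2.0).collect(); // kernel 1: multiply by constant
    let b: Vec<f32> = a.iter().map(|v| v + 1.0).collect(); // kernel 2: add constant
    b.iter().map(|v| v.max(0.0)).collect() // kernel 3: ReLU
}

/// Fused: one pass, with the constants baked into the kernel body and no intermediates.
pub fn fused(x: &[f32]) -> Vec<f32> {
    x.iter().map(|v| (v * 2.0 + 1.0).max(0.0)).collect()
}

/// Fusing across a shape change: instead of materializing a transposed copy,
/// the fused kernel reads the (rows x cols) input through transposed indices.
/// Output shape is (cols x rows): out[j][i] = relu(x[i][j] + 1.0).
pub fn fused_transpose_relu(x: &[f32], rows: usize, cols: usize) -> Vec<f32> {
    assert_eq!(x.len(), rows * cols);
    let mut out = vec![0.0; rows * cols];
    for j in 0..cols {
        for i in 0..rows {
            out[j * rows + i] = (x[i * cols + j] + 1.0).max(0.0);
        }
    }
    out
}
```

The fused versions produce the same values while touching memory once instead of once per op, which is where the speedup comes from on memory-bound elementwise work.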

How to add new backends/compilers?

1

Hi, great project! I'd like to add a new backend/compiler. Is there a step-by-step guide for this to make sure I don't forget anything?

raphaelDkhn

Better memory allocation

1

https://arxiv.org/pdf/2001.03288.pdf

jafioti
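A common heuristic for this problem is to place all intermediate tensors in one shared arena and assign offsets greedily, largest tensor first, so that tensors whose lifetimes overlap never share bytes. The Rust sketch below is a simplified first-fit variant of that idea; it is not claimed to reproduce the linked paper's algorithm, and `TensorUsage` and `greedy_by_size` are hypothetical names. It assumes each tensor's lifetime is known as the range of op indices from its producer to its last reader.

```rust
#[derive(Clone, Debug)]
pub struct TensorUsage {
    pub size: usize,     // bytes
    pub first_op: usize, // index of the op that produces the tensor
    pub last_op: usize,  // index of the last op that reads it
}

/// Assigns each tensor an offset in one arena so that tensors alive at the
/// same time never overlap in memory. Returns (offsets, total arena size).
pub fn greedy_by_size(tensors: &[TensorUsage]) -> (Vec<usize>, usize) {
    let mut order: Vec<usize> = (0..tensors.len()).collect();
    // Place the largest tensors first.
    order.sort_by(|&a, &b| tensors[b].size.cmp(&tensors[a].size));

    let mut offsets = vec![usize::MAX; tensors.len()];
    let mut placed: Vec<usize> = Vec::new();
    let mut arena = 0;

    for &t in &order {
        let cur = &tensors[t];
        // Byte ranges of already-placed tensors whose lifetimes overlap this one.
        let mut busy: Vec<(usize, usize)> = placed
            .iter()
            .filter(|&&p| {
                let o = &tensors[p];
                o.first_op <= cur.last_op && cur.first_op <= o.last_op
            })
            .map(|&p| (offsets[p], offsets[p] + tensors[p].size))
            .collect();
        busy.sort();

        // First fit: the lowest offset with a large enough gap.
        let mut candidate = 0;
        for (start, end) in busy {
            if candidate + cur.size <= start {
                break;
            }
            candidate = candidate.max(end);
        }
        offsets[t] = candidate;
        arena = arena.max(candidate + cur.size);
        placed.push(t);
    }
    (offsets, arena)
}
```

For example, two 100-byte tensors with disjoint lifetimes can share offset 0, so a chain of 100, 50, and 100 bytes fits in 150 bytes instead of 250.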

Cost model

2

Hi @jafioti, nice project! I did something somewhat similar a few years ago in Scala. I only skimmed through the project, so I have just a superficial understanding....

nightscape

[feature suggestion] self speculative decoding

7

Good morning (or afternoon/evening)! Among the techniques for speeding up LLM inference, there is a methodology called **self speculative decoding**. Would it be possible to implement this feature...

NewBornRustacean
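In self-speculative decoding, draft tokens come from the same model run with some layers skipped, and the full model then verifies them, so the final output matches ordinary greedy decoding with the full model. The Rust sketch below shows one greedy draft-and-verify step. The two models are stand-in functions (`draft`, `full`) that map a token prefix to the next greedy token, and `speculative_step` is a hypothetical helper, not a luminal API.

```rust
/// One greedy speculative decoding step. Appends between 1 and k + 1 tokens to
/// `tokens` and returns how many drafted tokens the full model accepted.
pub fn speculative_step<D, F>(tokens: &mut Vec<u32>, k: usize, draft: D, full: F) -> usize
where
    D: Fn(&[u32]) -> u32, // e.g. the model with some layers skipped
    F: Fn(&[u32]) -> u32, // the complete model
{
    // 1. Draft k tokens cheaply.
    let mut proposal = tokens.clone();
    for _ in 0..k {
        let next = draft(proposal.as_slice());
        proposal.push(next);
    }

    // 2. Verify with the full model. A real implementation scores all k positions
    //    in one batched forward pass; here `full` is called per position for clarity.
    let start = tokens.len();
    let mut accepted = 0;
    for i in 0..k {
        let expected = full(&proposal[..start + i]);
        // The full model's token is always kept, so output equals plain greedy decoding.
        tokens.push(expected);
        if expected != proposal[start + i] {
            // 3. First mismatch: discard the remaining drafts.
            return accepted;
        }
        accepted += 1;
    }

    // All drafts accepted: the verification pass also yields one extra token.
    tokens.push(full(proposal.as_slice()));
    accepted
}
```

The speedup depends on the acceptance rate: the more often the cheap draft agrees with the full model, the more tokens each full-model pass produces.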
