Awni Hannun
Your solutions sound reasonable:
- Force the order for CUDA
- IFDEF on heap for Windows
Awesome progress so far @zcbenz!! I'm wondering what the best way is to get this incorporated into MLX. I can think of a couple of options:
- Once this is...
> This comes with a limitation of maximum ndim in arrays, which PyTorch sets to 25; I'm using 8 for now and it can be easily changed if found not...
> some of them are the slow `Event::is_signaled` calls that need to be improved.

Where are those calls coming from? Are they from [here](https://github.com/ml-explore/mlx/blob/main/mlx/transforms.cpp#L206-L207)? We might be able to reduce the...
Very nice @zcbenz!

> To get rid of this latency, I improved the CUDA backend by saving operands and temporaries of the op until finalize() is called, i.e. when...
Hmm, that's the gradient of the product, which I believe uses cumprod. This is basically a duplicate of https://github.com/ml-explore/mlx/issues/673: scan ops don't currently work on 64-bit types. We...
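Until that's fixed, one possible workaround is to run the scan at 32-bit precision and cast back; a minimal sketch, assuming the intermediate products fit in 32 bits:

```python
import mlx.core as mx

# Hypothetical workaround while scan ops lack 64-bit support: do the
# cumulative product in int32 and cast back. Only valid when every
# intermediate product fits in 32 bits.
x = mx.array([1, 2, 3, 4], dtype=mx.int64)
y = mx.cumprod(x.astype(mx.int32)).astype(mx.int64)
print(y)  # array([1, 2, 6, 24], dtype=int64)
```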
In case it's useful, here is a [reference implementation](https://github.com/ml-explore/mlx-examples/blob/main/llms/mlx_lm/examples/chat.py#L12-L42) in the Python version of MLX LM.
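For context, a condensed sketch of what that example does (written from memory, so names and arguments may differ slightly from the actual file):

```python
from mlx_lm import load, generate
from mlx_lm.models.cache import make_prompt_cache

# Load a model and build a reusable prompt cache so the conversation
# history isn't re-processed on every turn.
model, tokenizer = load("mlx-community/Mistral-7B-Instruct-v0.3-4bit")
prompt_cache = make_prompt_cache(model)

while True:
    query = input(">> ")
    messages = [{"role": "user", "content": query}]
    prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True)
    generate(model, tokenizer, prompt, prompt_cache=prompt_cache, verbose=True)
```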
The length of the cache (in tokens) plus any generated text should stay below the maximum context size of the model. It's not checked in the Python version, though, as...
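If you want to guard against this yourself, the check is just arithmetic; a sketch, where `cache_tokens` and `max_context` are hypothetical names for values you'd track on your side:

```python
def fits_in_context(cache_tokens: int, max_new_tokens: int, max_context: int) -> bool:
    # The cached history plus the tokens you intend to generate must not
    # exceed the model's context window.
    return cache_tokens + max_new_tokens <= max_context
```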
> I do wish there were an easier way to delegate the parameters we'd want to use with Muon and others with AdamW

Can you say more about that?
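For reference, one way to do this today is to partition the parameter tree by shape and run two optimizers side by side. A rough sketch, assuming the `Optimizer.apply_gradients` API and using a second AdamW as a stand-in for a Muon implementation:

```python
import mlx.optimizers as optim
from mlx.utils import tree_flatten, tree_unflatten

# Stand-ins: swap the first optimizer for a real Muon implementation.
muon = optim.AdamW(learning_rate=0.02)   # placeholder for Muon
adamw = optim.AdamW(learning_rate=3e-4)

def step(model, grads):
    flat_g = tree_flatten(grads)
    flat_p = dict(tree_flatten(model.parameters()))
    # Route 2-D weight matrices to "Muon", everything else to AdamW.
    muon_g = [(k, g) for k, g in flat_g if g.ndim == 2]
    rest_g = [(k, g) for k, g in flat_g if g.ndim != 2]
    updated = []
    for opt, part in ((muon, muon_g), (adamw, rest_g)):
        params = tree_unflatten([(k, flat_p[k]) for k, _ in part])
        new = opt.apply_gradients(tree_unflatten(part), params)
        updated += tree_flatten(new)
    model.update(tree_unflatten(updated))
```

Whether "2-D-ness" is the right routing rule is model-dependent (embeddings are 2-D too), which is part of why a first-class way to express this delegation would be nice.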
Thanks for the detailed explanation, that makes sense!