Awni Hannun

Results: 1,014 comments by Awni Hannun

It would be useful to know if there is a pattern to the segfault. Like it always reproduces on a certain file, or on files of a certain size. Another option...

Thanks for the addition. What do you think about a couple of modifications: - For piping from stdin use `-`, as in `mlx_whisper -`. That is what we do in...
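
For context, a minimal sketch of the `-` convention the comment refers to. This is a toy, not the actual `mlx_whisper` implementation: a positional argument of `-` is treated as "read from stdin".

```python
# Toy sketch of the "-" convention (NOT the actual mlx_whisper code):
# a positional argument of "-" means "read the audio bytes from stdin".
import argparse
import sys


def main() -> None:
    parser = argparse.ArgumentParser(description="toy transcriber")
    parser.add_argument("audio", help="path to an audio file, or - for stdin")
    args = parser.parse_args()

    if args.audio == "-":
        # Bytes piped in, e.g. `cat clip.wav | python toy.py -`
        data = sys.stdin.buffer.read()
    else:
        with open(args.audio, "rb") as f:
            data = f.read()
    print(f"read {len(data)} bytes")


if __name__ == "__main__":
    main()
```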

> one way to go is to cache and defer the waiting submission batches by ourselves until CPU signals past the waited value. So this calls for a deferred/pending submission...
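
The quoted idea, sketched as plain bookkeeping. All names here are hypothetical illustrations, not MLX or Metal internals: a batch whose wait value the CPU has not signaled yet gets parked, and is flushed once the signaled value catches up.

```python
# Hypothetical sketch of the "deferred/pending submission" idea quoted
# above: batches waiting on a fence value the CPU has not yet signaled
# are cached, then submitted once the signal passes their wait value.
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class DeferredSubmitter:
    signaled: int = 0  # last value the CPU has signaled
    pending: List[Tuple[int, Callable[[], None]]] = field(default_factory=list)

    def submit(self, wait_value: int, batch: Callable[[], None]) -> None:
        if wait_value <= self.signaled:
            batch()  # signal already passed: submit immediately
        else:
            self.pending.append((wait_value, batch))  # defer

    def on_signal(self, value: int) -> None:
        self.signaled = max(self.signaled, value)
        ready = [b for v, b in self.pending if v <= self.signaled]
        self.pending = [(v, b) for v, b in self.pending if v > self.signaled]
        for batch in ready:
            batch()  # flush every batch whose wait has now passed
```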

Yea we've seen this. We use those functions for half-precision CPU gemms. We'll need to either switch to BNNS graph (which is where they are going) or roll our own...

We could do something like what we do with safetensors / gguf. The metadata could be a simple dictionary of string to `Union[str, int, float]` or something like that. And...
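
A hedged sketch of what that could look like with MLX's safetensors API. Note that safetensors metadata is string-valued, so ints/floats are stringified here; the richer `Union[str, int, float]` scheme the comment floats is an assumption, not a shipped API.

```python
# Sketch of the proposed scheme: a flat string-keyed metadata dict saved
# alongside the arrays. mx.save_safetensors takes a metadata dict of
# strings, so non-string values are encoded as strings in this example.
import mlx.core as mx

arrays = {"weight": mx.zeros((4, 4))}
metadata = {
    "format_version": "1",    # int encoded as a string
    "learning_rate": "3e-4",  # float encoded as a string
    "model_name": "toy",
}
mx.save_safetensors("toy.safetensors", arrays, metadata=metadata)

# Round-trip: load the arrays together with the metadata.
loaded, meta = mx.load("toy.safetensors", return_metadata=True)
print(meta["model_name"])
```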

Could you share the commands you used and the precise model paths? > Hardware: 40 GB A100 GPU. Also, MLX is meant for Apple silicon... is that a typo or...

How about the commands / model paths? Also, when you say "by default hallucinates more", what are you comparing against?

So the problem here is the pipeline parallel is pretty dumb and assumes each machine has an equal amount of RAM. It divides the model evenly into three sections and...
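
To make the even-vs-weighted point concrete, here is an illustrative split calculation. This is not MLX's actual partitioning code, and the RAM-weighted variant is just one possible alternative.

```python
# Illustrative only (not MLX's partitioning code): an even split gives
# each machine the same number of layers; a RAM-weighted split sizes
# each section in proportion to that machine's available memory.
def even_split(n_layers: int, n_hosts: int) -> list[int]:
    base, rem = divmod(n_layers, n_hosts)
    return [base + (1 if i < rem else 0) for i in range(n_hosts)]


def weighted_split(n_layers: int, ram_gb: list[float]) -> list[int]:
    total = sum(ram_gb)
    counts = [int(n_layers * r / total) for r in ram_gb]
    counts[-1] += n_layers - sum(counts)  # hand the remainder to the last host
    return counts


print(even_split(32, 3))                       # [11, 11, 10]
print(weighted_split(32, [64.0, 16.0, 16.0]))  # [21, 5, 6]
```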

Yes MLX can do distributed inference directly using [mx.distributed](https://ml-explore.github.io/mlx/build/html/usage/distributed.html). Right now, it's a lower-level API than what you can do with Exo. So it depends on what you want to...
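
A minimal sketch of that lower-level API, following the linked docs: each process joins a group via `mx.distributed.init` and reduces an array across ranks with `all_sum`. The exact launch command depends on your setup (e.g. MPI).

```python
# Minimal mx.distributed sketch: every process joins a group, then sums
# an array across all ranks. Launch with multiple processes, e.g.
# `mpirun -np 2 python script.py` (launcher depends on your setup).
import mlx.core as mx

group = mx.distributed.init()
x = mx.ones(4)
total = mx.distributed.all_sum(x, group=group)
print(group.rank(), total)  # each rank prints the summed array
```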