Awni Hannun

Results: 1,014 comments by Awni Hannun

It would be useful to know if there is a pattern to the segfault. Like it always reproduces on a certain file, or on files of a certain size. Another option...

Thanks for the addition. What do you think about a couple of modifications: - For piping from stdin use `-`, as in `mlx_whisper -`. That is what we do in...
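
For context, a minimal sketch of the `-` convention the comment refers to. This is a toy, not the actual `mlx_whisper` implementation: a positional argument of `-` is treated as "read from stdin".

```python
# Toy sketch of the "-" convention (NOT the actual mlx_whisper code):
# a positional argument of "-" means "read the audio bytes from stdin".
import argparse
import sys


def main() -> None:
    parser = argparse.ArgumentParser(description="toy transcriber")
    parser.add_argument("audio", help="path to an audio file, or - for stdin")
    args = parser.parse_args()

    if args.audio == "-":
        # Bytes piped in, e.g. `cat clip.wav | python toy.py -`
        data = sys.stdin.buffer.read()
    else:
        with open(args.audio, "rb") as f:
            data = f.read()
    print(f"read {len(data)} bytes")


if __name__ == "__main__":
    main()
```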

> one way to go is to cache and defer the waiting submission batches by ourselves until CPU signals past the waited value. So this calls for a deferred/pending submission...
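
The quoted idea, sketched as plain bookkeeping. All names here are hypothetical illustrations, not MLX or Metal internals: a batch whose wait value the CPU has not signaled yet gets parked, and is flushed once the signaled value catches up.

```python
# Hypothetical sketch of the "deferred/pending submission" idea quoted
# above: batches waiting on a fence value the CPU has not yet signaled
# are cached, then submitted once the signal passes their wait value.
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class DeferredSubmitter:
    signaled: int = 0  # last value the CPU has signaled
    pending: List[Tuple[int, Callable[[], None]]] = field(default_factory=list)

    def submit(self, wait_value: int, batch: Callable[[], None]) -> None:
        if wait_value <= self.signaled:
            batch()  # signal already passed: submit immediately
        else:
            self.pending.append((wait_value, batch))  # defer

    def on_signal(self, value: int) -> None:
        self.signaled = max(self.signaled, value)
        ready = [b for v, b in self.pending if v <= self.signaled]
        self.pending = [(v, b) for v, b in self.pending if v > self.signaled]
        for batch in ready:
            batch()  # flush every batch whose wait has now passed
```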

Yea we've seen this. We use those functions for half-precision CPU gemms. We'll need to either switch to BNNS graph (which is where they are going) or roll our own...

We could do something like what we do with safetensors / gguf. The metadata could be a simple dictionary of string to `Union[str, int, float]` or something like that. And...
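
A hedged sketch of what that could look like with MLX's safetensors API. Note that safetensors metadata is string-valued, so ints/floats are stringified here; the richer `Union[str, int, float]` scheme the comment floats is an assumption, not a shipped API.

```python
# Sketch of the proposed scheme: a flat string-keyed metadata dict saved
# alongside the arrays. mx.save_safetensors takes a metadata dict of
# strings, so non-string values are encoded as strings in this example.
import mlx.core as mx

arrays = {"weight": mx.zeros((4, 4))}
metadata = {
    "format_version": "1",    # int encoded as a string
    "learning_rate": "3e-4",  # float encoded as a string
    "model_name": "toy",
}
mx.save_safetensors("toy.safetensors", arrays, metadata=metadata)

# Round-trip: load the arrays together with the metadata.
loaded, meta = mx.load("toy.safetensors", return_metadata=True)
print(meta["model_name"])
```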

Could you share the commands you used and the precise model paths? > Hardware: 40 GB A100 GPU. Also, MLX is meant for Apple silicon... is that a typo or...

How about the commands / model paths? Also, when you say "by default hallucinates more", what are you comparing against?

So the problem here is the pipeline parallel is pretty dumb and assumes each machine has an equal amount of RAM. It divides the model evenly into three sections and...
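
To make the even-vs-weighted point concrete, here is an illustrative split calculation. This is not MLX's actual partitioning code, and the RAM-weighted variant is just one possible alternative.

```python
# Illustrative only (not MLX's partitioning code): an even split gives
# each machine the same number of layers; a RAM-weighted split sizes
# each section in proportion to that machine's available memory.
def even_split(n_layers: int, n_hosts: int) -> list[int]:
    base, rem = divmod(n_layers, n_hosts)
    return [base + (1 if i < rem else 0) for i in range(n_hosts)]


def weighted_split(n_layers: int, ram_gb: list[float]) -> list[int]:
    total = sum(ram_gb)
    counts = [int(n_layers * r / total) for r in ram_gb]
    counts[-1] += n_layers - sum(counts)  # hand the remainder to the last host
    return counts


print(even_split(32, 3))                       # [11, 11, 10]
print(weighted_split(32, [64.0, 16.0, 16.0]))  # [21, 5, 6]
```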

Yes MLX can do distributed inference directly using [mx.distributed](https://ml-explore.github.io/mlx/build/html/usage/distributed.html). Right now, it's a lower-level API than what you can do with Exo. So it depends on what you want to...
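
A minimal sketch of that lower-level API, following the linked docs: each process joins a group via `mx.distributed.init` and reduces an array across ranks with `all_sum`. The exact launch command depends on your setup (e.g. MPI).

```python
# Minimal mx.distributed sketch: every process joins a group, then sums
# an array across all ranks. Launch with multiple processes, e.g.
# `mpirun -np 2 python script.py` (launcher depends on your setup).
import mlx.core as mx

group = mx.distributed.init()
x = mx.ones(4)
total = mx.distributed.all_sum(x, group=group)
print(group.rank(), total)  # each rank prints the summed array
```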