Alex Cheema

Results 117 issues of Alex Cheema

- Tokens per second
- Time to first token
- In-flight requests
- List of all models downloaded already
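A minimal sketch of how the first three metrics could be tracked per request. It is not taken from the exo codebase: `MetricsTracker`, `RequestStats`, and their methods are hypothetical names, timestamps are passed in by the caller, and tokens per second is assumed to mean the decode rate measured after the first token arrives.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class RequestStats:
    """Timing data for one request (hypothetical structure)."""
    started_at: float
    first_token_at: Optional[float] = None
    last_token_at: Optional[float] = None
    n_tokens: int = 0


@dataclass
class MetricsTracker:
    """Tracks in-flight requests and summarizes each one when it finishes."""
    in_flight: Dict[str, RequestStats] = field(default_factory=dict)

    def start(self, request_id: str, now: float) -> None:
        self.in_flight[request_id] = RequestStats(started_at=now)

    def on_token(self, request_id: str, now: float) -> None:
        stats = self.in_flight[request_id]
        if stats.first_token_at is None:
            stats.first_token_at = now
        stats.last_token_at = now
        stats.n_tokens += 1

    def in_flight_count(self) -> int:
        return len(self.in_flight)

    def finish(self, request_id: str) -> dict:
        stats = self.in_flight.pop(request_id)
        ttft = None
        if stats.first_token_at is not None:
            ttft = stats.first_token_at - stats.started_at
        tps = None
        # Assumption: the rate covers tokens generated after the first one.
        if stats.n_tokens > 1 and stats.last_token_at > stats.first_token_at:
            elapsed = stats.last_token_at - stats.first_token_at
            tps = (stats.n_tokens - 1) / elapsed
        return {
            "time_to_first_token": ttft,
            "tokens_per_second": tps,
            "n_tokens": stats.n_tokens,
        }
```

In real use the caller would pass `time.monotonic()` as `now`. Taking the timestamp as an argument keeps the sketch deterministic and easy to test.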

The limits set in `configure_mlx.sh` should be dynamic.

Add support for this model with the MLX backend: https://huggingface.co/black-forest-labs/FLUX.1-dev There's already an example of FLUX using MLX here: https://github.com/ml-explore/mlx-examples/blob/main/flux/flux/model.py

Just testing.

How to reproduce:
- Run a cluster on 2 nodes
- Run a request
- Restart one node
- Run another request

```
Traceback (most recent call last):
  File "/Users/alex/exo/exo/api/chatgpt_api.py",...
```

- Make the TUI updates asynchronous. Probably just a framerate, e.g. once per second
- The TUI shouldn't update every time a node is active - we should just show...
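One way to read the framerate idea is to decouple state changes from redraws: events only set a dirty flag, and a background task redraws at most once per interval. The sketch below assumes an `asyncio`-based application. `ThrottledRenderer`, `mark_dirty`, and the `render` callback are hypothetical names, not part of exo's TUI code.

```python
import asyncio
from typing import Callable


class ThrottledRenderer:
    """Redraws at most once per interval, and only if something changed."""

    def __init__(self, render: Callable[[], None], interval_s: float = 1.0):
        self._render = render          # hypothetical redraw callback
        self._interval_s = interval_s  # e.g. 1.0 for once per second
        self._dirty = False

    def mark_dirty(self) -> None:
        # Cheap to call on every node event; it does not redraw by itself.
        self._dirty = True

    async def run(self, stop: asyncio.Event) -> None:
        while not stop.is_set():
            if self._dirty:
                self._dirty = False
                self._render()
            try:
                # Sleep for one frame, but wake early if asked to stop.
                await asyncio.wait_for(stop.wait(), timeout=self._interval_s)
            except asyncio.TimeoutError:
                pass
```

With this shape, a burst of node activity between two frames results in a single redraw rather than one redraw per event.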