Awni Hannun
Sure! Just FYI, `mlx_lm` has moved to a new repo (https://github.com/ml-explore/mlx-lm). I don't know how many are done... not too many, I think.
Some of those files are still in mlx-examples. The files in `tests/` should be in mlx-lm now.
> The current implementation of iterate_batches produces batches for all but the remaining N items in the dataset (where N is less than the batch size). So, if your dataset...
Can you say more about what you are looking for? Is it a separate GGUF file which contains the adapters? Then you can load the base model GGUF as well...
Well, the streaming detokenizer and the naive detokenizer should give the same results. For now you can use the naive one until we fix the streaming one. It will be...
The probability that gets returned is the probability of the given token at that time step. To get the log probability, you would just take the log of it: `mx.log(p)`.
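For illustration, a minimal sketch of collecting per-token log probabilities during generation. It assumes a `generate_step`-style generator that yields `(token, prob)` pairs as described above; the model path is a placeholder, and exact import paths, names, and signatures may differ across mlx-lm versions.

```python
# Sketch only: assumes generate_step yields (token, prob) pairs as described
# above; import paths, names, and signatures may differ across mlx-lm versions.
import mlx.core as mx
from mlx_lm import load
from mlx_lm.utils import generate_step

model, tokenizer = load("path/to/mlx-model")  # placeholder: any MLX-format repo id or local path
prompt = mx.array(tokenizer.encode("Hello"))

log_probs = []
for (token, prob), _ in zip(generate_step(prompt, model), range(20)):
    # prob is the probability of the sampled token at this step;
    # its log is the per-token log probability.
    log_probs.append(mx.log(prob))
```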
> I would be happy to contribute a PR

Thanks! It would be great to ease the path to evaluating models in MLX.
> main difference is that OpenAI returns tokens as strings, while mlx-lm returns token ids

It might be worth changing that on the mlx-lm side if that's the standard.
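For concreteness, a small illustration of that difference using a HuggingFace tokenizer (not mlx-lm's actual response format):

```python
# Illustration of token ids vs. string pieces; uses a HuggingFace tokenizer
# and is not mlx-lm's actual response format.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")

token_ids = tokenizer.encode("Hello world")
print(token_ids)                                    # token ids, what mlx-lm currently returns
print([tokenizer.decode([t]) for t in token_ids])   # string pieces, closer to what OpenAI streams
```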
> port StreamingDetokenizer

In Python we have a [naive detokenizer](https://github.com/ml-explore/mlx-examples/blob/main/llms/mlx_lm/tokenizer_utils.py#L97-L105) that chops the history on every line break to avoid needing to re-decode the full sequence. That actually gets you...
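Roughly, the idea looks like this (a sketch, not the actual `mlx_lm` implementation; it assumes a HuggingFace-style tokenizer with a `decode` method):

```python
# Sketch of the "chop history on line breaks" idea; not the actual mlx_lm code.
class NaiveStreamingDetokenizer:
    def __init__(self, tokenizer):
        self.tokenizer = tokenizer
        self.tokens = []   # token ids seen since the last line break
        self.offset = 0    # length of text already emitted for those tokens
        self.text = ""     # text flushed out at line breaks

    def add_token(self, token_id):
        self.tokens.append(token_id)

    def last_segment(self):
        # Only the tokens since the last line break get re-decoded.
        current = self.tokenizer.decode(self.tokens)
        segment = current[self.offset:]
        if current.endswith("\n"):
            # A line break lets us "chop" the history: keep the finished
            # text and start decoding fresh, so decode cost stays bounded.
            self.text += current
            self.tokens = []
            self.offset = 0
        else:
            self.offset = len(current)
        return segment
```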
Another optimization in Python which is really useful for long prompts/generations: https://github.com/ml-explore/mlx-examples/pull/931

There are two things there:
1. Prompt splitting
2. Rotating buffer for the cache

The prompt splitting is...
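A rough sketch of the prompt-splitting part (illustrative only, not the PR's code; it assumes an mlx_lm-style model called as `model(tokens, cache=cache)` that updates a per-layer KV cache in place, and leaves cache construction out):

```python
# Illustrative sketch of prompt splitting; not the actual PR implementation.
import mlx.core as mx

def prefill(model, prompt_tokens, cache, chunk_size=512):
    """Feed a long prompt through the model in fixed-size chunks so peak
    memory is bounded by one chunk's activations rather than the whole prompt."""
    logits = None
    for start in range(0, len(prompt_tokens), chunk_size):
        chunk = mx.array(prompt_tokens[start : start + chunk_size])[None]
        logits = model(chunk, cache=cache)   # cache carries state across chunks
        mx.eval(logits)                      # materialize before the next chunk
    # Logits for the last chunk; its final position seeds token generation.
    return logits
```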