Awni Hannun
Sure! Just FYI, `mlx_lm` has moved to a new repo (https://github.com/ml-explore/mlx-lm). I don't know how many are done... not too many, I think.
Some of those files are still in mlx-examples. The files in `tests/` should be in mlx-lm now.
> The current implementation of iterate_batches produces batches for all but the remaining N items in the dataset (where N is less than the batch size). So, if your dataset...
Can you say more about what you are looking for? Is it a separate GGUF file which contains the adapters? Then you can load the base model GGUF as well...
Well, the streaming detokenizer and the naive detokenizer should give the same results. For now you can use the naive one until we fix the streaming one. It will be...
The probability that gets returned is the probability of the given token at that time step. To get the log probability, you would just take the log of it: `mx.log(p)`.
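For illustration, a minimal sketch of collecting per-token log probabilities during generation. It assumes a `generate_step`-style generator that yields `(token, prob)` pairs as described above; the model path is a placeholder, and exact import paths, names, and signatures may differ across mlx-lm versions.

```python
# Sketch only: assumes generate_step yields (token, prob) pairs as described
# above; import paths, names, and signatures may differ across mlx-lm versions.
import mlx.core as mx
from mlx_lm import load
from mlx_lm.utils import generate_step

model, tokenizer = load("path/to/mlx-model")  # placeholder: any MLX-format repo id or local path
prompt = mx.array(tokenizer.encode("Hello"))

log_probs = []
for (token, prob), _ in zip(generate_step(prompt, model), range(20)):
    # prob is the probability of the sampled token at this step;
    # its log is the per-token log probability.
    log_probs.append(mx.log(prob))
```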
> I would be happy to contribute a PR

Thanks! It would be great to ease the path to evaluating models in MLX.
> main difference is that OpenAI returns tokens as strings, while mlx-lm returns token ids

It might be worth changing that on the mlx-lm side if that's the standard.
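For concreteness, a small illustration of that difference using a HuggingFace tokenizer (not mlx-lm's actual response format):

```python
# Illustration of token ids vs. string pieces; uses a HuggingFace tokenizer
# and is not mlx-lm's actual response format.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")

token_ids = tokenizer.encode("Hello world")
print(token_ids)                                    # token ids, what mlx-lm currently returns
print([tokenizer.decode([t]) for t in token_ids])   # string pieces, closer to what OpenAI streams
```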
> port StreamingDetokenizer

In Python we have a [naive detokenizer](https://github.com/ml-explore/mlx-examples/blob/main/llms/mlx_lm/tokenizer_utils.py#L97-L105) that chops the history on every line break to avoid needing to re-decode the full sequence. That actually gets you...
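Roughly, the idea looks like this (a sketch, not the actual `mlx_lm` implementation; it assumes a HuggingFace-style tokenizer with a `decode` method):

```python
# Sketch of the "chop history on line breaks" idea; not the actual mlx_lm code.
class NaiveStreamingDetokenizer:
    def __init__(self, tokenizer):
        self.tokenizer = tokenizer
        self.tokens = []   # token ids seen since the last line break
        self.offset = 0    # length of text already emitted for those tokens
        self.text = ""     # text flushed out at line breaks

    def add_token(self, token_id):
        self.tokens.append(token_id)

    def last_segment(self):
        # Only the tokens since the last line break get re-decoded.
        current = self.tokenizer.decode(self.tokens)
        segment = current[self.offset:]
        if current.endswith("\n"):
            # A line break lets us "chop" the history: keep the finished
            # text and start decoding fresh, so decode cost stays bounded.
            self.text += current
            self.tokens = []
            self.offset = 0
        else:
            self.offset = len(current)
        return segment
```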
Another optimization in Python which is really useful for long prompts/generations: https://github.com/ml-explore/mlx-examples/pull/931

There are two things there:
1. Prompt splitting
2. Rotating buffer for the cache

The prompt splitting is...
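A rough sketch of the prompt-splitting part (illustrative only, not the PR's code; it assumes an mlx_lm-style model called as `model(tokens, cache=cache)` that updates a per-layer KV cache in place, and leaves cache construction out):

```python
# Illustrative sketch of prompt splitting; not the actual PR implementation.
import mlx.core as mx

def prefill(model, prompt_tokens, cache, chunk_size=512):
    """Feed a long prompt through the model in fixed-size chunks so peak
    memory is bounded by one chunk's activations rather than the whole prompt."""
    logits = None
    for start in range(0, len(prompt_tokens), chunk_size):
        chunk = mx.array(prompt_tokens[start : start + chunk_size])[None]
        logits = model(chunk, cache=cache)   # cache carries state across chunks
        mx.eval(logits)                      # materialize before the next chunk
    # Logits for the last chunk; its final position seeds token generation.
    return logits
```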