Fred Bliss

Results: 61 comments by Fred Bliss

Per comments on the Hugging Face repo, the differences between the two tokenizer.json files are Unicode differences. I'll assume something is bugging out on my end unless anyone else sees...
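To sanity-check that kind of discrepancy, one quick approach is to compare the two files after Unicode normalization: if they differ byte-for-byte but match once normalized, the only differences are Unicode representation. A minimal sketch (the file paths are placeholders, not the actual repo files):

```python
# Minimal sketch: decide whether two tokenizer.json files differ only in
# Unicode normalization form (e.g. NFC vs NFD), not in content.
import unicodedata


def differs_only_in_unicode(path_a: str, path_b: str) -> bool:
    """True if the files are byte-different but identical after NFC
    normalization; False if they are byte-identical or truly different."""
    with open(path_a, encoding="utf-8") as f:
        text_a = f.read()
    with open(path_b, encoding="utf-8") as f:
        text_b = f.read()
    if text_a == text_b:
        return False  # no difference at all
    return unicodedata.normalize("NFC", text_a) == unicodedata.normalize("NFC", text_b)
```

Running this on the two tokenizer.json files should confirm (or rule out) that the diff is purely a normalization artifact.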

> ```
> # Libraries
> from transformers import AutoTokenizer
> import mlx.core as mx
> import mlx_lm
> from mlx_lm.utils import load_model, get_model_path
>
> # Language Model...
> ```

> > Copying the original cohere tokenizer.json (https://huggingface.co/CohereForAI/c4ai-command-r-plus/blob/main/tokenizer.json) fixes this issue completely from my testing (output generation is slow, but so far so good!)
>
> My guess is something...

On a related note, @1b5d - any plans to incorporate GPTQ into this? Would love a lighter-weight API / LangChain integration for GPU inference.

even better, MLX support? :)

> They are just different approaches really. Pipelining gives perfect scaling in throughput but not latency.
>
> This means that if you are running evaluations or simply running batch...

Think I figured out what's happening, and it's only in the base model, not the instruct variants. No idea whether this is an issue worth handling, though - or...

If this is worth adding a check for (cases where the BOS and EOS tokens are the same), without changing the base tokenizer behavior, this is the simplest I...
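The check described above could look roughly like the sketch below. This is a hypothetical illustration, not the actual patch: the function name is mine, and it only inspects the tokenizer (anything exposing `bos_token_id` / `eos_token_id`, such as a Hugging Face `PreTrainedTokenizer`) without changing its behavior.

```python
# Hypothetical sketch: warn when a tokenizer's BOS and EOS tokens collide,
# without altering the tokenizer itself.
import warnings


def check_bos_eos_collision(tokenizer) -> bool:
    """Return True (and emit a warning) if BOS and EOS share a token id."""
    bos = getattr(tokenizer, "bos_token_id", None)
    eos = getattr(tokenizer, "eos_token_id", None)
    if bos is not None and bos == eos:
        warnings.warn(
            f"BOS and EOS share token id {bos}; generation can terminate "
            "unexpectedly if the stopping logic matches the leading BOS."
        )
        return True
    return False
```

A caller could run this once after loading the model and decide whether to adjust its stopping criteria.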

fwiw - I added mlx-lm as a provider. It's barebones, but it works. https://github.com/fblissjr/goose-mlx

goose configure & session:

![Image](https://github.com/user-attachments/assets/c44345b9-af55-4258-8c33-177dedd78849)

mlx_lm.server log:

![Image](https://github.com/user-attachments/assets/e6237b12-d807-4ea4-87f6-326cca886168)

@Mihaiii Not sure if this still works with how `mlx-lm` has evolved (for the better!), but I needed this a while back and put it here: https://github.com/ml-explore/mlx-examples/pull/806#issuecomment-2211931951