Prince Canuma

Results: 151 comments by Prince Canuma

> Hey @Blaizzy, I have run the exact same test with the new llama.cpp implementation of Command-R+ and it works way above 8k tokens. @jeanromainroy can you try again with...

You can also try to increase the default `max_position_embeddings` and let me know if it works.
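For context, `max_position_embeddings` lives in the model's `config.json`. A minimal sketch of how to bump it (the `mlx_model` path is just an example; point it at wherever your converted weights live):

```python
import json
from pathlib import Path

# Example path to a converted model directory; adjust as needed.
config_path = Path("mlx_model") / "config.json"

config = json.loads(config_path.read_text())
print("current max_position_embeddings:", config.get("max_position_embeddings"))

# Raise the context window, e.g. to 128K (131072 tokens).
config["max_position_embeddings"] = 131072
config_path.write_text(json.dumps(config, indent=2))
```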

Let me know how it goes, but for now, based on your report, the issue should be fixed.

> Hey @Blaizzy , I tried your fork and the model is still outputting ... when I provide a long prompt. I have made a new change, can you try...

Wait, I think I got it! Give me 30 min :)

@jeanromainroy can you try this branch, the previous one had a git issue: https://github.com/Blaizzy/mlx-examples/tree/pc/command-R

Only PAD? Can you share the whole output?

Got it! @awni the Cohere team added `model_max_length` set to 128K on both Command-R models. Is there a way of using this number with `nn.RoPE`? Are there any...
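For what it's worth, a minimal sketch of how `nn.RoPE` is instantiated in MLX (the `head_dim` value is illustrative, not Command-R's actual config). The module is parameterized by the per-head dimension, `base`, and `scale`, with no explicit maximum-length argument, so the 128K limit would presumably have to come from the config rather than from the rotary embedding itself:

```python
import mlx.core as mx
import mlx.nn as nn

head_dim = 128  # illustrative per-head dimension

# nn.RoPE takes dims/base/scale; positions are computed on the fly,
# so there is no max-length parameter to set here.
rope = nn.RoPE(head_dim, traditional=False, base=10000)

x = mx.random.normal((1, 8, 512, head_dim))  # (batch, heads, seq, head_dim)
y = rope(x)
print(y.shape)
```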

I'm building a vLLM-specific package for MLX models, so you can move this issue here: https://github.com/Blaizzy/mlx-vllm

@NanoCode012 Could you let me know what else you are looking for?