Prince Canuma

151 comments by Prince Canuma

Done, the 4-bit model with the updated tokenizer is available on the hub. Link: [mlx-community/c4ai-command-r-v01-4bit](https://huggingface.co/mlx-community/c4ai-command-r-v01-4bit)
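
A minimal sketch of loading that model with `mlx-lm`, assuming the package is installed; the prompt and `max_tokens` value are illustrative, not from the original comment:

```python
# Sketch: load the 4-bit Command R model from the Hugging Face hub with mlx-lm
# and run a short generation. Prompt and generation settings are illustrative.
from mlx_lm import load, generate

model, tokenizer = load("mlx-community/c4ai-command-r-v01-4bit")
response = generate(model, tokenizer, prompt="Hello", max_tokens=100, verbose=True)
print(response)
```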

Thanks! Indeed you have a point, and I see we're all converging to a de facto template. However, what happens when it's a pretrained model without any instruction tuning? Won't defaulting (fallback)...

Makes sense to me :) As an ML Engineer this is intuitive. I'm just thinking about UX for the users at large that might not understand the distinction. They could just run...

> Currently, the implementation only applies the chat template when the tokenizer has an explicitly specified chat template in the tokenizer configuration or implementation. Recently released models tend to use...

@mzbac Got it, it works with both pre-trained and instruction models. 👍🏽 But I noticed that with starcoder2-3b it gives inconsistent results.

```shell
python -m mlx_lm.generate --model mlx-community/starcoder2-3b-4bit --prompt "Write...
```

Found the issue: without the condition to check `tokenizer.chat_template is not None`, the condition becomes `True`, which triggers this warning: `No chat template is defined for this tokenizer - using...`
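
A minimal sketch of the guard being discussed, assuming the Hugging Face `transformers` tokenizer API; the `messages` variable is illustrative:

```python
# Sketch: only apply the chat template when one is explicitly defined on the
# tokenizer; otherwise transformers falls back to default_chat_template and
# logs the "No chat template is defined for this tokenizer" warning.
messages = [{"role": "user", "content": "Write a quicksort in Python"}]  # illustrative

if hasattr(tokenizer, "apply_chat_template") and tokenizer.chat_template is not None:
    prompt = tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
else:
    prompt = messages[-1]["content"]  # pass the raw prompt through for base models
```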

> @Blaizzy @mzbac it sounds like there is still an issue here? Are you intending to send a fix? I was unwell but I'm back. @mzbac's last commit ([df1eb23](https://github.com/ml-explore/mlx-examples/pull/577/commits/df1eb2304e8b09f734ba285a617736a1d44c2376))...

> The challenge with patching the chat_template is sometimes it's quite hard to get the original model patched since the mlx-lm is compatible with HF format models, which may introduce...

> Yeah, the mlx-lm can directly load HF models. So if the original model has not been patched and loaded via mlx-lm as expected using the default chat model, it...

TL;DR: We don't need this change, but if we decide to add it, we'd better add: 1. A warning that `default_chat_template` might produce wrong results 2. A condition to check...
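
A sketch of how those two suggestions could fit together, assuming the (since-deprecated) `transformers` `default_chat_template` property; the `use_default_chat_template` flag and variable names are hypothetical, for illustration only:

```python
import logging

# Sketch: prefer an explicitly defined chat template; only fall back to
# default_chat_template when asked to, and warn that it may be wrong.
if tokenizer.chat_template is not None:
    prompt = tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
elif use_default_chat_template:  # hypothetical opt-in flag
    logging.warning(
        "Falling back to tokenizer.default_chat_template; "
        "it may produce wrong results for this model."
    )
    prompt = tokenizer.apply_chat_template(
        messages,
        chat_template=tokenizer.default_chat_template,
        tokenize=False,
        add_generation_prompt=True,
    )
else:
    prompt = messages[-1]["content"]  # base model: use the raw prompt
```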