Awni Hannun
Awesome, I will try with the chat template! If you are able to upload the 4-bit version w/ the chat template to the MLX Community, I think that would be...
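In case it's useful, quantizing and uploading can be scripted with mlx_lm's convert; a minimal sketch, assuming the current mlx_lm API (the model paths here are placeholders):

```python
from mlx_lm.convert import convert

# Placeholders: swap in the real Hugging Face repo and target name.
convert(
    hf_path="<org>/<model>",
    quantize=True,  # defaults to 4-bit quantization
    upload_repo="mlx-community/<model>-4bit",
)
```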
- Fixed rope to traditional
- Fixed an issue with layer norm upcasting to fp32
- Rebased on main + ran formatting
> Btw, could you explain what is the difference between rope traditional on and off? When should I use one vs the other? Also, what output did you get with...
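For context, the flag maps onto the `traditional` argument of `mlx.nn.RoPE`; a minimal sketch with illustrative shapes (not the code in the PR):

```python
import mlx.core as mx
import mlx.nn as nn

head_dim = 64
x = mx.random.normal((1, 8, 16, head_dim))  # (batch, heads, seq, head_dim)

# traditional=True rotates consecutive dimension pairs (the original RoPE
# formulation); the default rotates the first half of the dimensions
# against the second half. The two are not numerically interchangeable,
# so the flag has to match how the model was trained.
rope = nn.RoPE(head_dim, traditional=True)
y = rope(x)
```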
> Out of curiosity and possibly showing my ignorance, why not use mx.fast.scaled_dot_product_attention for phi and phixtral as well?

Yes, it's really important for phi to upcast the queries (and...
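A hedged sketch of what that upcast around the fused kernel could look like (`sdpa_fp32` is a hypothetical helper, not the code in the PR):

```python
import mlx.core as mx

def sdpa_fp32(q, k, v, scale, mask=None):
    # Hypothetical helper: upcast to fp32 so the attention scores don't
    # overflow in half precision, then cast back to the input dtype.
    out = mx.fast.scaled_dot_product_attention(
        q.astype(mx.float32),
        k.astype(mx.float32),
        v.astype(mx.float32),
        scale=scale,
        mask=mask,
    )
    return out.astype(q.dtype)
```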
Cool! Although I'm wondering how that will work for encoder/decoder-style models in MLX LM. We have a [T5 example](https://github.com/ml-explore/mlx-examples/tree/main/t5) you can use as a reference. If it doesn't...
We don't have such an operation, sorry! You could do something like:

```python
import mlx.core as mx

def nansum(x):
    # Treat NaNs as zero, then sum the remaining values.
    return mx.sum(mx.where(mx.isnan(x), 0, x))
```
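For example (illustrative usage):

```python
x = mx.array([1.0, float("nan"), 2.0])
print(nansum(x))  # array(3, dtype=float32)
```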
Just curious, what cases does this fix that currently do not work? Don't the instruction-tuned models have the template in the tokenizer? This would fix the commandR issue for...
@Blaizzy @mzbac it sounds like there is still an issue here? Are you intending to send a fix?
🤔 So if I understand correctly, this PR will improve the default setting for some models (by using the default chat template), but for other models it might be worse...
> The HF tokenizer.apply_chat_template is showing a warning when applying the default template

Nice, that's useful to know!

> I am happy to close this PR for now if we...
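For anyone following along, a rough sketch of the check being discussed (the model id is a placeholder):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("<some-model>")  # placeholder id
messages = [{"role": "user", "content": "Hello!"}]

# If the tokenizer ships its own chat template, use it. If not,
# apply_chat_template falls back to a default template and emits the
# warning mentioned above.
if tokenizer.chat_template is not None:
    prompt = tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
else:
    prompt = messages[-1]["content"]  # fall back to the raw text
```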