Arthur
No worries, ping me whenever for another review!
IMO that is exactly the purpose of this pipeline. The functions should not necessarily have been part of the tokenizer, as they are only needed for the FIM task. So...
You should be able to use 4-bit quantization!
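Something along these lines should work (a rough sketch, assuming a recent `transformers` + `bitsandbytes` install; the model id is just a placeholder, swap in the checkpoint you are actually using):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# 4-bit quantization config (NF4 + bf16 compute is a common default)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model_id = "bigcode/starcoder2-3b"  # placeholder model id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",  # shard across available devices
)
```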
Sounds good. No need for the generation config update. Tokens are strings, so they should be saved in the tokenizer_config.json IMO
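For example (a minimal sketch; the checkpoint name and token strings are just placeholders for the FIM tokens of the model in question):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bigcode/starcoderbase-1b")  # placeholder checkpoint

# FIM tokens are plain strings, so they belong to the tokenizer,
# not to the generation config
fim_tokens = ["<fim_prefix>", "<fim_middle>", "<fim_suffix>"]
tokenizer.add_special_tokens({"additional_special_tokens": fim_tokens})

# after saving, the added tokens are serialized in tokenizer_config.json
tokenizer.save_pretrained("./my-tokenizer")
```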
Yes, you can probably open a PR to the models and use the `revision` argument! WDYT?
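Something like this (the repo id and PR number are placeholders):

```python
from transformers import AutoTokenizer

# load from a specific Hub revision, e.g. an open PR branch on the model repo
tokenizer = AutoTokenizer.from_pretrained(
    "bigcode/starcoderbase",  # placeholder repo id
    revision="refs/pr/1",     # placeholder PR ref; a branch name or commit sha also works
)
```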
I think beam search with ROPE and fp16 has instabilities, yes, reported here: #26332. If I am not mistaken, this is what we have, no? And I think a recent...
I think computing ROPE in float32 precision should partly fix this
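Roughly the idea (a simplified sketch of the rotary tables, not the exact `transformers` implementation): build the cos/sin tables in float32 and only cast back to the model dtype at the end.

```python
import torch

def rope_cos_sin(seq_len, dim, base=10000.0, dtype=torch.float16, device="cpu"):
    # compute the rotary tables in float32 to avoid fp16 rounding/overflow issues
    inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2, device=device).float() / dim))
    t = torch.arange(seq_len, device=device, dtype=torch.float32)
    freqs = torch.outer(t, inv_freq)         # (seq_len, dim // 2), float32
    emb = torch.cat((freqs, freqs), dim=-1)  # (seq_len, dim)
    # cast back to the model dtype only at the very end
    return emb.cos().to(dtype), emb.sin().to(dtype)
```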
I'll mark this as closed, because llama now computes rope in float32! 🥳 Feel free to ping me if you feel like this should not be closed
Hey feel free to ping me when this is ready! 🤗
Ok! Thanks, I'll review now, but will let @amyeroberts handle the rest as I'll be off for a week 😉