llama.cpp

XLMRoberta support

Open Oliver-Y opened this issue 1 year ago • 1 comments

Added support for the XLMRoberta model, tested on the Multilingual E5 embeddings model. The E5 tokenizer.json appears to use a preprocessor, but since llama.cpp doesn't support SPM preprocessors yet, I put a simple workaround right before the SPM tokenizer call.
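The workaround itself is not shown in this description, so the following is only a minimal sketch of what a preprocessing step in front of an SPM tokenizer call could look like. It assumes the preprocessor's job is to collapse runs of whitespace into a single space and trim both ends; that behavior and the helper name `normalize_whitespace` are hypothetical and not taken from the PR.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper (not from the PR): collapse each run of ASCII
// whitespace into one space and drop leading/trailing whitespace.
// Bytes of multi-byte UTF-8 characters are copied through unchanged
// (assuming the default "C" locale for std::isspace).
std::string normalize_whitespace(const std::string & text) {
    std::string out;
    out.reserve(text.size());
    bool pending_space = false;
    for (unsigned char c : text) {
        if (std::isspace(c)) {
            // Remember the gap, but only if some text has been emitted already.
            pending_space = !out.empty();
        } else {
            if (pending_space) {
                out.push_back(' ');
            }
            pending_space = false;
            out.push_back(static_cast<char>(c));
        }
    }
    // A trailing gap is never flushed, so the result has no trailing space.
    return out;
}
```

In a tokenizer pipeline, the normalized string would then be handed to the SPM tokenizer in place of the raw input.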

This is my first time contributing, so I would love feedback of any form!

  • [x] I have read the contributing guidelines
  • Self-reported review complexity: Low-Medium
    • [x] Low
    • [x] Medium
    • [ ] High

Oliver-Y • Jul 23 '24 00:07

There might be a bug with tokenization. Going to take a look first.

Oliver-Y • Jul 23 '24 17:07

Redundant with #8658, so closing.

Oliver-Y • Jul 24 '24 07:07