[SAMPLE] How to use the model "intfloat/multilingual-e5-small" to generate embeddings

Open nnbw-liu opened this issue 8 months ago • 1 comments

I need to use the intfloat/multilingual-e5-small model. However, when loading vocab.txt on the ARM64 architecture, I ran into a problem with missing special tokens such as [UNK] and [SEP]. After some research, I found that 'intfloat/multilingual-e5-small' uses XLMRobertaTokenizer, which depends on SentencePiece rather than a BERT-style vocab.txt. I found SentencePieceTokenizer in Microsoft.ML.Tokenizers, but its usage is different from BertTokenizer's and I don't know how to use it. Could you provide a tutorial? Using the File.OpenRead method, I read the model file as a Stream and successfully loaded a SentencePieceTokenizer, but I don't know what to do with it from there.
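For context, here is a minimal Python sketch of the input layout that a raw SentencePiece tokenizer would have to reproduce to match XLMRobertaTokenizer. It assumes the fairseq convention used by the Hugging Face implementation (`<s>`=0, `<pad>`=1, `</s>`=2, `<unk>`=3, with every other raw SentencePiece id shifted by one) and that the raw ids were produced without any BOS/EOS markers added by the tokenizer. The function name `to_model_ids` and the `max_len` default are hypothetical, chosen only for illustration.

```python
from typing import List, Tuple

# Assumed special-token layout (fairseq convention, as used by
# Hugging Face's XLMRobertaTokenizer); not taken from ML.Tokenizers.
BOS_ID, PAD_ID, EOS_ID, UNK_ID = 0, 1, 2, 3
FAIRSEQ_OFFSET = 1  # raw SentencePiece ids are shifted up by one


def to_model_ids(spm_ids: List[int], max_len: int = 512) -> Tuple[List[int], List[int]]:
    """Map raw SentencePiece piece ids to XLM-R style input ids.

    Hypothetical helper: spm_ids must not already contain BOS/EOS markers.
    """
    # Raw id 0 is SentencePiece's own <unk>; it maps to the model's <unk>.
    body = [UNK_ID if i == 0 else i + FAIRSEQ_OFFSET for i in spm_ids]
    # Leave room for the <s> and </s> markers added below.
    body = body[: max_len - 2]
    input_ids = [BOS_ID] + body + [EOS_ID]
    attention_mask = [1] * len(input_ids)
    return input_ids, attention_mask


if __name__ == "__main__":
    ids, mask = to_model_ids([0, 5, 10])
    print(ids, mask)
```

If this assumption holds, the resulting `input_ids` and `attention_mask` are what would be fed to the model in place of the output of BertTokenizer.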

Apr 25 '25 08:04 nnbw-liu

Hi, we can look into adding this model, but we are currently focused on other samples and features. We will add it to our list of models to investigate later this year.

Apr 25 '25 16:04 nmetulev