[SAMPLE] How to use the "intfloat/multilingual-e5-small" model to generate embeddings
I need to use the intfloat/multilingual-e5-small model. However, on ARM64 I ran into a problem with missing tokens such as [UNK] and [SEP] when loading vocab.txt. After some research, I found that 'intfloat/multilingual-e5-small' uses XLMRobertaTokenizer, which depends on SentencePiece. I found SentencePieceTokenizer in Microsoft.ML.Tokenizers, but its usage is different from BertTokenizer's and I don't know how to use it. Could you provide a tutorial? I opened the model file with the OpenRead method and successfully loaded SentencePieceTokenizer from the resulting Stream, but I don't know what to do from there.
Hi, we can look into adding this model, but we are currently focused on other samples and features. We will add it to our list of models to investigate later this year.
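Until a sample like that exists, here is a rough, language-agnostic sketch (in Python, for brevity) of the bookkeeping that sits around a raw SentencePiece tokenizer for an XLM-RoBERTa style vocabulary. It assumes the conventions of the Hugging Face XLMRobertaTokenizer: `<s>`=0, `<pad>`=1, `</s>`=2, `<unk>`=3, with every other raw SentencePiece id shifted by +1. It also assumes the raw tokenizer was asked not to add its own begin/end markers. The pooling step follows the E5 model card (masked average pooling, then L2 normalization). The function names are hypothetical and not part of any library, so please check the ids against the model's own tokenizer files before relying on this.

```python
import math
from typing import List, Sequence

# Assumed special-token ids (Hugging Face XLM-RoBERTa convention).
BOS_ID = 0  # <s>
PAD_ID = 1  # <pad>
EOS_ID = 2  # </s>
UNK_ID = 3  # <unk>
FAIRSEQ_OFFSET = 1  # model input id = raw SentencePiece id + 1


def to_xlmr_input_ids(spm_ids: Sequence[int], max_length: int = 512) -> List[int]:
    """Map raw SentencePiece ids to model input ids and wrap them in <s> ... </s>.

    Hypothetical helper: raw id 0 is treated as the SentencePiece unknown piece.
    """
    mapped = [UNK_ID if i == 0 else i + FAIRSEQ_OFFSET for i in spm_ids]
    mapped = mapped[: max_length - 2]  # leave room for <s> and </s>
    return [BOS_ID] + mapped + [EOS_ID]


def mean_pool_normalize(
    token_vectors: Sequence[Sequence[float]], attention_mask: Sequence[int]
) -> List[float]:
    """Average the token vectors where the mask is 1, then L2-normalize."""
    dim = len(token_vectors[0])
    summed = [0.0] * dim
    count = 0
    for vec, keep in zip(token_vectors, attention_mask):
        if keep:
            count += 1
            for k in range(dim):
                summed[k] += vec[k]
    mean = [s / max(count, 1) for s in summed]
    norm = math.sqrt(sum(x * x for x in mean))
    return [x / norm for x in mean] if norm > 0 else mean
```

The E5 model card also expects each input text to start with `query: ` or `passage: ` before tokenization. The resulting ids, together with an attention mask of 1s (and 0s for any `<pad>` positions), would then go into the model, and `mean_pool_normalize` would be applied to the last hidden state.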