lightseq
Exporting XLMRobertaModel with lightseq causes large precision loss in the output embeddings
I use the huggingface model sentence-transformers/paraphrase-multilingual-mpnet-base-v2 (https://huggingface.co/sentence-transformers/paraphrase-multilingual-mpnet-base-v2), which is an XLM-RoBERTa model used to compute sentence embedding similarity. I export it with lightseq's huggingface hf_bert_export.py, but the exported model and the original model generate different embeddings for the same sentences.
Sentences: "Hello, my dog is cute", "Hey, how are you", "This is a test", "Testing the model again"
The resulting embeddings are below.

These are the generated sentence embedding vectors; the precision loss is quite large. After export, the model can no longer be used for sentence similarity computation or text retrieval.
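For reference, here is a minimal sketch for quantifying that drift. The huggingface side follows the mean-pooling recipe from the model card; `exported_embed` is a hypothetical placeholder for however your installed lightseq version runs inference on the exported file, since its exact API varies by version:

```python
import numpy as np
import torch
from transformers import AutoModel, AutoTokenizer

name = "sentence-transformers/paraphrase-multilingual-mpnet-base-v2"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModel.from_pretrained(name).eval()

sentences = ["Hello, my dog is cute", "Hey, how are you",
             "This is a test", "Testing the model again"]

def hf_embed(texts):
    """Reference embeddings: mean-pool the last hidden state over real tokens."""
    enc = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**enc).last_hidden_state          # (batch, seq, dim)
    mask = enc["attention_mask"].unsqueeze(-1).float()   # (batch, seq, 1)
    return ((hidden * mask).sum(1) / mask.sum(1)).numpy()

ref = hf_embed(sentences)
# exported = exported_embed(sentences)  # hypothetical: your lightseq inference call
# cos = np.sum(ref * exported, axis=1) / (
#     np.linalg.norm(ref, axis=1) * np.linalg.norm(exported, axis=1))
# print(cos)  # per-sentence cosine near 1.0 means the export preserved the embeddings
```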
This may not be a loss of precision. You may need to check whether your model structure is actually the same as BERT's. Also check options such as pre-ln vs. post-ln when exporting your model.
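As a quick way to act on this advice, you can diff the two huggingface configs before trusting a BERT export script; the fields below are just an illustrative selection. One known structural difference: XLM-RoBERTa offsets position ids by pad_token_id + 1, so its position-embedding table has two extra rows (514 vs. 512), and an exporter that assumes BERT's layout will read misaligned rows. The pre-ln/post-ln flag in the export script is also worth verifying against the actual model:

```python
from transformers import AutoConfig

bert = AutoConfig.from_pretrained("bert-base-uncased")
xlmr = AutoConfig.from_pretrained(
    "sentence-transformers/paraphrase-multilingual-mpnet-base-v2")

# Print side-by-side values for a few fields that affect weight layout.
for field in ["model_type", "position_embedding_type", "pad_token_id",
              "max_position_embeddings", "layer_norm_eps"]:
    print(field, getattr(bert, field, None), getattr(xlmr, field, None))
# e.g. pad_token_id: 0 vs. 1, max_position_embeddings: 512 vs. 514 --
# the two extra position rows come from XLM-RoBERTa's position-id offset.
```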