FlagEmbedding

Are any of your pretrained models available for commercial use?

Open sidkgp opened this issue 6 months ago • 0 comments

Context: https://github.com/embeddings-benchmark/mteb/issues/2868

Most of the models in https://www.sbert.net/docs/sentence_transformer/pretrained_models.html appear to be trained on MS MARCO. My understanding is that any model trained on that dataset cannot be used commercially. So I am confused about why, for example, https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2 is listed as Apache 2.0 when its training data includes MS MARCO.

After reading the Qwen3 paper (Qwen3 Embedding: Advancing Text Embedding and Reranking Through Foundation Models), I was hopeful, because you mention that their training data is synthetic and they reference Apache 2.0 models in their abstract. However, Table 6 lists MS MARCO as one of their training datasets.

In any case, do you know of pretrained models from anyone else that can be used commercially?

sidkgp · Jun 29 '25 14:06