Using Sentence Transformer Models such as LaBSE or Huggingface MuRIL
How can I use these embedding models in Top2Vec as the pretrained embedding? Since the library supports sentence-transformers, how can I use them?
Also, how can I use other Hugging Face models for embedding generation? If I download a pretrained model and write a callable function that generates embeddings, can I use it as the embedding model?
Hi @reichenbch
I have LaBSE working, although I'm not sure if it's the most efficient method. I appended the 'LaBSE' label to sbert_models (in top2vec.py) and then ran:
from top2vec import Top2Vec
model = Top2Vec(documents, embedding_model='LaBSE', use_embedding_model_tokenizer=True)
I have not tried MuRIL. Perhaps append MuRIL label to 'use_models' + equivalent at 'use-urls'? And this looks promising:
from sentence_transformers import SentenceTransformer
SentenceTransformer('google/muril-base-cased')
Thus, I'm guessing something like this may also work:
from top2vec import Top2Vec
model = Top2Vec(documents, embedding_model='google/muril-base-cased', use_embedding_model_tokenizer=True)
Top2Vec allows embedding_model to be a string or callable. So currently if your model of choice is not in the string options you can just pass it as a callable.
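For anyone landing here later, a rough sketch of the callable route. It assumes the callable takes a list of strings and returns one embedding vector per string as a 2-D array, which is my understanding of what Top2Vec expects; please check the docs for your version. The helper names make_embedding_callable and build_muril_embedder are made up for illustration and are not part of Top2Vec or sentence-transformers.

```python
import numpy as np


def make_embedding_callable(encode):
    """Wrap `encode` (list of str -> vectors) so it returns a 2-D float array."""

    def embed(texts):
        texts = list(texts)
        vectors = np.asarray(encode(texts), dtype=np.float32)
        # Guard against encoders that return a flat vector or drop inputs.
        if vectors.ndim != 2 or vectors.shape[0] != len(texts):
            raise ValueError("expected one embedding vector per input text")
        return vectors

    return embed


def build_muril_embedder(model_name="google/muril-base-cased"):
    """Hypothetical helper: assumes sentence-transformers is installed
    and the named model can be downloaded."""
    from sentence_transformers import SentenceTransformer  # imported lazily

    model = SentenceTransformer(model_name)
    return make_embedding_callable(model.encode)


# Usage sketch (not executed here; `documents` is your own list of strings):
# from top2vec import Top2Vec
# model = Top2Vec(documents, embedding_model=build_muril_embedder())
```

The wrapper only checks the output shape, so any encoder with the same input and output contract (for example a plain transformers model with your own pooling) could be plugged in the same way.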