
Embeddings with StableLM?

Open • enjalot opened this issue 2 years ago • 5 comments

Is it possible to get embeddings from the model for my input text?

I.e., could I replace GPT-3 calls to OpenAI with some Python code and this model?

enjalot, Apr 19 '23 19:04

I would recommend taking a look at https://www.sbert.net/. To the best of my knowledge, the OpenAI models are not outstanding at all for embeddings (see the MTEB leaderboard: https://huggingface.co/spaces/mteb/leaderboard), but their API is convenient to use, at least for us.

sirwalt, Apr 19 '23 23:04

If it helps, I have successfully used sentence-transformers/all-mpnet-base-v2 as an alternative to OpenAI's text-embedding-ada-002.
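A minimal sketch of using that model as a local stand-in for ada-002 in a simple semantic search, assuming the `sentence-transformers` package is installed (`pip install sentence-transformers`). `cosine_top_k`, `search`, and `load_mpnet_encoder` are hypothetical helper names, not part of any library. The encoder is passed in as a plain function so the ranking logic can be tried without downloading the model.

```python
from typing import Callable, List, Sequence, Tuple

import numpy as np


def cosine_top_k(query_vec: np.ndarray, doc_vecs: np.ndarray, k: int = 3) -> List[Tuple[int, float]]:
    """Return (doc_index, cosine_similarity) pairs, best match first."""
    q = query_vec / np.linalg.norm(query_vec)
    d = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    scores = d @ q  # one cosine similarity per document
    order = np.argsort(-scores)[:k]  # indices of the k highest scores
    return [(int(i), float(scores[i])) for i in order]


def search(
    query: str,
    docs: Sequence[str],
    encode: Callable[[List[str]], np.ndarray],
    k: int = 3,
) -> List[Tuple[int, float]]:
    """Embed docs and query with the same encoder, then rank docs by cosine similarity."""
    doc_vecs = np.asarray(encode(list(docs)))
    query_vec = np.asarray(encode([query]))[0]
    return cosine_top_k(query_vec, doc_vecs, k)


def load_mpnet_encoder() -> Callable[[List[str]], np.ndarray]:
    """Return SentenceTransformer.encode for all-mpnet-base-v2.

    Imported lazily so the helpers above work without sentence-transformers installed.
    """
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer("sentence-transformers/all-mpnet-base-v2")
    return model.encode  # returns a NumPy array of shape (len(texts), 768) by default
```

Usage would look like `search("how do I get embeddings?", docs, load_mpnet_encoder(), k=2)`. One thing to plan for when switching: all-mpnet-base-v2 produces 768-dimensional vectors while text-embedding-ada-002 produces 1536-dimensional ones, so vectors already stored from the OpenAI model cannot be compared with the new ones, and the corpus has to be re-embedded.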

lingster, Apr 23 '23 19:04

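Hello, I was able to extract the embeddings from the model:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "path/to/the/model"

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
# from_pretrained loads the trained weights; from_config(config) would build a randomly initialised model
model = AutoModelForCausalLM.from_pretrained(checkpoint)
model.eval()

inputs = tokenizer("Stability AI democratised AI by open sourcing large models", return_tensors="pt")
with torch.no_grad():
    # hidden states are only returned when output_hidden_states=True
    outputs = model(**inputs, output_hidden_states=True)
hidden_states = outputs.hidden_states  # tuple of (batch, seq_len, hidden_size) tensors
```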

Now hidden_states holds the outputs of all the layers. You can use the output of the last layer (hidden_states[-1]).
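A minimal sketch of turning those per-token hidden states into one vector per text, assuming mean pooling over the last layer. This is one common choice, not something the comment specifies; taking the last non-padding token is another. `mean_pool`, `embed`, and `load` are hypothetical helper names, and `"path/to/the/model"` is still a placeholder. A causal LM like StableLM is not trained to produce sentence embeddings, so these vectors may rank worse than those from a dedicated embedding model.

```python
import torch


def mean_pool(last_hidden: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
    """Average token vectors, ignoring padding positions.

    last_hidden: (batch, seq_len, hidden_size); attention_mask: (batch, seq_len) of 0/1.
    Returns a (batch, hidden_size) tensor.
    """
    mask = attention_mask.unsqueeze(-1).to(last_hidden.dtype)  # (batch, seq_len, 1)
    summed = (last_hidden * mask).sum(dim=1)
    counts = mask.sum(dim=1).clamp(min=1e-9)  # avoid dividing by zero on all-padding rows
    return summed / counts


def embed(texts, model, tokenizer) -> torch.Tensor:
    """Hypothetical helper: one L2-normalised vector per input text."""
    batch = tokenizer(texts, return_tensors="pt", padding=True)
    with torch.no_grad():
        outputs = model(**batch, output_hidden_states=True)
    pooled = mean_pool(outputs.hidden_states[-1], batch["attention_mask"])
    return torch.nn.functional.normalize(pooled, dim=-1)


def load(checkpoint: str):
    """Load tokenizer and model; transformers is imported lazily so mean_pool runs without it."""
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    if tokenizer.pad_token is None:
        # assumption: the tokenizer may ship without a pad token, so reuse EOS for batching
        tokenizer.pad_token = tokenizer.eos_token
    model = AutoModelForCausalLM.from_pretrained(checkpoint).eval()
    return tokenizer, model
```

Usage would look like `tokenizer, model = load("path/to/the/model")` followed by `vecs = embed(["first text", "second text"], model, tokenizer)`. Because each row is L2-normalised, `vecs @ vecs.T` gives pairwise cosine similarities.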

Since I am new to Hugging Face, there might be better ways to do this. Please share if you find something better.

sandyflute, Apr 28 '23 05:04

Are these models multilingual?

wajihullahbaig, Jun 02 '23 14:06

If you are looking for another convenient API, you might consider embaas. It offers a structure similar to OpenAI's, and you can use the top models on the MTEB leaderboard. It also has some multilingual models, and you can use it through its LangChain integration or its easy-to-use Python client.

juliuslipp, Aug 10 '23 18:08