llama.cpp
How do I get input embeddings?
I am trying to output just the sentence embedding for a given input, instead of any new generated text. I think this should be rather straightforward but figured someone more familiar with the codebase could help me.
I just want to return the sentence embedding vector and stop execution for a given input.
I am almost sure the place where I want to make the embedding is right after norm but before lm_head, and I think they will be in inpL if I run:

ggml_build_forward_expand(&gf, inpL);
ggml_graph_compute       (ctx0, &gf);
However, I am confused by the struct and am not sure how to get the sentence embedding itself. I understand it should be at some index of ggml_get_data(inpL), but I don't know which index, and that is why I come to you. Would anyone lend me a hand?
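For context on the indexing, a toy sketch: assuming inpL is an F32 tensor of shape [n_embd, N], ggml lays its data out as N consecutive rows of n_embd floats, so token t's hidden state starts at offset n_embd * t.

#include <cstdio>
#include <vector>

int main() {
    const int n_embd = 4; // embedding width (toy value; real models use e.g. 4096)
    const int N      = 3; // number of tokens in the window

    // simulate the row-major buffer that ggml_get_data(inpL) would return
    std::vector<float> data(n_embd * N);
    for (int i = 0; i < n_embd * N; ++i) data[i] = (float) i;

    // the last token's vector starts at offset n_embd * (N - 1)
    const float * last = data.data() + n_embd * (N - 1);
    for (int i = 0; i < n_embd; ++i) {
        printf("%.1f ", last[i]);
    }
    printf("\n");
    return 0;
}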
I believe you can get the embedding using llama_tokenize which only requires the gpt_vocab object and the text to tokenize.
Those wouldn't be embeddings, those would just be tokenized values. The embeddings are obtained in the call to get_rows inside llama_eval. That's where you fetch the row from tok_embeddings corresponding to the indices (i.e. token ids) you get from tokenize.
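For readers following along, the lookup path looks roughly like this (a sketch based on main.cpp around that time; exact names and signatures may have drifted):

// tokenize: text -> token ids, which are row indices into the embedding matrix
std::vector<gpt_vocab::id> embd_inp = llama_tokenize(vocab, params.prompt, true);

// inside llama_eval: stage the ids in an I32 tensor...
struct ggml_tensor * embd = ggml_new_tensor_1d(ctx0, GGML_TYPE_I32, N);
memcpy(embd->data, embd_inp.data(), N*ggml_element_size(embd));

// ...then fetch one row of tok_embeddings per id
struct ggml_tensor * inpL = ggml_get_rows(ctx0, model.tok_embeddings, embd);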
Those are the token embeddings for the window (line 582, if it didn't change since my last pull). I wanted the sentence representation, which would be the embedding of the last word after one feedforward pass through the transformer. Again, I think they would be somewhere after line 722 and before line 724 in main.cpp; I just don't understand how to access them. I would just need the minimal code that prints the values of the embedding of the last token in that output.
Is this what you're thinking?
/// (loop handling each layer) ///

        // input for next layer
        inpL = cur;
    }

    // norm
    {
        inpL = ggml_rms_norm(ctx0, inpL);

        // inpL = norm*inpL
        inpL = ggml_mul(ctx0,
                    ggml_repeat(ctx0, model.norm, inpL),
                    inpL);

        std::vector<float> embedding_representation;
        embedding_representation.resize(n_embd);
        memcpy(embedding_representation.data(),
               (float *) ggml_get_data(inpL) + (n_embd * (N - 1)),
               sizeof(float) * n_embd);
    }

/// (forward expand) ///
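One caveat, offered as my own reading rather than something confirmed in the thread: ggml_get_data returns the tensor's raw buffer, which is only populated once the graph has actually been computed, so the memcpy likely needs to happen after the forward expand and compute calls rather than inside the norm block. A minimal reordering sketch:

ggml_build_forward_expand(&gf, inpL);
ggml_graph_compute       (ctx0, &gf);

// only now is the tensor data valid; copy out the last token's row
std::vector<float> embedding_representation(n_embd);
memcpy(embedding_representation.data(),
       (float *) ggml_get_data(inpL) + (n_embd * (N - 1)),
       sizeof(float) * n_embd);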
That is exactly what I wanted to do! I was going crazy trying to do the indexing right. I will test this tonight. Thank you!
If interested, I can do a PR adding this functionality under a console flag (I'm thinking --embedding or --sentence-representation, but please feel free to suggest another). I saw other open issues requesting the same and the repo gets enough attention that more people are bound to want it.
I know I want it. It should probably be folded into the in-progress API implementation too at #77
Would be awesome to have the option to get the embeddings as well. Did you end up doing a PR for it? @StrikingLoo
Hi both of you. I did! It's here: https://github.com/ggerganov/llama.cpp/pull/282. But it's not working yet, and I am not sure what is failing. I added the console flag as an argument, and everything works fine outside of embedding mode, but in embedding mode I get an execution error. Without the flag it was 'working', but I can't make this one work. If anyone can take a look at it and tell me what I'm missing, that would be great.
@StrikingLoo, by any chance did you explore evaluation of these embeddings? I was looking at the codebase at https://github.com/UKPLab/sentence-transformers/blob/master/sentence_transformers/evaluation/SentenceEvaluator.py and was thinking about how it could be integrated.
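If it helps, a natural first step that needs none of that machinery is plain cosine similarity between pairs of extracted vectors. A self-contained sketch (my own helper, not part of llama.cpp or sentence-transformers):

#include <cmath>
#include <vector>

// cosine similarity between two sentence embeddings of equal length;
// returns a value in [-1, 1], higher for semantically closer sentences
float cosine_similarity(const std::vector<float> & a, const std::vector<float> & b) {
    float dot = 0.0f, na = 0.0f, nb = 0.0f;
    for (size_t i = 0; i < a.size(); ++i) {
        dot += a[i] * b[i];
        na  += a[i] * a[i];
        nb  += b[i] * b[i];
    }
    return dot / (std::sqrt(na) * std::sqrt(nb));
}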