
How do I get input embeddings?

StrikingLoo opened this issue 1 year ago • 8 comments

I am trying to output just the sentence embedding for a given input, instead of any new generated text. I think this should be rather straightforward but figured someone more familiar with the codebase could help me.

I just want to return the sentence embedding vector and stop execution for a given input.

I am almost sure the place where I want to extract the embedding is right after the norm but before lm_head, and I think the values will be in inpL if I run

ggml_build_forward_expand(&gf, inpL);
ggml_graph_compute       (ctx0, &gf);

However, I am confused by the struct and not sure how to get the sentence embedding itself. I understand it should be at some index into ggml_get_data(inpL), but I don't know which index, and that is why I come to you. Would anyone lend me a hand?

StrikingLoo avatar Mar 17 '23 04:03 StrikingLoo
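
For reference, inpL at that point is a 2D ggml tensor of shape [n_embd, N] stored contiguously, so token i's vector is simply the i-th run of n_embd floats. A minimal sketch of the indexing (using the n_embd and N names from main.cpp; the values are only meaningful once the graph has been computed):

    // read one token's embedding out of the contiguous [n_embd, N] activations
    const int     i       = N - 1; // e.g. the last token of the window
    const float * data    = (const float *) ggml_get_data(inpL);
    const float * token_i = data + (size_t) i*n_embd; // n_embd floats for token i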

I believe you can get the embedding using llama_tokenize which only requires the gpt_vocab object and the text to tokenize.

j-f1 avatar Mar 17 '23 12:03 j-f1

I believe you can get the embedding using llama_tokenize which only requires the gpt_vocab object and the text to tokenize.

Those wouldn't be embeddings, those would just be tokenized values. The embeddings are obtained in the call to get_rows inside llama_eval. That's where you fetch the row from tok_embeddings corresponding to the indices (i.e. token ids) you get from tokenize.

setzer22 avatar Mar 17 '23 14:03 setzer22
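
For context, the lookup setzer22 describes is the first step of llama_eval in main.cpp; roughly like this (modulo drift in the codebase since then):

    // embd holds the N token ids for this window as an i32 tensor;
    // ggml_get_rows picks the matching rows of the tok_embeddings matrix,
    // yielding the [n_embd, N] input-token embeddings
    struct ggml_tensor * embd = ggml_new_tensor_1d(ctx0, GGML_TYPE_I32, N);
    memcpy(embd->data, embd_inp.data(), N*ggml_element_size(embd));

    struct ggml_tensor * inpL = ggml_get_rows(ctx0, model.tok_embeddings, embd);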

Those are the token embeddings for the window (line 582 if it didn't change from my last pull). I wanted the sentence representation, which would be the embedding of the last token after one forward pass through the transformer. Again, I think they would be somewhere after line 722 but before line 724 in main.cpp; I just don't understand how to access them. I would just need the minimal code that prints the values of the embedding of the last token in that output.

StrikingLoo avatar Mar 17 '23 16:03 StrikingLoo

Is this what you're thinking?

        /// (loop handling each layer) ///

        // input for next layer
        inpL = cur;
        
    }

    // norm
    {
        inpL = ggml_rms_norm(ctx0, inpL);

        // inpL = norm*inpL
        inpL = ggml_mul(ctx0,
                        ggml_repeat(ctx0, model.norm, inpL),
                        inpL);
        
        // copy the last token's row of the [n_embd, N] activations,
        // i.e. the sentence representation
        std::vector<float> embedding_representation;
        embedding_representation.resize(n_embd);
        memcpy(embedding_representation.data(), (float *) ggml_get_data(inpL) + (n_embd * (N - 1)), sizeof(float) * n_embd);
    }
    
    /// (forward expand) ///

MillionthOdin16 avatar Mar 17 '23 19:03 MillionthOdin16
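
One caveat with the snippet above: at that point inpL is only a node in the not-yet-executed graph, so ggml_get_data(inpL) holds no meaningful values until ggml_graph_compute has run. A sketch of the working order, using the same two calls the thread already mentions:

    // build the graph up to the normed activations and run it first
    ggml_build_forward_expand(&gf, inpL);
    ggml_graph_compute       (ctx0, &gf);

    // now the data is valid: copy the last token's row of the [n_embd, N] result
    std::vector<float> embedding_representation(n_embd);
    memcpy(embedding_representation.data(),
           (float *) ggml_get_data(inpL) + n_embd*(N - 1),
           sizeof(float)*n_embd);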

That is exactly what I wanted to do! I was going crazy trying to do the indexing right. I will test this tonight. Thank you!

If interested, I can do a PR adding this functionality under a console flag (I'm thinking --embedding or --sentence-representation, but please feel free to suggest another). I saw other open issues requesting the same and the repo gets enough attention that more people are bound to want it.

Thank you!

StrikingLoo avatar Mar 17 '23 20:03 StrikingLoo

I know I want it. It should probably be folded into the in-progress API implementation at #77, too.

Ronsor avatar Mar 18 '23 00:03 Ronsor

Would be awesome to have the option to get the embeddings as well. Did you end up doing a PR for it? @StrikingLoo

adriacabeza avatar Mar 19 '23 20:03 adriacabeza

Hi both of you. I did! It's here: https://github.com/ggerganov/llama.cpp/pull/282. But it's not working yet, and I am not sure what is failing. I added the console flag as an argument, and everything works fine outside of embedding mode, but in embedding mode I get an execution error. Without the flag it was 'working', but I can't make this one work. If anyone can take a look at it and tell me what I'm missing, that would be great.

StrikingLoo avatar Mar 19 '23 23:03 StrikingLoo
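
For anyone wiring this up themselves, the flag side is straightforward; a rough, illustrative sketch in the style of gpt_params_parse from utils.cpp (the embedding field and the helper name here are assumptions for illustration, not the final API):

    #include <string>

    struct gpt_params {
        bool embedding = false; // print the sentence embedding instead of generating
        // ... existing fields elided ...
    };

    // minimal parsing loop in the style of gpt_params_parse (name is illustrative)
    bool parse_embedding_flag(int argc, char ** argv, gpt_params & params) {
        for (int i = 1; i < argc; i++) {
            if (std::string(argv[i]) == "--embedding") {
                params.embedding = true;
            }
            // ... other flags ...
        }
        return true;
    }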

@StrikingLoo by any chance, did you explore evaluation of these embeddings? I was looking at the codebase at https://github.com/UKPLab/sentence-transformers/blob/master/sentence_transformers/evaluation/SentenceEvaluator.py and was thinking about how it could be integrated.

louis030195 avatar Apr 09 '23 15:04 louis030195