[Feature] Is there a way to get response with embeddings as input?
Prerequisites
Please answer the following questions for yourself before submitting an issue.
- [x] I am running the latest code. Development is very rapid so there are no tagged versions as of now.
- [x] I carefully followed the README.md.
- [x] I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
- [x] I reviewed the Discussions, and have a new bug or useful enhancement to share.
Background
Hi all, llama.cpp provides a way to get embeddings instead of text as the response. However, I didn't find an API that takes embeddings as input and continues to generate a text response. Is it possible to do this with the current APIs? If not, what about adding such an API?
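For context, the "embeddings as output" direction is already available through the server's /embedding endpoint (see the references below) and through the C API. Here is a rough sketch of the C-API side, assuming a context created with the embedding flag enabled; function names follow the API of that period (llama_eval, llama_n_embd, llama_get_embeddings) and exact signatures may differ between versions:

```cpp
// Rough sketch of reading embeddings back out of llama.cpp via the C API.
// Assumes `ctx` was created with the `embedding` flag set in the context
// params and that `tokens` holds the tokenized prompt; exact signatures
// may differ between versions.
#include <vector>
#include "llama.h"

std::vector<float> get_prompt_embedding(llama_context * ctx,
                                        const std::vector<llama_token> & tokens) {
    // run the model over the prompt
    llama_eval(ctx, tokens.data(), (int) tokens.size(), /*n_past=*/0, /*n_threads=*/4);

    // copy out the embedding produced for the evaluated sequence
    const int     n_embd = llama_n_embd(ctx);
    const float * embd   = llama_get_embeddings(ctx);
    return std::vector<float>(embd, embd + n_embd);
}
```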
References
Download the latest llama.cpp code and compile it with the CMake option -DLLAMA_BUILD_SERVER=ON.
Embeddings
First, run the server with the --embedding option:
server -m models/7B/ggml-model.bin --ctx_size 2048 --embedding
Run this code in NodeJS:
const axios = require('axios');

async function TestEmbeddings() {
    let result = await axios.post("http://127.0.0.1:8080/embedding", {
        content: `Hello`,
        threads: 5
    });
    // print the embedding array
    console.log(result.data.embedding);
}

TestEmbeddings();
@FSSRepo Thanks, but what I want is to input the embedding and get text as the response, for example an API like the one below:
const char* GetResponse(float* embedding);
If the current APIs are enough to do this, would you mind giving me an example?
My description in this issue may have been unclear; sorry for the confusion.
Are you referring to converting an embedding to text?
You can generate a list of embeddings and compare them with the embedding of your input text, similar to the OpenAI API:
import numpy as np

# a: vector embedding source
# b: vector embedding target
def cosine_similarity(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
I don't know how to use embeddings, but this may give you some idea.
> Are you referring to converting an embedding to text?
Yes, sometimes it could be useful. Here's an example:
https://gpt-index.readthedocs.io/en/latest/how_to/customization/embeddings.html
In my understanding, the text-to-text process is like below:
1. tokenize
2. get embeddings
3. model inference
4. get tokens
5. detokenize

However, in main.cpp, steps 2, 3, and 4 seem to be implemented by llama_eval and llama_sample. I don't know how I can get the response when I only have the embeddings. Is there any way to achieve it?
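For reference, here is a minimal sketch of the usual token-based loop, showing which of the five steps happen inside llama_eval. This is an illustration rather than code from main.cpp; it assumes an already-initialized llama_context and the C API of that period (llama_tokenize, llama_eval, llama_get_logits, llama_n_vocab, llama_token_to_str), whose exact signatures may have changed since. The missing piece for this issue is a variant of steps 2/3 that accepts precomputed embeddings instead of token IDs.

```cpp
// Sketch of the token-based generation loop (greedy sampling for brevity).
// Assumes `ctx` is an already-created llama_context; signatures approximate
// the llama.cpp C API of that time.
#include <algorithm>
#include <string>
#include <vector>
#include "llama.h"

std::string generate(llama_context * ctx, const std::string & prompt, int n_predict) {
    // 1. tokenize
    std::vector<llama_token> tokens(prompt.size() + 8);
    const int n = llama_tokenize(ctx, prompt.c_str(), tokens.data(), (int) tokens.size(), /*add_bos=*/true);
    tokens.resize(n);

    std::string out;
    int n_past = 0;
    for (int i = 0; i < n_predict; i++) {
        // 2. + 3. token-embedding lookup and model inference happen inside llama_eval
        llama_eval(ctx, tokens.data(), (int) tokens.size(), n_past, /*n_threads=*/4);
        n_past += (int) tokens.size();

        // 4. pick the next token (greedy argmax over the last token's logits)
        const float * logits  = llama_get_logits(ctx);
        const int     n_vocab = llama_n_vocab(ctx);
        const llama_token next = (llama_token) (std::max_element(logits, logits + n_vocab) - logits);

        // 5. detokenize and feed the new token back in
        out += llama_token_to_str(ctx, next);
        tokens = { next };
    }
    return out;
}
```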
This is the code to perform a semantic text search:
const axios = require('axios');

let docs_chunks = [
    {
        text: "Microsoft is a multinational technology company founded by Bill Gates and Paul Allen in 1975. It is one of the world's largest and most influential companies in the software industry. Microsoft's primary focus is on developing, manufacturing, licensing, and supporting a wide range of software products, services, and devices.",
        embedding: [],
        similarity: 0
    },
    {
        text: "Google is a multinational technology company that specializes in internet-related products and services. It was founded by Larry Page and Sergey Brin in 1998 while they were Ph.D. students at Stanford University. Google has since grown to become one of the world's most recognizable and influential companies.",
        embedding: [],
        similarity: 0
    },
    {
        text: "Apple Inc. is a multinational technology company based in Cupertino, California. It was founded by Steve Jobs, Steve Wozniak, and Ronald Wayne in 1976. Apple is renowned for its hardware products, software, and services, and it has become one of the world's most valuable and recognizable companies.",
        embedding: [],
        similarity: 0
    },
    {
        text: "Samsung is a multinational conglomerate company based in South Korea. Founded in 1938 by Lee Byung-chul, Samsung has grown to become one of the world's largest and most influential technology companies. It operates in various sectors, including electronics, shipbuilding, construction, and more.",
        embedding: [],
        similarity: 0
    }
]

async function getEmbedding(text) {
    let result = await axios.post("http://127.0.0.1:8080/embedding", {
        content: text,
        threads: 6
    });
    return result.data.embedding;
}

async function processEmbeddingsDocs() {
    for(let chunk of docs_chunks) {
        chunk.embedding = await getEmbedding(chunk.text);
    }
}

function cosine_similarity(a, b) {
    if(a.length != b.length) {
        return 0;
    }
    // dot product
    let dot = 0;
    for(let i = 0; i < a.length; i++) {
        dot += a[i] * b[i];
    }
    // norm a
    let normA_ = 0;
    for(let i = 0; i < a.length; i++) {
        normA_ += a[i] * a[i];
    }
    // norm b
    let normB_ = 0;
    for(let i = 0; i < b.length; i++) {
        normB_ += b[i] * b[i];
    }
    let normA = Math.sqrt(normA_);
    let normB = Math.sqrt(normB_);
    return dot / (normA * normB);
}

async function main() {
    // get embeddings of the docs chunks
    await processEmbeddingsDocs();
    // find in the docs
    let input_embedding = await getEmbedding(`what is samsung?`);
    for(let chunk of docs_chunks) {
        chunk.similarity = cosine_similarity(chunk.embedding, input_embedding);
    }
    docs_chunks.sort((a,b) => { return b.similarity - a.similarity; });
    console.log("Result: " + docs_chunks[0].text); // bad result
}

main();
This is giving me poor results.
Thank you very much for your help. However, the example is not "embeddings to text". Could someone tell me whether it's possible to perform such a process?
I have the same question. In particular, I would like to use this as a way to probe the embedding space.
I think it is possible to add an interface to input inpL directly, at https://github.com/ggerganov/llama.cpp/blob/ffb06a345e3a9e30d39aaa5b46a23201a74be6de/llama.cpp#L1255.
@ningshanwutuobang I see that line in the file, but I don't know how I'd add an interface to input inpL directly.
@prcamp @AsakusaRinne I have tried to add such an interface in https://github.com/ningshanwutuobang/llama.cpp/blob/embd_inp/examples/embd_input/embd_input_test.cpp. The process looks like the following:
- create the context.
- ~~quantize~~ use the float tensor to replace inpL (which needs to change; I am not sure about the order).
- eval the rest of llama_eval_internal.
- sampling.
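To make the steps above concrete, here is a rough sketch of what such an interface could look like. This is purely illustrative: llama_eval_with_embd is an invented name, not a function in llama.cpp or in the linked embd_input_test.cpp, and the real code may be organized differently. The idea is to feed an n_tokens × n_embd float buffer into the graph in place of the token-embedding lookup that normally produces inpL, run the remaining layers, and let the caller sample as usual.

```cpp
// Hypothetical sketch only: llama_eval_with_embd does NOT exist in llama.cpp;
// it names the interface described in the steps above. The float buffer plays
// the role of inpL, i.e. n_tokens rows of n_embd values, stored contiguously.
#include <vector>
#include "llama.h"

// Proposed signature: like llama_eval, but takes embeddings instead of token IDs.
int llama_eval_with_embd(
        struct llama_context * ctx,
        const float          * embd,      // n_tokens * n_embd floats replacing inpL
        int                    n_tokens,
        int                    n_past,
        int                    n_threads);

void run_from_embeddings(llama_context * ctx, const std::vector<float> & input_embd) {
    const int n_embd   = llama_n_embd(ctx);
    const int n_tokens = (int) (input_embd.size() / n_embd);

    // eval the rest of the network with the float tensor standing in for inpL
    llama_eval_with_embd(ctx, input_embd.data(), n_tokens, /*n_past=*/0, /*n_threads=*/4);

    // sampling then proceeds exactly as in the token-based loop,
    // e.g. over llama_get_logits(ctx).
}
```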
@FSSRepo Using the latest code, I cannot build the server. How do I do it? (M2 Pro)
- I go to examples/server
- cmake -DLLAMA_BUILD_SERVER=ON
I get these warnings:
CMake Warning:
No source or binary directory provided. Both will be assumed to be the
same as the current working directory, but note that this warning will
become a fatal error in future CMake releases.
CMake Warning (dev) in CMakeLists.txt:
No project() command is present. The top-level CMakeLists.txt file must
contain a literal, direct call to the project() command. Add a line of
code such as
project(ProjectName)
near the top of the file, but after cmake_minimum_required().
CMake is pretending there is a "project(Project)" command on the first
line.
This warning is for project developers. Use -Wno-dev to suppress it.
CMake Warning (dev) in CMakeLists.txt:
cmake_minimum_required() should be called prior to this top-level project()
call. Please see the cmake-commands(7) manual for usage documentation of
both commands.
This warning is for project developers. Use -Wno-dev to suppress it.
CMake Warning (dev) in CMakeLists.txt:
No cmake_minimum_required command is present. A line of code such as
cmake_minimum_required(VERSION 3.26)
should be added at the top of the file. The version specified may be lower
if you wish to support older CMake versions for this project. For more
information run "cmake --help-policy CMP0000".
This warning is for project developers. Use -Wno-dev to suppress it.
and there's no server file created, just a CMakeFiles folder.
Any tips?
Thanks
@x4080
Go to the build folder:
llama.cpp/build
In the build folder:
cmake .. -DLLAMA_BUILD_SERVER=ON
Build it:
cmake --build . --config Release
@FSSRepo Thanks, I'll try it; I'm not in front of my computer now.
Edit: It works, thanks. For future reference:
mkdir build
cd build
cmake .. -DLLAMA_BUILD_SERVER=ON
cmake --build . --config Release
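If it helps future readers: with this setup the executables typically end up under build/bin (the exact location can vary with the CMake generator and configuration), so the embedding server from the example above can then be started with something like:

```sh
# run from inside the build folder; paths are illustrative
./bin/server -m ../models/7B/ggml-model.bin --ctx_size 2048 --embedding
```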
This issue was closed because it has been inactive for 14 days since being marked as stale.