[Feature] Is there a way to get response with embeddings as input?
Prerequisites
Please answer the following questions for yourself before submitting an issue.
- [x] I am running the latest code. Development is very rapid so there are no tagged versions as of now.
- [x] I carefully followed the README.md.
- [x] I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
- [x] I reviewed the Discussions, and have a new bug or useful enhancement to share.
Background
Hi all, llama.cpp provides a way to get embeddings instead of text as the response. However, I didn't find an API that takes embeddings as input and continues to generate a text response. Is it possible to do this with the current APIs? If not, what about adding such an API?
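For context, the "embeddings as output" direction is already available through the server's /embedding endpoint (see the references below) and through the C API. Here is a rough sketch of the C-API side, assuming a context created with the embedding flag enabled; function names follow the API of that period (llama_eval, llama_n_embd, llama_get_embeddings) and exact signatures may differ between versions:

```cpp
// Rough sketch of reading embeddings back out of llama.cpp via the C API.
// Assumes `ctx` was created with the `embedding` flag set in the context
// params and that `tokens` holds the tokenized prompt; exact signatures
// may differ between versions.
#include <vector>
#include "llama.h"

std::vector<float> get_prompt_embedding(llama_context * ctx,
                                        const std::vector<llama_token> & tokens) {
    // run the model over the prompt
    llama_eval(ctx, tokens.data(), (int) tokens.size(), /*n_past=*/0, /*n_threads=*/4);

    // copy out the embedding produced for the evaluated sequence
    const int     n_embd = llama_n_embd(ctx);
    const float * embd   = llama_get_embeddings(ctx);
    return std::vector<float>(embd, embd + n_embd);
}
```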
References
Download the latest llama.cpp code and compile it with the CMake option -DLLAMA_BUILD_SERVER=ON.
Embeddings
First, run the server with the --embedding option:
server -m models/7B/ggml-model.bin --ctx_size 2048 --embedding
Run this code in NodeJS:
const axios = require('axios');

async function TestEmbeddings() {
    let result = await axios.post("http://127.0.0.1:8080/embedding", {
        content: `Hello`,
        threads: 5
    });
    // print the embedding array
    console.log(result.data.embedding);
}

TestEmbeddings();
@FSSRepo Thanks, but what I want is to input the embedding and get text as the response, for example an API like the one below:
const char* GetResponse(float* embedding);
If the current APIs are enough to do this, would you mind giving me an example?
My description in this issue may have been unclear; sorry for the confusion.
Are you referring to converting an embedding to text?
You can generate a list of embeddings and compare them with the embedding of your input text, similar to the OpenAI API:
import numpy as np

# a: vector embedding source
# b: vector embedding target
def cosine_similarity(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
I don't know how to use embeddings, but this may give you some idea.
> Are you referring to converting an embedding to text?
Yes, sometimes it could be useful. Here's an example:
https://gpt-index.readthedocs.io/en/latest/how_to/customization/embeddings.html
In my understanding, the text-to-text process is like below:
1. tokenize
2. get embeddings
3. model inference
4. get tokens
5. detokenize

However, in main.cpp, steps 2, 3, and 4 seem to be implemented by llama_eval and llama_sample. I don't know how I can get the response when I only have the embeddings. Is there any way to achieve it?
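For reference, here is a minimal sketch of the usual token-based loop, showing which of the five steps happen inside llama_eval. This is an illustration rather than code from main.cpp; it assumes an already-initialized llama_context and the C API of that period (llama_tokenize, llama_eval, llama_get_logits, llama_n_vocab, llama_token_to_str), whose exact signatures may have changed since. The missing piece for this issue is a variant of steps 2/3 that accepts precomputed embeddings instead of token IDs.

```cpp
// Sketch of the token-based generation loop (greedy sampling for brevity).
// Assumes `ctx` is an already-created llama_context; signatures approximate
// the llama.cpp C API of that time.
#include <algorithm>
#include <string>
#include <vector>
#include "llama.h"

std::string generate(llama_context * ctx, const std::string & prompt, int n_predict) {
    // 1. tokenize
    std::vector<llama_token> tokens(prompt.size() + 8);
    const int n = llama_tokenize(ctx, prompt.c_str(), tokens.data(), (int) tokens.size(), /*add_bos=*/true);
    tokens.resize(n);

    std::string out;
    int n_past = 0;
    for (int i = 0; i < n_predict; i++) {
        // 2. + 3. token-embedding lookup and model inference happen inside llama_eval
        llama_eval(ctx, tokens.data(), (int) tokens.size(), n_past, /*n_threads=*/4);
        n_past += (int) tokens.size();

        // 4. pick the next token (greedy argmax over the last token's logits)
        const float * logits  = llama_get_logits(ctx);
        const int     n_vocab = llama_n_vocab(ctx);
        const llama_token next = (llama_token) (std::max_element(logits, logits + n_vocab) - logits);

        // 5. detokenize and feed the new token back in
        out += llama_token_to_str(ctx, next);
        tokens = { next };
    }
    return out;
}
```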
This is the code to perform a semantic text search:
const axios = require('axios');

let docs_chunks = [
    {
        text: "Microsoft is a multinational technology company founded by Bill Gates and Paul Allen in 1975. It is one of the world's largest and most influential companies in the software industry. Microsoft's primary focus is on developing, manufacturing, licensing, and supporting a wide range of software products, services, and devices.",
        embedding: [],
        similarity: 0
    },
    {
        text: "Google is a multinational technology company that specializes in internet-related products and services. It was founded by Larry Page and Sergey Brin in 1998 while they were Ph.D. students at Stanford University. Google has since grown to become one of the world's most recognizable and influential companies.",
        embedding: [],
        similarity: 0
    },
    {
        text: "Apple Inc. is a multinational technology company based in Cupertino, California. It was founded by Steve Jobs, Steve Wozniak, and Ronald Wayne in 1976. Apple is renowned for its hardware products, software, and services, and it has become one of the world's most valuable and recognizable companies.",
        embedding: [],
        similarity: 0
    },
    {
        text: "Samsung is a multinational conglomerate company based in South Korea. Founded in 1938 by Lee Byung-chul, Samsung has grown to become one of the world's largest and most influential technology companies. It operates in various sectors, including electronics, shipbuilding, construction, and more.",
        embedding: [],
        similarity: 0
    }
]

async function getEmbedding(text) {
    let result = await axios.post("http://127.0.0.1:8080/embedding", {
        content: text,
        threads: 6
    });
    return result.data.embedding;
}

async function processEmbeddingsDocs() {
    for(let chunk of docs_chunks) {
        chunk.embedding = await getEmbedding(chunk.text);
    }
}

function cosine_similarity(a, b) {
    if(a.length != b.length) {
        return 0;
    }
    // dot product
    let dot = 0;
    for(let i = 0; i < a.length; i++) {
        dot += a[i] * b[i];
    }
    // norm a
    let normA_ = 0;
    for(let i = 0; i < a.length; i++) {
        normA_ += a[i] * a[i];
    }
    // norm b
    let normB_ = 0;
    for(let i = 0; i < b.length; i++) {
        normB_ += b[i] * b[i];
    }
    let normA = Math.sqrt(normA_);
    let normB = Math.sqrt(normB_);
    return dot / (normA * normB);
}

async function main() {
    // get embeddings of the docs chunks
    await processEmbeddingsDocs();
    // find in the docs
    let input_embedding = await getEmbedding(`what is samsung?`);
    for(let chunk of docs_chunks) {
        chunk.similarity = cosine_similarity(chunk.embedding, input_embedding);
    }
    docs_chunks.sort((a,b) => { return b.similarity - a.similarity; });
    console.log("Result: " + docs_chunks[0].text); // bad result
}

main();
This is giving me poor results.
Thank you very much for your help. However, the example is not "embeddings to text". Could someone tell me whether it's possible to perform such a process?
I have the same question. In particular, I would like to use this as a way to probe the embedding space.
I think it is possible to add an interface to input inpL directly, at https://github.com/ggerganov/llama.cpp/blob/ffb06a345e3a9e30d39aaa5b46a23201a74be6de/llama.cpp#L1255.
@ningshanwutuobang I see that line in the file, but I don't know how I'd add an interface to input inpL directly.
@prcamp @AsakusaRinne I have tried to add such an interface in https://github.com/ningshanwutuobang/llama.cpp/blob/embd_inp/examples/embd_input/embd_input_test.cpp. The process looks like the following:
- create the context.
- ~~quantize~~ use the float tensor to replace inpL (which needs to change; I am not sure about the order).
- eval the rest of llama_eval_internal.
- sampling.
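To make the steps above concrete, here is a rough sketch of what such an interface could look like. This is purely illustrative: llama_eval_with_embd is an invented name, not a function in llama.cpp or in the linked embd_input_test.cpp, and the real code may be organized differently. The idea is to feed an n_tokens × n_embd float buffer into the graph in place of the token-embedding lookup that normally produces inpL, run the remaining layers, and let the caller sample as usual.

```cpp
// Hypothetical sketch only: llama_eval_with_embd does NOT exist in llama.cpp;
// it names the interface described in the steps above. The float buffer plays
// the role of inpL, i.e. n_tokens rows of n_embd values, stored contiguously.
#include <vector>
#include "llama.h"

// Proposed signature: like llama_eval, but takes embeddings instead of token IDs.
int llama_eval_with_embd(
        struct llama_context * ctx,
        const float          * embd,      // n_tokens * n_embd floats replacing inpL
        int                    n_tokens,
        int                    n_past,
        int                    n_threads);

void run_from_embeddings(llama_context * ctx, const std::vector<float> & input_embd) {
    const int n_embd   = llama_n_embd(ctx);
    const int n_tokens = (int) (input_embd.size() / n_embd);

    // eval the rest of the network with the float tensor standing in for inpL
    llama_eval_with_embd(ctx, input_embd.data(), n_tokens, /*n_past=*/0, /*n_threads=*/4);

    // sampling then proceeds exactly as in the token-based loop,
    // e.g. over llama_get_logits(ctx).
}
```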
@FSSRepo Using the latest code, I cannot build the server. How do I do it? (M2 Pro)
- I go to examples/server
- cmake -DLLAMA_BUILD_SERVER=ON
I get these warnings:
CMake Warning:
No source or binary directory provided. Both will be assumed to be the
same as the current working directory, but note that this warning will
become a fatal error in future CMake releases.
CMake Warning (dev) in CMakeLists.txt:
No project() command is present. The top-level CMakeLists.txt file must
contain a literal, direct call to the project() command. Add a line of
code such as
project(ProjectName)
near the top of the file, but after cmake_minimum_required().
CMake is pretending there is a "project(Project)" command on the first
line.
This warning is for project developers. Use -Wno-dev to suppress it.
CMake Warning (dev) in CMakeLists.txt:
cmake_minimum_required() should be called prior to this top-level project()
call. Please see the cmake-commands(7) manual for usage documentation of
both commands.
This warning is for project developers. Use -Wno-dev to suppress it.
CMake Warning (dev) in CMakeLists.txt:
No cmake_minimum_required command is present. A line of code such as
cmake_minimum_required(VERSION 3.26)
should be added at the top of the file. The version specified may be lower
if you wish to support older CMake versions for this project. For more
information run "cmake --help-policy CMP0000".
This warning is for project developers. Use -Wno-dev to suppress it.
and there's no server file created, just a CMakeFiles folder.
Any tips?
Thanks
@x4080
Go to the build folder:
llama.cpp/build
In the build folder:
cmake .. -DLLAMA_BUILD_SERVER=ON
Build it:
cmake --build . --config Release
@FSSRepo Thanks, I'll try it; I'm not in front of my computer now.
Edit: It works, thanks. For future reference:
mkdir build
cd build
cmake .. -DLLAMA_BUILD_SERVER=ON
cmake --build . --config Release
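If it helps future readers: with this setup the executables typically end up under build/bin (the exact location can vary with the CMake generator and configuration), so the embedding server from the example above can then be started with something like:

```sh
# run from inside the build folder; paths are illustrative
./bin/server -m ../models/7B/ggml-model.bin --ctx_size 2048 --embedding
```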
This issue was closed because it has been inactive for 14 days since being marked as stale.