Steward Garcia

Results: 92 comments by Steward Garcia

Requantize your model with the latest version, and update to the latest server example release.

Download the latest llama.cpp code and compile it with the cmake option `-DLLAMA_BUILD_SERVER=ON`.

### Embeddings

First, run the server with the `--embedding` option:

```bash
server -m models/7B/ggml-model.bin --ctx_size 2048 --embedding
```

Run...
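To query that endpoint from a client, something like the following could work. This is a sketch assuming the server listens on the default port 8080 and accepts a JSON body with a `content` field; the endpoint path and JSON shape may differ between server versions.

```python
import json
import urllib.request

def build_embedding_request(text, host="http://localhost:8080"):
    """Build a POST request for the server's /embedding endpoint (assumed shape)."""
    payload = json.dumps({"content": text}).encode("utf-8")
    return urllib.request.Request(
        f"{host}/embedding",
        data=payload,
        headers={"Content-Type": "application/json"},
    )

def get_embedding(text):
    """Send the request; only works while the server is actually running."""
    with urllib.request.urlopen(build_embedding_request(text)) as resp:
        return json.loads(resp.read())["embedding"]
```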

Do you mean a conversion from `embedding to text`? You can generate a list of embeddings and compare them with your input text. OpenAI API:

```python
# a: vector embedding...
```
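For the comparison step, cosine similarity between two embedding vectors can be computed in plain Python, for example:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)
```

Identical directions score 1.0, orthogonal vectors score 0.0, so the chunk whose embedding scores closest to 1.0 against the query embedding is the best match.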

This is the code to perform a semantic text search:

```javascript
const axios = require('axios');

let docs_chunks = [
  { text: "Microsoft is a multinational technology company founded by...
```
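The same ranking idea can be sketched in Python, assuming the chunk embeddings have already been computed (e.g. via the server's embedding endpoint); names here are illustrative:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

def search(query_vec, chunks):
    """Rank document chunks by similarity to the query embedding.

    chunks: list of (text, embedding) pairs; returns texts, best match first.
    """
    scored = [(cosine(query_vec, vec), text) for text, vec in chunks]
    return [text for _, text in sorted(scored, reverse=True)]
```

In a real search you would embed the user's query with the same model as the chunks, then take the top-k results.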

@x4080 Go to the build folder:

```bash
llama.cpp/build
```

In the build folder, configure with the server option enabled:

```bash
cmake .. -DLLAMA_BUILD_SERVER=ON
```

Build it:

```bash
cmake --build . --config Release
```

It seems that `llama_free` is not releasing the memory used by the previously loaded weights.

Regarding `--steering-source` and `--steering-layer`: are these values arbitrary, or is there a way to know which ones to use? Trial and error?

@ggerganov > Batched decoding endpoint? This option to generate multiple alternatives for the same prompt requires the ability to change the seed, and the truth is, I've been having a...
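A minimal sketch of what varying the seed per alternative could look like on the client side, assuming a hypothetical completion endpoint that honors a `seed` field in the request body (field names here are illustrative, not the actual server API):

```python
import json

def build_completion_payloads(prompt, n_alternatives, base_seed=42):
    """Build one JSON payload per alternative, identical except for the seed.

    Assumes (hypothetically) that the server's sampler is reseeded per request,
    so distinct seeds yield distinct generations for the same prompt.
    """
    return [
        json.dumps({"prompt": prompt, "seed": base_seed + i})
        for i in range(n_alternatives)
    ]
```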

In my opinion, most of these ggml-based projects share the characteristic of being very lightweight with few dependencies (header-only libraries: httplib.h, json.hpp, stb_image.h, and others), making them portable...

I would suggest creating a small utility in C++ that performs the functionality we are interested in (porting it). From a quick look at the Jinja2cpp library, it has Boost as...