llama.cpp
[Feature Request] Add batch processing for input prompt data in embedding mode.
It would be nice to add batch processing for input prompt data in embedding mode.
That is, read prompts from a file and output a map from each prompt to its embedding.
It seems this could be done by extending the logic of the `-f` flag for embedding mode.
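Until something like this exists in the embedding example itself, a rough sketch of the requested flow (load the model once, read one prompt per line from a file, emit a prompt → embedding map) could look like the following. It uses the llama-cpp-python bindings as a stand-in rather than the C++ example, and the model path and `prompts.txt` file name are placeholders, not anything from the original request.

```python
# Sketch only: assumes llama-cpp-python is installed; the model path and
# prompts.txt are placeholders chosen for illustration.
import json

from llama_cpp import Llama

# Load the model a single time with embedding support enabled.
llm = Llama(model_path="models/ggml-model.gguf", embedding=True)

# One prompt per line, skipping blank lines.
with open("prompts.txt", "r", encoding="utf-8") as f:
    prompts = [line.strip() for line in f if line.strip()]

# Map each prompt to its embedding vector, reusing the loaded model.
embeddings = {prompt: llm.embed(prompt) for prompt in prompts}

print(json.dumps(embeddings))
```

The key point is that the model is loaded once and all prompts are embedded in the same process, which is what a file-based batch mode in the C++ embedding example would also provide.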
I second this. I've been working on integrating llama.cpp into LangChain, but retrieving embeddings is terribly slow, since we can only pass single strings and the model is loaded anew for every call. Batch processing of embeddings would be very helpful here, preferably by allowing a list of strings to be passed on the CLI.
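As an interim workaround for the LangChain use case, a single `LlamaCppEmbeddings` instance can embed a whole list of strings via `embed_documents`, which at least avoids reloading the model per string. This is only a sketch: the import path may be `langchain.embeddings` in older LangChain releases, and the model path is a placeholder.

```python
# Sketch only: import path and model path depend on the installed versions.
from langchain_community.embeddings import LlamaCppEmbeddings

# The model is loaded once when the embedder is constructed.
embedder = LlamaCppEmbeddings(model_path="models/ggml-model.gguf")

texts = ["first document", "second document", "third document"]

# embed_documents() returns one embedding vector per input string,
# without reloading the model between calls.
vectors = embedder.embed_documents(texts)
print(len(vectors), len(vectors[0]))
```

A CLI flag that accepts a list of strings, or a file of prompts as in the original request, would still be the cleaner solution for non-Python callers.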
This issue was closed because it has been inactive for 14 days since being marked as stale.