ModelCache
An LLM semantic caching system that improves user experience by reducing response time through cached query-result pairs.
Text Embeddings Inference (TEI) is a toolkit for deploying and serving open source text embeddings and sequence classification models. TEI enables high-performance extraction for the most popular models, including FlagEmbedding,...
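For context, a minimal sketch of what consuming TEI embeddings might look like: fetching vectors over HTTP from a locally running TEI deployment. The URL, model, and wiring into ModelCache are assumptions here, not existing ModelCache code.

```python
# Sketch: query a local TEI server's /embed endpoint for embeddings.
# The host/port are assumptions; adapt to your deployment.
import requests

TEI_URL = "http://localhost:8080/embed"  # default TEI embed route

def tei_embed(texts):
    """Return one embedding vector per input text from a TEI deployment."""
    resp = requests.post(TEI_URL, json={"inputs": texts}, timeout=10)
    resp.raise_for_status()
    return resp.json()  # list of float vectors, one per input text

if __name__ == "__main__":
    vectors = tei_embed(["hello world", "semantic caching"])
    print(len(vectors), len(vectors[0]))
```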
https://github.com/codefuse-ai/CodeFuse-ModelCache/blob/00fac9a61ac57ef90ad44c51e8e495e17dc893f3/modelcache/embedding/data2vec.py#L17C2-L22C1 — the `model` parameter is not used; it is overridden inside the code.
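A hypothetical illustration of the reported pattern (names simplified, not the actual data2vec.py code): the constructor accepts a `model` argument but then replaces it with a hard-coded value, so callers cannot actually choose the embedding model.

```python
# Hypothetical sketch of the bug pattern being reported, not real ModelCache code.
class Data2VecEmbedding:
    def __init__(self, model: str = "some/default-model"):
        # BUG pattern: the caller's argument is ignored and overwritten.
        model = "hard-coded/other-model"
        self.model_name = model
```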
Is the project still being maintained, and are there plans for future updates?
**Abstract:** 1. Implemented soft delete and hard delete in MySQL. 2. Implemented a cache eviction strategy using MySQL and Milvus. **Problems Solved:** 1. Multiple methods were not implemented, causing issues...
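A rough sketch of the soft-delete vs. hard-delete distinction in MySQL; the table and column names here are illustrative, not ModelCache's actual schema, and the matching vector in Milvus would still need to be removed separately.

```python
# Sketch of soft vs. hard delete against a MySQL cache table (hypothetical schema).
import pymysql

conn = pymysql.connect(host="localhost", user="root", password="", database="modelcache")

def soft_delete(entry_id: int) -> None:
    """Mark a cache row as deleted so lookups skip it, but it can still be restored."""
    with conn.cursor() as cur:
        cur.execute("UPDATE cache_entries SET is_deleted = 1 WHERE id = %s", (entry_id,))
    conn.commit()

def hard_delete(entry_id: int) -> None:
    """Physically remove the row; the corresponding vector should also be dropped from Milvus."""
    with conn.cursor() as cur:
        cur.execute("DELETE FROM cache_entries WHERE id = %s", (entry_id,))
    conn.commit()
```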
I am looking for a way to use ModelCache in FastChat to speed up the LLM processes. Any pointers?
This issue is created to better track my PRs for the Todo List item [Rank ability]. Background: efficiently retrieving relevant results from large-scale datasets plays a crucial role in software development...
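One way a rank step over recalled cache candidates could work is a cross-encoder rerank; the model name, threshold, and flow below are assumptions, not ModelCache's implementation.

```python
# Sketch: rerank recalled cache candidates against the query with a cross-encoder.
from sentence_transformers import CrossEncoder

reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

def rerank(query: str, candidates: list[str], threshold: float = 0.5):
    """Score each recalled candidate against the query and keep the best-scoring ones."""
    scores = reranker.predict([(query, c) for c in candidates])
    ranked = sorted(zip(candidates, scores), key=lambda x: x[1], reverse=True)
    return [(c, s) for c, s in ranked if s >= threshold]
```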
Hi, I've discovered a critical vulnerability in the MapDataManager class where pickle.load is used to deserialize cached data from a file. The use of pickle is inherently unsafe as it...
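To illustrate the concern, a minimal sketch of the unsafe pattern alongside a safer JSON-based alternative; file names and structure are illustrative only, not the MapDataManager code itself.

```python
# Why pickle.load on cached files is risky, and a safer alternative (illustrative only).
import json
import pickle

def load_cache_unsafe(path: str):
    # UNSAFE: unpickling can execute arbitrary code embedded in a crafted file.
    with open(path, "rb") as f:
        return pickle.load(f)

def load_cache_safe(path: str):
    # Safer: JSON deserialization only yields plain data types, never executable objects.
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)
```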
To the developers: can ModelCache directly expose a standard OpenAI-format query interface? For example, a program that currently calls an online LLM directly could just swap the endpoint URL and model name to plug into the cache seamlessly. For programs whose query path cannot be modified, simply replacing the OpenAI-format model endpoint would let them connect to ModelCache directly. Since the whole point of a cache is fast responses, on a cache miss it should call the online LLM for the answer and stream the response back, staying compatible with these behaviors. One more question: when the conversation history or prompt is long, does that hurt overall recall accuracy? Have you considered storing the user message, the prompt, and the history separately and computing separate vectors for them? Or should we keep queries to ModelCache as short as possible and include only the user message? Looking forward to your answer, thanks.
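A sketch of the drop-in usage this question asks about: pointing an existing OpenAI-style client at a cache proxy URL. The proxy endpoint below is hypothetical; ModelCache does not necessarily expose such an interface today.

```python
# Sketch: an unmodified OpenAI-format client pointed at a hypothetical ModelCache proxy.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:5000/v1",   # hypothetical cache proxy endpoint
    api_key="not-needed-for-local-proxy",
)

stream = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "What is semantic caching?"}],
    stream=True,  # on a cache miss the proxy would stream the upstream LLM's answer
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="")
```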