Fix cache eviction and soft delete using MySQL and Milvus
Abstract:
- Implemented soft delete and hard delete in MySQL.
- Implemented a cache eviction strategy using MySQL and Milvus.
Problems Solved:
- Multiple methods were not implemented, causing issues when applying the cache eviction strategy.
- The `mark_deleted` method did not perform a soft delete but instead directly deleted the data by `mark_id`.
- The `cache_codegpt_answer` table in the database did not have an `is_deleted` field, making soft deletes impossible.
- The table creation statement you provided was only applicable to SQLite for `modelcache_llm_answer`, while MySQL used the `cache_codegpt_answer` table.
- The `self._vector_base.delete` method did not specify a model, making it impossible to delete the corresponding table during cache eviction.
Modifications:
- Implemented true soft delete:
  - Renamed and added fields to the table in the database.
  - In `modelcache\manager\scalar_data\sql_storage.py`:
    - Implemented the `mark_deleted` method to set the `is_deleted` field to -1 (pending deletion).
    - Implemented the `clear_deleted_data` method.
    - Implemented the `count` method.
- Database modifications:
  - (1) Renamed the table in `reference_doc\create_table.sql` from `modelcache_llm_answer` to `cache_codegpt_answer`, or alternatively modified the `table_name` in the code.
  - (2) Added the `is_deleted` field to `cache_codegpt_answer`, with -1 for pending deletion and 0 for not deleted (consistent with GPTCache).
- Implemented the cache eviction strategy, defaulting to LRU:
  - In `modelcache\manager\eviction_manager.py`, added a `model` parameter to the `delete` method to enable deletion of corresponding IDs in Milvus.
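
The soft-delete behaviour described above can be illustrated with a minimal sketch. It is not the code in `sql_storage.py`: it uses Python's built-in `sqlite3` as a stand-in for MySQL so that it runs anywhere, the column set is simplified, and the `state` / `is_all` parameters of `count` are assumptions modelled on GPTCache. Only the table name, the method names, and the -1 / 0 convention for `is_deleted` come from this PR.

```python
import sqlite3

TABLE = "cache_codegpt_answer"  # table name from this PR; columns below are simplified


def create_table(conn):
    # is_deleted: 0 = not deleted, -1 = pending deletion
    conn.execute(
        f"CREATE TABLE {TABLE} ("
        "id INTEGER PRIMARY KEY, question TEXT, answer TEXT, "
        "is_deleted INTEGER NOT NULL DEFAULT 0)"
    )


def mark_deleted(conn, keys):
    """Soft delete: flag rows as pending deletion instead of removing them."""
    if not keys:
        return
    placeholders = ",".join("?" * len(keys))
    conn.execute(
        f"UPDATE {TABLE} SET is_deleted = -1 WHERE id IN ({placeholders})",
        list(keys),
    )


def clear_deleted_data(conn):
    """Hard delete: physically remove every row flagged as pending deletion."""
    conn.execute(f"DELETE FROM {TABLE} WHERE is_deleted = -1")


def count(conn, state=0, is_all=False):
    """Count rows in the given state, or all rows when is_all is True."""
    if is_all:
        row = conn.execute(f"SELECT COUNT(*) FROM {TABLE}").fetchone()
    else:
        row = conn.execute(
            f"SELECT COUNT(*) FROM {TABLE} WHERE is_deleted = ?", (state,)
        ).fetchone()
    return row[0]


conn = sqlite3.connect(":memory:")
create_table(conn)
conn.executemany(
    f"INSERT INTO {TABLE} (question, answer) VALUES (?, ?)",
    [("q1", "a1"), ("q2", "a2"), ("q3", "a3")],
)

mark_deleted(conn, [1, 2])
# (total rows, pending deletion, live)
counts_after_mark = (count(conn, is_all=True), count(conn, state=-1), count(conn, state=0))

clear_deleted_data(conn)
counts_after_clear = (count(conn, is_all=True), count(conn, state=-1), count(conn, state=0))

print(counts_after_mark)   # (3, 2, 1)
print(counts_after_clear)  # (1, 0, 1)
```

The point of the two-step design is that marked rows stay in the table until `clear_deleted_data` runs, so the matching vector entries can be removed before the scalar rows disappear.
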
Because I implemented these methods only for MySQL, I did not wire the cache eviction strategy into `data_manager.py` directly. To use the cache eviction strategy (MySQL + Milvus), add the following to `modelcache\manager\data_manager.py`:
    class SSDataManager(DataManager):
        def __init__(
            self,
            s: CacheStorage,
            v: VectorBase,
            o: Optional[ObjectBase],
            e: Optional[EvictionBase],
            max_size,
            clean_size,
            policy="LRU",
        ):
            self.max_size = max_size
            self.clean_size = clean_size
            self.s = s
            self.v = v
            self.o = o
            self.eviction_manager = EvictionManager(self.s, self.v)
            if e is None:
                # Default to an in-memory eviction index; _clear runs on eviction.
                e = EvictionBase(
                    name="memory",
                    maxsize=max_size,
                    clean_size=clean_size,
                    policy=policy,
                    on_evict=self._clear,
                )
            self.eviction_base = e
            # Model whose table is currently being processed; set in save().
            self.model = None

        def _clear(self, marked_keys):
            # Soft delete: mark the evicted keys as pending deletion.
            self.eviction_manager.soft_evict(marked_keys)
            # Hard delete once check_evict() reports that cleanup is due.
            if self.eviction_manager.check_evict():
                self.eviction_manager.delete(self.model)

        def save(self, system_sentence, sys_embedding_data, question, answer, embedding_data, **kwargs):
            self.model = kwargs.pop("model", None)
            self.import_data([system_sentence], [sys_embedding_data], [question], [answer], [embedding_data], self.model)
`self.model` tells the data manager which model's table is being processed during insertion. The code follows GPTCache and aims to be functionally consistent with it.
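
To make the control flow of `_clear` easier to follow, here is a rough, self-contained sketch of the eviction path: the LRU index picks victims, they are soft-deleted, and once enough rows are marked they are removed from the vector store for the current model and then from the scalar store. Every class below is a hypothetical in-memory stand-in (no MySQL or Milvus involved), and the `MAX_MARK_COUNT` threshold and the `get_marked_ids` helper are assumptions for illustration, not the values or methods used in ModelCache.

```python
from collections import OrderedDict


class FakeScalarStore:
    """Stand-in for the MySQL storage: id -> is_deleted flag (0 live, -1 pending)."""

    def __init__(self):
        self.rows = {}

    def insert(self, key):
        self.rows[key] = 0

    def mark_deleted(self, keys):
        for key in keys:
            if key in self.rows:
                self.rows[key] = -1

    def get_marked_ids(self):  # hypothetical helper
        return [key for key, flag in self.rows.items() if flag == -1]

    def clear_deleted_data(self):
        self.rows = {key: flag for key, flag in self.rows.items() if flag != -1}


class FakeVectorStore:
    """Stand-in for Milvus: one set of ids per model."""

    def __init__(self):
        self.collections = {}

    def add(self, key, model):
        self.collections.setdefault(model, set()).add(key)

    def delete(self, ids, model):
        # The model argument selects which collection to delete from.
        self.collections.get(model, set()).difference_update(ids)


class SketchEvictionManager:
    MAX_MARK_COUNT = 2  # assumed threshold for this sketch

    def __init__(self, scalar, vector):
        self.scalar = scalar
        self.vector = vector

    def soft_evict(self, keys):
        self.scalar.mark_deleted(keys)

    def check_evict(self):
        return len(self.scalar.get_marked_ids()) >= self.MAX_MARK_COUNT

    def delete(self, model):
        ids = self.scalar.get_marked_ids()
        self.vector.delete(ids, model)    # remove vectors for this model first
        self.scalar.clear_deleted_data()  # then hard-delete the scalar rows


class LRUIndex:
    """Tiny LRU index that reports clean_size victims once maxsize is exceeded."""

    def __init__(self, maxsize, clean_size, on_evict):
        self.maxsize = maxsize
        self.clean_size = clean_size
        self.on_evict = on_evict
        self._order = OrderedDict()

    def put(self, key):
        self._order[key] = True
        self._order.move_to_end(key)
        if len(self._order) > self.maxsize:
            victims = [self._order.popitem(last=False)[0] for _ in range(self.clean_size)]
            self.on_evict(victims)


scalar, vector = FakeScalarStore(), FakeVectorStore()
manager = SketchEvictionManager(scalar, vector)
current_model = "demo_model"  # plays the role of self.model


def clear(marked_keys):
    manager.soft_evict(marked_keys)
    if manager.check_evict():
        manager.delete(current_model)


index = LRUIndex(maxsize=3, clean_size=2, on_evict=clear)
for key in (1, 2, 3, 4):
    scalar.insert(key)
    vector.add(key, current_model)
    index.put(key)

print(sorted(scalar.rows))                # [3, 4]
print(sorted(vector.collections[current_model]))  # [3, 4]
```

Without a model passed to `delete`, the vector store in this sketch would have no way to know which collection holds ids 1 and 2, which mirrors the problem listed above for `self._vector_base.delete`.
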
I have verified this approach locally: both the eviction strategy and the soft delete work as expected.
Please feel free to contact me if there are any issues with my changes.