cachecontrol icon indicating copy to clipboard operation
cachecontrol copied to clipboard

sqlite cache

Open thechief389 opened this issue 6 years ago • 10 comments

Is it possible to cache responses into a sqlite database?

thechief389 avatar May 09 '18 19:05 thechief389

write your own, and propose a pull req,

I do have a sqlite cache that adapt CacheControl but it require some refactor to proper propose as pull req.

AndCycle avatar May 14 '18 10:05 AndCycle

here is a minimal base that should work

class SQLiteCache(BaseCache):
    def __init__(self, db_path):

        self.conn = sqlite3.connect(db_path)

        db_init_script = f'''
            CREATE TABLE IF NOT EXISTS cache (key TEXT PRIMARY KEY, value BLOB);
        '''


        # sqlite use auto-index to enforce PRIMARY KEY/UNIQUE constrain, 
        # so it's not necessary to create additional index

        self.conn.executescript(db_init_script)

    def _execute(self, sql, parameters):
        with self.conn:
            self.conn.execute(sql, parameters)

    def _execute_read(self, sql, parameters):
        with self.conn:
            for val in self.conn.execute(sql, parameters):
                yield val

    def _execute_fetch_single_column(self, sql, parameters = {}):
        for val in self._execute_read(sql, parameters):
            yield val[0]

    def _execute_fetch_single_element(self, sql, parameters = {}):
        for val in self._execute_fetch_single_column(sql, parameters):
            return val

    def keys(self):
        for val in self._execute_fetch_single_column('SELECT key FROM cache'):
            yield val

    def get(self, key):
        return self._execute_fetch_single_element('SELECT value FROM cache WHERE key = :key', {'key': key})

    def set(self, key, val):
        self._execute('INSERT OR REPLACE INTO cache(key, value) VALUES (:key, :val)', {'key': key, 'val': val})

    def delete(self, key):
        self._execute('DELETE FROM cache WHERE key = :key', {'key': key})

AndCycle avatar May 14 '18 12:05 AndCycle

Thank you for showing me this.

thechief389 avatar May 15 '18 16:05 thechief389

Also, I won't be adding any other cache implementations to CacheConrol proper. I'd much rather you create your own package and submit a PR to the docs linking to it.

ionrock avatar May 15 '18 16:05 ionrock

I original write SQLite cache because there is too much problem with FileCache, primarily due to it's file locking implement have tons of issue if used on win32,

I think provide a simple SQLite implement will be very sufficient for most people, as I use SQLite implement for around 9 millions of files cached which occupy 1.3TB on my disk, very reliable as sqlite3 is a built-in package.

the down side is there is no easy implement to take care outdated files and vacuum for sqlite3, but that problem also present in current FileCache implement, which also require some manual work, so I don't think things can get any worse in this

AndCycle avatar May 15 '18 17:05 AndCycle

There's requests-cache, which supports SQLite caching, but doesn't have many features like CacheControl

thechief389 avatar May 15 '18 18:05 thechief389

The cache doesn't appear in the database when I used your base.

thechief389 avatar May 17 '18 16:05 thechief389

@thechief389 comment edited, as it's a strip down version of my original complex code with multi-consumer and have mem cache layer I did't verify it well, but you get the idea.

AndCycle avatar May 17 '18 16:05 AndCycle

It might be worth doing some research into adapting https://github.com/grantjenks/python-diskcache to cachecontrol. That package provides a generic cache implementation which is a mix of SQLite and file based caching. It would also be an interesting solution for issue #185

jaap3 avatar Jun 04 '18 07:06 jaap3

I did some testing, it seems that python-diskcache can be used as a drop in replacement for FileCache:

import requests

from cachecontrol import CacheControl
from diskcache import FanoutCache

class MyFanoutCache(FanoutCache):
    # Workaround until either grantjenks/python-diskcache#77 or #195 is fixed
    def __bool__(self): 
        return True
    __nonzero__ = __bool__

cache = MyFanoutCache('./tmp', size_limit=2 ** 30, eviction_policy='least-recently-used')
session = CacheControl(requests.Session(), cache=cache)

Then you could periodically call cache.cull() to remove expired items and get the size back down.

jaap3 avatar Jun 19 '18 09:06 jaap3