sqlite-utils icon indicating copy to clipboard operation
sqlite-utils copied to clipboard

Optional caching mechanism for table.lookup()

Open simonw opened this issue 3 years ago • 3 comments

Inspired by work on git-history where I used this pattern:

    column_name_to_id = {}

    def column_id(column):
        if column not in column_name_to_id:
            id = db["columns"].lookup(
                {"namespace": namespace_id, "name": column},
                foreign_keys=(("namespace", "namespaces", "id"),),
            )
            column_name_to_id[column] = id
        return column_name_to_id[column]

If you're going to be doing a large number of table.lookup(...) calls and you know that no other script will be modifying the database at the same time you can presumably get a big speedup using a Python in-memory cache - maybe even a LRU one to avoid memory bloat.

simonw avatar Dec 06 '21 17:12 simonw

API could be this:

id = db["columns"].lookup(
    {"namespace": namespace_id, "name": column},
    cache=True
)

This could default to a 100 item LRU cache. You could perhaps modify that with cache_size=500 or with cache_size=None to disable the size limit on that cache.

simonw avatar Dec 06 '21 17:12 simonw

Should I implement this remember to apply the optimization in git-history.

simonw avatar Dec 06 '21 17:12 simonw

Would be interesting to micro-benchmark this to get an idea for how much of a performance boost it is, since the indexed SQLite lookups used by table.lookup() should be really fast already.

simonw avatar Dec 06 '21 17:12 simonw