sqlite-utils
Optional caching mechanism for table.lookup()
Inspired by work on git-history, where I used this pattern:
```python
column_name_to_id = {}

def column_id(column):
    if column not in column_name_to_id:
        id = db["columns"].lookup(
            {"namespace": namespace_id, "name": column},
            foreign_keys=(("namespace", "namespaces", "id"),),
        )
        column_name_to_id[column] = id
    return column_name_to_id[column]
```
If you're going to be doing a large number of table.lookup(...) calls, and you know that no other script will be modifying the database at the same time, you can presumably get a big speedup using a Python in-memory cache - maybe even an LRU one to avoid memory bloat.
The API could look like this:
```python
id = db["columns"].lookup(
    {"namespace": namespace_id, "name": column},
    cache=True,
)
```
This could default to a 100-item LRU cache. You could perhaps modify that with cache_size=500, or disable the size limit entirely with cache_size=None.
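A rough sketch of how that caching layer might work, written as a standalone wrapper rather than the actual sqlite-utils implementation (make_cached_lookup and its key-building strategy are assumptions for illustration, not existing API):

```python
from collections import OrderedDict

def make_cached_lookup(table, cache_size=100):
    """Wrap table.lookup() with an LRU cache.

    cache_size=None disables the size limit, mirroring the proposed
    cache_size=None behaviour. This is a sketch, not the real API.
    """
    cache = OrderedDict()

    def lookup(values, **kwargs):
        # Lookup dicts are not hashable, so build a tuple key from
        # the sorted (column, value) pairs
        key = tuple(sorted(values.items()))
        if key in cache:
            cache.move_to_end(key)  # mark as most recently used
            return cache[key]
        id = table.lookup(values, **kwargs)
        cache[key] = id
        if cache_size is not None and len(cache) > cache_size:
            cache.popitem(last=False)  # evict least recently used
        return id

    return lookup
```

The OrderedDict move_to_end/popitem(last=False) pair is the standard stdlib trick for a size-bounded LRU without pulling in functools.lru_cache, which doesn't handle dict arguments.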
Should I implement this, remember to apply the optimization in git-history.
Would be interesting to micro-benchmark this to get an idea of how much of a performance boost it is, since the indexed SQLite lookups used by table.lookup() should be really fast already.
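As a starting point, a micro-benchmark could compare an indexed SQLite query against a plain dict cache. This sketch uses the stdlib sqlite3 module rather than sqlite-utils itself, on the assumption that the relative costs are similar; the table and column names are made up for illustration:

```python
import sqlite3
import timeit

# In-memory database with a unique index, approximating what
# table.lookup() relies on
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE columns (id INTEGER PRIMARY KEY, name TEXT)")
db.execute("CREATE UNIQUE INDEX idx_columns_name ON columns (name)")
db.executemany(
    "INSERT INTO columns (name) VALUES (?)",
    [("col{}".format(i),) for i in range(1000)],
)

def indexed_lookup(name):
    # One indexed SELECT per call
    return db.execute(
        "SELECT id FROM columns WHERE name = ?", (name,)
    ).fetchone()[0]

cache = {}

def cached_lookup(name):
    # Hit the database only on a cache miss
    if name not in cache:
        cache[name] = indexed_lookup(name)
    return cache[name]

uncached = timeit.timeit(lambda: indexed_lookup("col500"), number=10_000)
cached = timeit.timeit(lambda: cached_lookup("col500"), number=10_000)
print("indexed: {:.4f}s  cached: {:.4f}s".format(uncached, cached))
```

Even if the indexed query is fast, the dict hit skips SQL parsing and cursor overhead entirely, so the gap should show up clearly at high call counts.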