cr-sqlite
cr-sqlite copied to clipboard
Detecting "dead" db versions
In our project, we have an optimization to reduce DB lookups that involves in-memory tracking what db_versions exists and which site_id they're associated with.
This lets us produce a message used for synchronizing missing changes with between nodes in the cluster. For example, a node will send a "sync request" to another node, telling it all the ranges of versions it is missing per site_id. Each node maps its internal db_version with a cluster-consistent version.
Currently, this in-memory map of "known versions" keeps growing and never shrinks. The way to shrink it would be to detect now-dead db_version caused by applying changes that completely overwrites them. For example: node A writes a record, node B fully updates the same record later on, making it so the old db_version produced by applying changes from node A are now absent from crsql_changes.
We have a way around this that involves periodically making a query to crsql_changes that counts groups of site_id, db_version and diffs with our in-memory map. This query is expensive, taking up a lot CPU time, the more rows there are the more time is spent there. I've already optimized the query to only look at one site_id at a time and only for db_versions that might have become absent. Still, it's way too much time spent in the query.
Ideally, we wouldn't need a periodic task and we'd be able to know which db_versions are now "invalidated" after inserting changes in crsql_changes. Calling, within a transaction, a function to determine that would be ideal! I don't like periodic tasks.
Can you share how you're working around this currently? That way I can add a test for it so I don't break you guys.
Yes, I get all "live" DB versions like this:
SELECT db_version FROM (
SELECT DISTINCT(__crsql_db_version) AS db_version FROM foo__crsql_clock
UNION
SELECT DISTINCT(__crsql_db_version) AS db_version FROM bar__crsql_clock
UNION
-- ...
)