cr-sqlite icon indicating copy to clipboard operation
cr-sqlite copied to clipboard

Detecting "dead" db versions

Open jeromegn opened this issue 2 years ago • 2 comments
trafficstars

In our project, we have an optimization to reduce DB lookups that involves in-memory tracking what db_versions exists and which site_id they're associated with.

This lets us produce a message used for synchronizing missing changes with between nodes in the cluster. For example, a node will send a "sync request" to another node, telling it all the ranges of versions it is missing per site_id. Each node maps its internal db_version with a cluster-consistent version.

Currently, this in-memory map of "known versions" keeps growing and never shrinks. The way to shrink it would be to detect now-dead db_version caused by applying changes that completely overwrites them. For example: node A writes a record, node B fully updates the same record later on, making it so the old db_version produced by applying changes from node A are now absent from crsql_changes.

We have a way around this that involves periodically making a query to crsql_changes that counts groups of site_id, db_version and diffs with our in-memory map. This query is expensive, taking up a lot CPU time, the more rows there are the more time is spent there. I've already optimized the query to only look at one site_id at a time and only for db_versions that might have become absent. Still, it's way too much time spent in the query.

Ideally, we wouldn't need a periodic task and we'd be able to know which db_versions are now "invalidated" after inserting changes in crsql_changes. Calling, within a transaction, a function to determine that would be ideal! I don't like periodic tasks.

jeromegn avatar Aug 02 '23 14:08 jeromegn

Can you share how you're working around this currently? That way I can add a test for it so I don't break you guys.

tantaman avatar Aug 10 '23 15:08 tantaman

Yes, I get all "live" DB versions like this:

SELECT db_version FROM (
  SELECT DISTINCT(__crsql_db_version) AS db_version FROM foo__crsql_clock
  UNION
  SELECT DISTINCT(__crsql_db_version) AS db_version FROM bar__crsql_clock
  UNION
  -- ...
)

jeromegn avatar Aug 10 '23 16:08 jeromegn