corrosion
corrosion copied to clipboard
Buckets-based hashing for gentler resource utilization
Corrosion presently makes full table scans to hash the contents of everything to be able to compare consistency between actors.
This is an expensive operation and it gets more and more expensive.
The new plan would be to create buckets based on primary keys hashes:
- When Corrosion has written changes, hash all the unique primary keys, down to a
u64 <hash> - (<hash> % <buckets size>)to determine the bucket "id"- Store the ID in a table linking table name, primary key and bucket ID.
- Queue a background job to update the bucket's hash
- Hash all bucket hashes periodically (or maybe in-place with XOR?) to determine the full consistency hash
This has several benefits:
- No need to scan full tables
- Should still provide a valid, consistent, hash across nodes
- Can be done continuously
I believe this depends on https://github.com/vlcn-io/cr-sqlite/pull/344 so the pk <-> bucket ID table doesn't weigh too much. We'd be able to use the cr-sqlite-encoded primary keys which are varints and therefore a lot smaller.
Update: this should be implemented in cr-sqlite directly and feature-flagged or something like that.