weave icon indicating copy to clipboard operation
weave copied to clipboard

chore(weave): Make digest calculation stable across client and server by stripping refs of non-content data

Open tssweeney opened this issue 2 months ago • 2 comments

This PR will change the digest calculation for many objects - which is certainly not something to do lightly. However, paying this one-time cost will mean that digest calculation can be stable across the client and server without any shared information. This is really critical. The downside is that after landing this, users who have logged data to the server before will experience a "version bump" in many objects that don't actually experience a version change. This is quite dramatic, but frankly, happens quite often anyway - due to ops' code changing.

Ok, so why are we doing this and what does it help?

  1. A big problem with our system today is that clients cannot safely calculate the digest of objects and must wait for a response from the server. The implication of this is that our clients have overly complex sending queues and Future dependency chains that cause performance issues and block our ability to move to a client/sender/server architecture.
  2. Ok, so why can't the client calculate the digest? This is because the server mutates refs before storing the data - we convert "external" refs (weave:///entity/project/...) format into the internal refs format (weave-internal:///project_id/...). Internal refs are stable across name changes and project migrations - making them the better choice for DB storage. However, the client is not aware of this. When a ref is embedded inside of an object, therefore, the resulting stringified json dump is materially different, and therefore deviates.
  3. Ok, so idea 1: let's give the client the project_id up front and have it construct internal ids! This would work, but is substantially more complex for a number of reasons:
    1. It requires a network exchange before logging can commence. This is generally fine, but does disallow any future of offline mode
    2. Since a client does not know if the project already exists, we would be invoking a lot more project upserts for parallel workloads, which has already presented a problem for customers
    3. It is more confusing for the users who want to integrate with out API
    4. It requires a lot more work on the server to support this.
    5. It is bleeding internal logic to the client - why does the client need to know about internal implementation details as to how refs are stored in the DB?! this is bad design.
  4. Instead, we can modify the digest calculation itself. We can essentially strip off the weave:///entity/project/ and weave-internal:///project_id/... prefixes from refs. In doing so we create much more stable digests. In fact, we really should cut off everything up to the digest (which would be even more stable). The fact that a piece of data belongs to project X does not matter when calculating content-addressed digests. The content is the same no matter what!

TODO:

  • [ ] A lot more tests
  • [ ] fix old tests
  • [ ] Finalize names and strings
  • [ ] Decide whether to include the table/name portion of the ref

tssweeney avatar Nov 12 '25 23:11 tssweeney

Codecov Report

:x: Patch coverage is 96.96970% with 3 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
...race_server/client_server_common/digest_builder.py 97.40% 1 Missing and 1 partial :warning:
...ve/trace_server/clickhouse_trace_server_batched.py 90.00% 1 Missing :warning:

:loudspeaker: Thoughts on this report? Let us know!

codecov-commenter avatar Nov 13 '25 11:11 codecov-commenter