stellar-core

Meta versioning system

Open dmkozh opened this issue 10 months ago • 0 comments

This is a proposal for managing tx meta changes in Core. It does not need to be implemented if meta remains relatively stable, with infrequent updates on protocol boundaries. But if we find ourselves modifying meta in between versions too often (e.g. to provide better data for analysis, improve diagnostic events, or support block explorers), then some sort of versioning system might be needed to give the downstream systems time to adapt to the changes, while also not blocking a change until everyone is ready.

In any scenario, the meta changes should be compatible with the existing XDR schema (e.g. we must never add a field to a struct directly without using an extension/new union variant). Thus this proposal discusses only the changes to what Core emits.

There are the following broad categories of the meta changes:

  • Incremental changes - updates to the meta that are small or important enough to be emitted by every Core instance (or every instance that emits the respective optional section). E.g. adding a small extension, or adding a new meta frame type with a few new fields.
  • Optional sections - large new meta sections that are only necessary for some Core instances. For example, Soroban diagnostic events are not necessary for the Horizon Core instances.
  • Diagnostic events - these have a 'meta' schema: while the XDR stays exactly the same, the contents of the events might change (e.g. we might add new diagnostic events, or modify the existing ones). Changes to diagnostic events can be semantically divided into 'low-risk' and 'risky' changes:
    • 'Low-risk' is based on the fact that diagnostic events are meant to be pretty flexible and updateable. While consumers could be easily broken due to arbitrary hardcoded logic (e.g. if they expect a certain event to appear at a certain index), a sane implementation must at least be indifferent to event order and count, and must ignore unknown events. That's a reasonable expectation given that arbitrary diagnostic events can be produced by the contracts themselves (via logging). Thus 'low-risk' changes would include incremental changes (like adding a new event), or modifications of purely diagnostic events (such as an enhanced/updated host error message).
    • 'Risky' changes are the remaining changes, such as removing an existing event (e.g. removing metered CPU instructions), or modifying a 'system' diagnostic event (e.g. modifying trace events we emit during the function calls).
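
To make the 'low-risk' expectation concrete, here is a minimal sketch of a consumer that is indifferent to event order and count and ignores unknown events. It is not taken from any real downstream system: `DiagnosticEvent`, `KNOWN_TOPICS`, `collect_known_events` and the topic strings are all hypothetical, and the event is a simplified stand-in for the real XDR type.

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterable, List


@dataclass(frozen=True)
class DiagnosticEvent:
    # Hypothetical, simplified stand-in for a decoded diagnostic event.
    topic: str
    data: Any


# Hypothetical set of topics this consumer knows how to interpret.
KNOWN_TOPICS = {"fn_call", "fn_return", "error"}


def collect_known_events(events: Iterable[DiagnosticEvent]) -> Dict[str, List[Any]]:
    """Group event payloads by topic without relying on index or count."""
    grouped: Dict[str, List[Any]] = {topic: [] for topic in KNOWN_TOPICS}
    for event in events:
        if event.topic in grouped:
            grouped[event.topic].append(event.data)
        # Events with unknown topics are skipped instead of raising,
        # so a newly added event does not break this consumer.
    return grouped
```

With a consumer shaped like this, adding a new event or reordering existing ones leaves the result unchanged, which is why such changes could ship in any release. Removing an event the consumer depends on would still change its output, which is what makes that kind of change 'risky'.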

The proposal is to handle these categories in the following way:

  • Low-risk diagnostic event changes can be added in any release.
  • Optional sections are guarded by a permanent Core configuration flag, as these are meant to be configurable. This is what we do for diagnostic events currently and this seems like the only correct approach.
  • Incremental changes and 'risky' diagnostic event changes are all guarded by a single configuration parameter, TX_META_VERSION (we could have two separate versions for these, but that's likely overkill).
    • Changes should be accompanied by the respective meta version guard.
    • Ideally, every release should bump the meta version by at most 1 to avoid redundant jumps in versions. Think of this as a mini protocol version.
    • Core should support all the meta versions starting with some 'minimum' version hardcoded into every build.
    • The minimum meta version is bumped either at protocol boundaries, or after giving the downstream consumers sufficient time to adapt.
    • Meta version bumps and the changes related to them are announced as a part of release notes (and via discord channels, when relevant/useful).
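
The handling rules above could be sketched roughly as follows. This is an illustration only, not Core's implementation: only the TX_META_VERSION parameter comes from the proposal, while the version numbers, `MetaConfig`, `build_tx_meta`, the `ext_v4` field and the `cpu_insn` event name are assumptions made up for the example.

```python
from dataclasses import dataclass
from typing import Any, Dict, List

# Hypothetical bounds hardcoded into a build: the oldest meta version it
# still supports and the newest one it knows how to emit.
MIN_TX_META_VERSION = 3
MAX_TX_META_VERSION = 5


@dataclass(frozen=True)
class MetaConfig:
    # Stand-in for the proposed TX_META_VERSION configuration parameter.
    tx_meta_version: int = MIN_TX_META_VERSION
    # Stand-in for a permanent flag guarding an optional section.
    enable_diagnostic_events: bool = False


def check_meta_version(version: int) -> None:
    """Reject versions outside the range this build supports."""
    if not MIN_TX_META_VERSION <= version <= MAX_TX_META_VERSION:
        raise ValueError(
            f"unsupported meta version {version}; supported range is "
            f"{MIN_TX_META_VERSION}..{MAX_TX_META_VERSION}"
        )


def build_tx_meta(config: MetaConfig, events: List[str]) -> Dict[str, Any]:
    """Assemble a toy meta dict, applying version guards and flags."""
    check_meta_version(config.tx_meta_version)
    meta: Dict[str, Any] = {"changes": []}

    # Incremental change: emitted only from (hypothetical) version 4 onward.
    if config.tx_meta_version >= 4:
        meta["ext_v4"] = {"example_field": 0}

    # Optional section: controlled by its permanent flag, not by the version.
    if config.enable_diagnostic_events:
        emitted = list(events)
        # 'Risky' change: from (hypothetical) version 5 onward, stop
        # emitting the made-up "cpu_insn" event.
        if config.tx_meta_version >= 5:
            emitted = [e for e in emitted if e != "cpu_insn"]
        meta["diagnostic_events"] = emitted

    return meta
```

A downstream system that has not adapted yet would keep `tx_meta_version` at the older value and continue to receive the old output until the minimum version is bumped. Keeping every guard tied to one integer is what makes a single parameter sufficient for both incremental and 'risky' changes.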

dmkozh, Apr 19 '24 23:04