Add new Protobuf-based MessageBus DocumentAPI protocol

Open vekterli opened this issue 1 year ago • 0 comments

@geirst please review schema/C++ code (primary focus on routable_factories_8.cpp), @jonmv please review schema/Java code (primary focus on RoutableFactories80.java and its codec design). Note that this commit by itself will fail its PR-build due to missing cross-language test files. This is intentional, as these are tied to the protocol version and will be added as a separate commit.

This adds an entirely new implementation of the internal MessageBus DocumentAPI protocol, which shall be functionally 1-to-1 compatible with the existing legacy protocol.

[!CAUTION] This MUST NOT be merged before a concrete (future) Vespa release version has been assigned to the protocol.

New protobuf schemas have been added to the top-level documentapi module, which are separated into different domains of responsibility:

  • CRUD messages
  • Visiting messages
  • Data inspection messages

There is also a schema for shared, common message types.

Both the C++ and Java protocol implementations separate serialization and deserialization into a codec abstraction per message type, which hides the boilerplate required for Protobuf buffer management. The Java version is a tad more verbose due to generic type erasure.
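For readers unfamiliar with the pattern, here is a minimal sketch of what a per-message-type codec could look like in Java. It is not the code in RoutableFactories80.java: every name below (`MessageCodec`, `GetMessageStub`, `GetMessageCodec`, `CodecRegistry`) is hypothetical, and a plain UTF-8 byte array stands in for the Protobuf-encoded payload so the example stays self-contained. The explicit `Class<T>` token shows one place where type erasure tends to add verbosity.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical codec abstraction: one encode/decode pair per message type.
interface MessageCodec<T> {
    byte[] encode(T message);
    T decode(byte[] payload);
}

// Hypothetical stand-in for a real DocumentAPI message.
final class GetMessageStub {
    final String documentId;
    GetMessageStub(String documentId) { this.documentId = documentId; }
}

// A real codec would build and parse a Protobuf message here;
// UTF-8 bytes are used only to keep the sketch dependency-free.
final class GetMessageCodec implements MessageCodec<GetMessageStub> {
    @Override public byte[] encode(GetMessageStub message) {
        return message.documentId.getBytes(StandardCharsets.UTF_8);
    }
    @Override public GetMessageStub decode(byte[] payload) {
        return new GetMessageStub(new String(payload, StandardCharsets.UTF_8));
    }
}

// Maps a numeric message type ID to its codec.
final class CodecRegistry {
    // Keeps the Class<T> token next to the codec, since T itself is erased at runtime.
    private static final class Entry<T> {
        final Class<T> type;
        final MessageCodec<T> codec;
        Entry(Class<T> type, MessageCodec<T> codec) {
            this.type = type;
            this.codec = codec;
        }
        byte[] encodeChecked(Object message) {
            // Throws ClassCastException if the message has the wrong type.
            return codec.encode(type.cast(message));
        }
    }

    private final Map<Integer, Entry<?>> entries = new HashMap<>();

    <T> void register(int typeId, Class<T> type, MessageCodec<T> codec) {
        entries.put(typeId, new Entry<>(type, codec));
    }

    byte[] encode(int typeId, Object message) {
        return lookup(typeId).encodeChecked(message);
    }

    Object decode(int typeId, byte[] payload) {
        return lookup(typeId).codec.decode(payload);
    }

    private Entry<?> lookup(int typeId) {
        Entry<?> entry = entries.get(typeId);
        if (entry == null) {
            throw new IllegalArgumentException("No codec registered for type " + typeId);
        }
        return entry;
    }
}
```

With this shape, the code that dispatches on message type never touches buffer handling directly; it only looks up a codec by type ID and hands it a message or a payload.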

This protocol does not currently support lazy (de-)serialization in Java, as the existing mechanisms for doing so are inherently tied to the legacy protocol version. Performance tests will decide if we need to introduce such functionality to the new protocol version.

To keep the new protocol from going live in production, this commit changes the semantics of MessageBus version reporting (at least for the near future): instead of reporting the current Vespa release version, it reports the highest supported protocol version. This lets us conditionally enable the new protocol by reporting a MessageBus version greater than or equal to the protocol version iff the protocol should be active. The toggle is currently a dedicated (presumably short-lived) environment variable.

The new protocol is disabled by default.
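The gating logic described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual implementation: the environment variable name, the class names and the version numbers are all placeholders (the real protocol version has not been assigned yet), and the rule that both sides must report a high enough version is an assumption about how peers would agree on a protocol.

```java
import java.util.Map;

// Hypothetical three-part version, ordered by major, then minor, then micro.
final class ExampleVersion implements Comparable<ExampleVersion> {
    final int major, minor, micro;
    ExampleVersion(int major, int minor, int micro) {
        this.major = major;
        this.minor = minor;
        this.micro = micro;
    }
    @Override public int compareTo(ExampleVersion other) {
        if (major != other.major) return Integer.compare(major, other.major);
        if (minor != other.minor) return Integer.compare(minor, other.minor);
        return Integer.compare(micro, other.micro);
    }
}

final class ProtocolVersionGate {
    // Placeholder name; the real environment variable is not given in this description.
    static final String ENABLE_ENV_VAR = "EXAMPLE_ENABLE_NEW_DOCUMENTAPI_PROTOCOL";
    // Placeholder versions; the final protocol version is not yet decided.
    static final ExampleVersion LEGACY_PROTOCOL = new ExampleVersion(1, 0, 0);
    static final ExampleVersion NEW_PROTOCOL = new ExampleVersion(2, 0, 0);

    // Reports the highest supported protocol version rather than the release version.
    // Without the opt-in variable the new protocol stays disabled (the default).
    static ExampleVersion reportedVersion(Map<String, String> env) {
        boolean enabled = "true".equalsIgnoreCase(env.get(ENABLE_ENV_VAR));
        return enabled ? NEW_PROTOCOL : LEGACY_PROTOCOL;
    }

    // Assumption: the new protocol is used only when both sides report a
    // version greater than or equal to the new protocol version.
    static boolean newProtocolActive(ExampleVersion self, ExampleVersion peer) {
        return self.compareTo(NEW_PROTOCOL) >= 0 && peer.compareTo(NEW_PROTOCOL) >= 0;
    }
}
```

Passing the environment in as a map (for example `System.getenv()`) keeps the decision easy to unit test without touching process state.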

Other changes:

  • Protocol tests have been moved up one package directory level to be aligned with the actual package of the classes they test. This allows for using package-protected constructors in the serialization tests.
  • DocumentDeserializer now exposes the underlying document type repo/manager. This is done to detangle Document/DocumentUpdate deserialization from the underlying wire buffer management.
  • RemoveLocationMessage at long last contains a bucket space. This was forgotten when we initially added the concept to the other messages, and the pain of adding it later was too big (not so anymore!).

Unit tests for both C++ and Java have been hoisted from the legacy test suite, cleaned up and extended with additional cases. The C++ tests use the old unit test kit and should receive a good follow-up washing and GTest-rewrite.

Important: due to how MessageBus protocol versioning works, the final protocol version is not yet decided, as setting it requires syncing against our build systems. A follow-up commit will assign the final version as well as include all required binary test files.

vekterli, Feb 16 '24 16:02