cadence icon indicating copy to clipboard operation
cadence copied to clipboard

Demonstrate a way to get rid of the cadence-idl repo

Open Groxx opened this issue 2 years ago • 3 comments

What changed

Cadence-server proto type X now no longer conflicts with cadence-client proto type X, because the server package name is implicitly rewritten to a different namespace.

This intentionally allows client and server protobufs to diverge, e.g. when developing a new server API that is not yet ready for client-side use, or when using different code-generator configuration.

Why?

This is a proof-of-concept for solving the issue found in https://github.com/uber/cadence-idl/pull/94 by doing the opposite resolution.

There are definitely some things to like about centralizing code-gen... quite a lot actually... but it sometimes causes a lot of unnecessary friction. It requires client and server to update their (client-)APIs in lock-step, even though new APIs may not yet be ready for use, and that might force upgrades that are not yet ready.

It also doesn't accurately represent how RPC works - on-the-wire binary compatibility has literally nothing to do with code compatibility, but this (and gogoproto's global registration in general) ties them together as if they are identical. It also implies that literally all code generation for proto file X is identical to every other - the very fact that their code-gen is configurable makes that unambiguously wrong.

So let's just stop doing that.

Tests

E2E tests exercise both thrift (unaffected) and proto, but I should mechanically assert that grpc client + server, and both client->server and server<->server are tested.

Potential risks

Three of them probably:

  1. google.protobuf.Any, which I do not believe is an issue
  2. grpc/yarpc incompatibilities due to service name leakages
  3. code-gen changes that could require new monkey-patches in the future

google.protobuf.Any

This works by essentially encoding

message Any {
  string type = 1;
  binary contents = 2;
}

where the type value is selected by the creator of the Any when encoding, and checked by the recipient when decoding... which this now changes, so the client-type will not be .Any-compatible with the server-type.

There are simple ways to work around this, e.g. by looking up the type and manually decoding: https://github.com/gogo/protobuf/blob/f67b8970b736e53dbd7d0a27146c8f1ac52f74e5/types/any.go#L127 It's somewhat annoying that it has to be done, but entirely possible. And a reimplementation of ^ that method but with our customized comparison would be utterly trivial and would allow receiving Any essentially normally. Sending an Any from server to client is similarly trivial but slightly manual.

That said, personally I don't think we'll run into this. Any is almost never an accurate modeling of a domain: a field which contains literally any proto type, but not thrift or any other binary blob, and also not just protos we define and control, is... basically nonsense. Particularly for a multi-protocol system like Cadence. We can get the same capabilities without the limitations by using a custom Any that supports thrift/json/etc, and basically every other scenario has strictly more correct options (true request forwarding, a OneOf, etc).

grpc/yarpc issues

I believe E2E tests show this continues to work.

code-gen changes in the future

Inevitably annoying if this happens, but we pin our generator versions + yarpc's code-gen is very stable in practice. We can probably also convince them to allow modifying server names via config args, as in principle the proto files used have no technical relation whatsoever to how they are used - it's just commonly the same everywhere.

Follow-ups

Since this allows dropping cadence-idl use for server builds, where both types are used, we can also drop it in the client. This will need to be a v2 thing as it changes import paths, but it's not hard to re-introduce the necessary code-gen over there.

When fully complete though, this will turn into:

  • when you are the server contacting the server, use github.com/uber/cadence/... like any other server code
  • when you are using the client library to contact the server, use go.uber.org/cadence/... like any other client code

Which leaves things in a clear state for what depends on what, and at what version.

Groxx avatar Apr 07 '23 02:04 Groxx

aaaagh dangit:

2023-04-07T04:03:35.128Z	FATAL	loggerimpl/logger.go:143	Start workflow with err	{"error": "code:unimplemented message:unrecognized procedure \"uber.cadence.api.v1.WorkflowAPI::StartWorkflowExecution\" for service \"cadence-frontend\"", "logging-call-at": "client_integration_test.go:235"}

that's probably going to be fatal to this.

Groxx avatar Apr 07 '23 04:04 Groxx

Pull Request Test Coverage Report for Build 01875e2f-959c-4c4d-87af-1d7805759bcc

  • 0 of 0 changed or added relevant lines in 0 files are covered.
  • 38 unchanged lines in 10 files lost coverage.
  • Overall coverage increased (+0.1%) to 57.178%

Files with Coverage Reduction New Missed Lines %
service/history/queue/timer_queue_processor_base.go 1 78.81%
service/history/queue/transfer_queue_processor_base.go 1 77.62%
common/peerprovider/ringpopprovider/config.go 2 81.58%
common/task/weightedRoundRobinTaskScheduler.go 2 89.64%
service/history/task/transfer_standby_task_executor.go 2 87.24%
service/matching/matcher.go 2 91.46%
common/persistence/nosql/nosqlplugin/cassandra/workflow.go 3 60.0%
service/history/execution/mutable_state_task_refresher.go 3 58.86%
common/persistence/nosql/nosqlplugin/cassandra/workflowUtils.go 8 78.16%
common/persistence/nosql/nosqlplugin/cassandra/workflowParsingUtils.go 14 87.16%
<!-- Total: 38
Totals Coverage Status
Change from base Build 01874e2d-13b5-4a33-b7c8-2d7da3311e7b: 0.1%
Covered Lines: 85396
Relevant Lines: 149351

💛 - Coveralls

coveralls avatar Apr 07 '23 20:04 coveralls

well. it requires rewriting both proto and generated code, because yarpc is insufficiently configurable, but it's pretty simple and it does seem to work.

Groxx avatar Apr 07 '23 21:04 Groxx