cadence
cadence copied to clipboard
Demonstrate a way to get rid of the cadence-idl repo
What changed
Cadence-server proto type X now no longer conflicts with cadence-client proto type X, because the server package name is implicitly rewritten to a different namespace.
This intentionally allows client and server protobufs to diverge, e.g. when developing a new server API that is not yet ready for client-side use, or when using different code-generator configuration.
Why?
This is a proof-of-concept for solving the issue found in https://github.com/uber/cadence-idl/pull/94 by doing the opposite resolution.
There are definitely some things to like about centralizing code-gen... quite a lot actually... but it sometimes causes a lot of unnecessary friction. It requires client and server to update their (client-)APIs in lock-step, even though new APIs may not yet be ready for use, and that might force upgrades that are not yet ready.
It also doesn't accurately represent how RPC works - on-the-wire binary compatibility has literally nothing to do with code compatibility, but this (and gogoproto's global registration in general) ties them together as if they are identical. It also implies that literally all code generation for proto file X is identical to every other - the very fact that their code-gen is configurable makes that unambiguously wrong.
So let's just stop doing that.
Tests
E2E tests exercise both thrift (unaffected) and proto, but I should mechanically assert that grpc client + server, and both client->server and server<->server are tested.
Potential risks
Three of them probably:
google.protobuf.Any, which I do not believe is an issue- grpc/yarpc incompatibilities due to service name leakages
- code-gen changes that could require new monkey-patches in the future
google.protobuf.Any
This works by essentially encoding
message Any {
string type = 1;
binary contents = 2;
}
where the type value is selected by the creator of the Any when encoding, and checked by the recipient when decoding... which this now changes, so the client-type will not be .Any-compatible with the server-type.
There are simple ways to work around this, e.g. by looking up the type and manually decoding: https://github.com/gogo/protobuf/blob/f67b8970b736e53dbd7d0a27146c8f1ac52f74e5/types/any.go#L127
It's somewhat annoying that it has to be done, but entirely possible. And a reimplementation of ^ that method but with our customized comparison would be utterly trivial and would allow receiving Any essentially normally. Sending an Any from server to client is similarly trivial but slightly manual.
That said, personally I don't think we'll run into this. Any is almost never an accurate modeling of a domain: a field which contains literally any proto type, but not thrift or any other binary blob, and also not just protos we define and control, is... basically nonsense. Particularly for a multi-protocol system like Cadence.
We can get the same capabilities without the limitations by using a custom Any that supports thrift/json/etc, and basically every other scenario has strictly more correct options (true request forwarding, a OneOf, etc).
grpc/yarpc issues
I believe E2E tests show this continues to work.
code-gen changes in the future
Inevitably annoying if this happens, but we pin our generator versions + yarpc's code-gen is very stable in practice. We can probably also convince them to allow modifying server names via config args, as in principle the proto files used have no technical relation whatsoever to how they are used - it's just commonly the same everywhere.
Follow-ups
Since this allows dropping cadence-idl use for server builds, where both types are used, we can also drop it in the client. This will need to be a v2 thing as it changes import paths, but it's not hard to re-introduce the necessary code-gen over there.
When fully complete though, this will turn into:
- when you are the server contacting the server, use github.com/uber/cadence/... like any other server code
- when you are using the client library to contact the server, use go.uber.org/cadence/... like any other client code
Which leaves things in a clear state for what depends on what, and at what version.
aaaagh dangit:
2023-04-07T04:03:35.128Z FATAL loggerimpl/logger.go:143 Start workflow with err {"error": "code:unimplemented message:unrecognized procedure \"uber.cadence.api.v1.WorkflowAPI::StartWorkflowExecution\" for service \"cadence-frontend\"", "logging-call-at": "client_integration_test.go:235"}
that's probably going to be fatal to this.
Pull Request Test Coverage Report for Build 01875e2f-959c-4c4d-87af-1d7805759bcc
- 0 of 0 changed or added relevant lines in 0 files are covered.
- 38 unchanged lines in 10 files lost coverage.
- Overall coverage increased (+0.1%) to 57.178%
| Files with Coverage Reduction | New Missed Lines | % |
|---|---|---|
| service/history/queue/timer_queue_processor_base.go | 1 | 78.81% |
| service/history/queue/transfer_queue_processor_base.go | 1 | 77.62% |
| common/peerprovider/ringpopprovider/config.go | 2 | 81.58% |
| common/task/weightedRoundRobinTaskScheduler.go | 2 | 89.64% |
| service/history/task/transfer_standby_task_executor.go | 2 | 87.24% |
| service/matching/matcher.go | 2 | 91.46% |
| common/persistence/nosql/nosqlplugin/cassandra/workflow.go | 3 | 60.0% |
| service/history/execution/mutable_state_task_refresher.go | 3 | 58.86% |
| common/persistence/nosql/nosqlplugin/cassandra/workflowUtils.go | 8 | 78.16% |
| common/persistence/nosql/nosqlplugin/cassandra/workflowParsingUtils.go | 14 | 87.16% |
| <!-- | Total: | 38 |
| Totals | |
|---|---|
| Change from base Build 01874e2d-13b5-4a33-b7c8-2d7da3311e7b: | 0.1% |
| Covered Lines: | 85396 |
| Relevant Lines: | 149351 |
💛 - Coveralls
well. it requires rewriting both proto and generated code, because yarpc is insufficiently configurable, but it's pretty simple and it does seem to work.