Unique OCaml module for all messages
Hi, thanks for your work on this project.
This is a feature request more than an actual issue.
TL;DR: Would it be possible to modify the OCaml code generation to optionally emit a single OCaml module for all messages in a .proto file, rather than a separate module for each message?
A more detailed rationale follows. Messages can be structured to follow a hierarchy, as in this example:
message A {
  bool is_something = 1;
  repeated B bs = 2;
}
message B {
  bool is_something_else = 1;
  oneof content {
    C c = 2;
    D d = 3;
  }
}
message C { ... }
message D { ... }
Especially in such cases, it would be useful to optionally output just a single OCaml module whose type mirrors this hierarchy, roughly of the form:
type a = {
  is_something : bool;
  bs : b list;
}
and b = {
  is_something_else : bool;
  content : [ `not_set | `C of c | `D of d ];
}
and c = { ... }
and d = { ... }
Such an OCaml type hierarchy makes the type structure easier to navigate than the current one. In particular, it would be easier to write a ppx that operates on such a representation than on the current one.
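A minimal, hand-written OCaml sketch of what navigating the proposed flat type could look like (this is not output of ocaml-protoc-plugin). Messages C and D are elided in the request, so `c_placeholder` and `d_placeholder` are hypothetical stand-in fields, and `count_contents` is an illustrative helper that walks the hierarchy with plain pattern matching.

```ocaml
(* Flat, recursive mapping of messages A, B, C and D as sketched above.
   C and D are elided in the request, so their single fields below are
   hypothetical placeholders. *)
type a = {
  is_something : bool;
  bs : b list;
}
and b = {
  is_something_else : bool;
  content : [ `not_set | `C of c | `D of d ];
}
and c = { c_placeholder : int }
and d = { d_placeholder : string }

(* Navigating the hierarchy is plain pattern matching on nested records:
   count how many B entries carry a C and how many carry a D. *)
let count_contents (v : a) : int * int =
  List.fold_left
    (fun (cs, ds) (x : b) ->
      match x.content with
      | `C _ -> (cs + 1, ds)
      | `D _ -> (cs, ds + 1)
      | `not_set -> (cs, ds))
    (0, 0) v.bs
```

For a value of type `a` whose `bs` holds one `C`, one unset entry and one `D`, `count_contents` returns `(1, 1)`.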
Does this sound reasonable? How hard would it be to make such a change?
I think it's impossible. In terms of effort, it's huge even to attempt.
There are too many corner cases that would make the type names less predictable, and in some cases would make it almost impossible to construct values without adding type annotations to almost every function.
The aim of ocaml-protoc-plugin is to provide a fully compliant implementation. Another design goal is to have as predictable a name mapping as possible, and to not have names change when making additions to the .proto files (i.e. adding a new message type alongside existing ones should not change existing name mappings).
Have you looked at the types generated by ocaml-protoc? It does not use modules for generated types and may be closer in terms of type generation to what you are looking for.
In particular, it would be easier to write a ppx that operates on such a representation than on the current one.
Could you provide an example of a ppx that would be easier to write?
Namespace pollution
Enclosing each message definition in a module allows helper functions to be nicely scoped as well as solving resolution problems.
Imagine the following definitions:
message A {
  bool x = 1;
  bool y = 2;
}
message B {
  int32 x = 1;
  int32 y = 2;
}
would then translate to
type a = { x : bool; y : bool }
and b = { x : int; y : int }
But constructing either would be awkward: the field names occupy the same namespace, so explicit type annotations would be needed at each use to distinguish between type a and type b.
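A minimal, hand-written OCaml sketch of the problem (not actual ocaml-protoc-plugin output). With a flat mapping the labels `x` and `y` are shared, so building an `a` relies on OCaml's type-directed disambiguation and needs an explicit annotation, whereas a module per message gives each record and its helpers their own scope. The `make_a`, `make_b` and `make` helpers are hypothetical. The two flat types are declared separately here; declaring them together with `and`, as above, is also accepted but can trigger warning 30 (duplicate labels in mutually recursive types).

```ocaml
(* Flat mapping: both records use the labels x and y. *)
type a = { x : bool; y : bool }
type b = { x : int; y : int }

(* An unannotated record literal resolves to the last type defining these
   labels (b), so building an [a] needs an explicit type annotation. *)
let make_a () : a = { x = true; y = false }
let make_b () = { x = 1; y = 2 }

(* Module-per-message mapping: each type and its helpers get their own scope,
   so no annotation is needed inside or outside the module. *)
module A = struct
  type t = { x : bool; y : bool }
  let make ~x ~y = { x; y }
end

module B = struct
  type t = { x : int; y : int }
  let make ~x ~y = { x; y }
end
```

Dropping the `: a` annotation on `make_a` makes the literal resolve to `b`, and the compiler then rejects `true` where an `int` is expected.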
Sub-messages
Sub-messages would be difficult to implement. Consider the following:
message A {
  message B {
    bool a = 1;
  }
  B b = 1;
}
message B {
  message B {
    int32 a = 1;
  }
  B b = 1;
}
This would not allow for a trivial name mapping: the type hierarchy would need to be encoded in the name, which I feel is less intuitive, e.g.
type a = ...
and a_b = ...
and b = ...
and b_b = ...
This would even break when names conflict (e.g. a top-level message called B_b).
There are so many corner cases that would need to be handled, but I think it all boils down to how to create unique names in a shared namespace that are predictable and usable, and I don't see any viable path to solving this that does not place restrictions on what constructs / names can be handled.
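A hand-written sketch of the two layouts (not plugin output). Nested modules keep `A.B` and `B.B` apart, while a naive flat scheme, here a hypothetical `flat_name` that joins the nesting path with an underscore and lowercases it, maps the nested `B.B` and a top-level message named `B_b` to the same name. The `option` on message-typed fields assumes such proto3 fields may be absent.

```ocaml
(* Module-per-message mapping of the nested messages: A.B and B.B live in
   different modules, so they never clash. *)
module A = struct
  module B = struct
    type t = { a : bool }
  end
  type t = { b : B.t option }
end

module B = struct
  module B = struct
    type t = { a : int }
  end
  (* Inside module B, [B.t] refers to the nested message B.B. *)
  type t = { b : B.t option }
end

(* Hypothetical flat naming scheme: join the nesting path with an
   underscore and lowercase the result. *)
let flat_name (path : string list) : string =
  String.lowercase_ascii (String.concat "_" path)
```

With this scheme `flat_name ["B"; "B"]` and `flat_name ["B_b"]` both yield `"b_b"`, which is exactly the conflict described above.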
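One obvious way to make flat names unique is to append a numeric suffix on clashes. The hypothetical `assign_names` below sketches that approach (it is not something proposed in this thread), and shows why it conflicts with the goal of stable name mappings: adding a message can rename an existing one.

```ocaml
(* Hypothetical collision handling for a flat namespace: derive a name from
   the nesting path and append a numeric suffix when it is already taken.
   Names are assigned in declaration order. *)
let assign_names (paths : string list list) : (string list * string) list =
  let used = Hashtbl.create 16 in
  let assign acc path =
    let base = String.lowercase_ascii (String.concat "_" path) in
    (* Try base, base_1, base_2, ... until an unused name is found. *)
    let rec pick n =
      let candidate = if n = 0 then base else base ^ "_" ^ string_of_int n in
      if Hashtbl.mem used candidate then pick (n + 1) else candidate
    in
    let name = pick 0 in
    Hashtbl.replace used name ();
    (path, name) :: acc
  in
  List.rev (List.fold_left assign [] paths)
```

Here, declaring a top-level `B_b` ahead of `B.B` renames `B.B` from `b_b` to `b_b_1`, so any code referring to the old name would break.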
Thanks for your thorough reply.
I do not think the task is impossible, though it is certainly substantial in terms of effort.
I share your opinion: it boils down to having a naming scheme/protocol of unique names. Again, quite an effort, but not impossible imho.
Let me reply to your questions:
Have you looked at the types generated by ocaml-protoc? It does not use modules for generated types and may be closer in terms of type generation to what you are looking for.
I had a look at it. It may be fine from the point of view of the OCaml types, but I do not like the architecture/idea of the project. I prefer this project's approach of being a plugin for protoc. Plus, it does not support some features that I really need.
Could you provide an example of a ppx that would be easier to write?
I do not have such an example. Still, a ppx that only needs to navigate simple types is surely simpler than one that must also deal with modules.
Knowing that
The aim of ocaml-protoc-plugin is to provide a fully compliant implementation. Also a design goal is to have as predictable a name mapping as possible and not have the names change when making additions to the proto files
let me rephrase my request: Would you be willing to accept a merge request for such a change? Would you be available to discuss and guide during the development of such a change?
Again, it would be an optional behavior, not the default one. Also, I am not saying that I have such an MR ready, nor that I have time to work on it at the moment.
It's of course possible to create a flat type mapping. What I meant was that it's (almost) impossible given the current architecture of the plugin, as well as a very, very large effort. I think a lot of compromises would need to be made, and I'm unsure whether it would be worthwhile.
let me rephrase my request: Would you be willing to accept a merge request for such a change? Would you be available to discuss and guide during the development of such a change?
I'd be happy to discuss how this can be done and offer guidance, but I cannot guarantee that I will merge a PR that may significantly increase maintenance of the code.
Could you provide an example of a ppx that would be easier to write?
I do not have such an example. Still, a ppx that only needs to navigate simple types is surely simpler than one that must also deal with modules.
My question about PPXes was to understand the motivation better. I don't think PPXes are any more difficult to write just because types are placed in modules. But my question remains: what's the motivation for this?
The design decision to create a module for every message type was indeed made to simplify things and be more OCaml-idiomatic. Also, I have been working with ocaml-protoc and did not really like the flat types (and I also needed a fully compliant implementation).
Maybe you could try to give an example of what the signatures would look like, i.e. what the "flat" signature would be for:
syntax = "proto3";
import "google/protobuf/timestamp.proto";
package echo;
message Request {
  google.protobuf.Timestamp timestamp = 1;
  string who = 2;
}
message Reply {
  string response = 1;
}
service Echo {
  rpc Call (Request) returns (Reply);
}
This should include the various serialization and deserialization functions, as well as the types for the RPC endpoint.
Hi. Thanks again for your reply.
But my question remains: what's the motivation for this?
Essentially, think about a typical Abstract Syntax Tree (AST) for a programming language, where each construct with its metadata is encoded as a protocol buffer message. Hence, you end up with a hierarchy of messages, which typically admits a simple, flat OCaml type description.
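A hypothetical illustration of the AST case: the `Expr` and `BinOp` messages in the comment are invented for this sketch, and int32 is assumed to map to OCaml `int`, as in the earlier A/B example. With a flat, recursive type, an evaluator is a direct recursive walk; `eval` is an illustrative helper, not generated code.

```ocaml
(* Hypothetical AST messages, e.g.
     message Expr  { oneof kind { int32 literal = 1; BinOp binop = 2; } }
     message BinOp { string op = 1; Expr lhs = 2; Expr rhs = 3; }
   mapped to a flat, recursive OCaml type. Message-typed fields may be
   absent in proto3, hence the [option]. *)
type expr = { kind : [ `not_set | `Literal of int | `Binop of binop ] }
and binop = { op : string; lhs : expr option; rhs : expr option }

(* Evaluate an expression; unset or unknown nodes yield [None]. *)
let rec eval (e : expr) : int option =
  match e.kind with
  | `not_set -> None
  | `Literal n -> Some n
  | `Binop { op; lhs = Some l; rhs = Some r } -> (
      match (eval l, eval r, op) with
      | Some a, Some b, "+" -> Some (a + b)
      | Some a, Some b, "*" -> Some (a * b)
      | _ -> None)
  | `Binop _ -> None
```

For instance, the tree for `2 + 3 * 4` evaluates to `Some 14`, while an expression whose `kind` is unset evaluates to `None`.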
I'd be happy to discuss how this can be done and offer guidance, but I cannot guarantee that I will merge a PR that may significantly increase maintenance of the code.
I understand. Totally fair.
At this point, you may also close this issue, I guess.
But my question remains: what's the motivation for this?
Essentially, think about a typical Abstract Syntax Tree (AST) for a programming language, where each construct with its metadata is encoded as a protocol buffer message. Hence, you end up with a hierarchy of messages, which typically admits a simple, flat OCaml type description.
I'm still not sure what problem a simple recursive type definition solves compared to a module hierarchy. Could you provide a motivating example showing what would become easier? I'm genuinely interested in understanding.
Closing this due to inactivity, but feel free to reopen or followup on the issue.