Ruby should embed a serialized descriptor into generated code instead of using the DSL
Currently our generated code for Ruby uses a DSL to specify protos, eg:
Google::Protobuf::DescriptorPool.generated_pool.build do
add_file("google/protobuf/timestamp.proto", :syntax => :proto3) do
add_message "google.protobuf.Timestamp" do
optional :seconds, :int64, 1
optional :nanos, :int32, 2
end
end
end
This provides some readability, but has a high cost: any options that have not been specifically implemented in the DSL are dropped. This includes both built-in options like packed=true (https://github.com/protocolbuffers/protobuf/issues/2294) as well as all custom options(https://github.com/protocolbuffers/protobuf/issues/1198).
To solve these problems, the generated code should embed a serialized descriptor. This will naturally preserve all options, and is easy to implement. eg.
descriptor_data = File.binread(__FILE__).split("\n__END__\n", 2)[1]
Google::Protobuf::DescriptorPool.generated_pool.add_serialized_file(descriptor_data)
__END__
google/protobuf/descriptor.proto^R^Ogoogle.protobuf"G
^QFileDescriptorSet^R2
^Dfile^X^A ^C(^K2$.google.protobuf.FileDescriptorProto"ì^C
^SFileDescriptorProto^R^L
^Dname^X^A ^A(>-^R^O
^Gpackage^X^B ^A(>------^R^R
<snip>
This does make the generated code less readable though. So we should not do this until we are generating .rbs files: https://github.com/protocolbuffers/protobuf/issues/9495
Then we will have a world where:
-
_pb.rbsprovides a readable description of the protobuf interface -
_pb.rbprovides a compact, efficient description of the message, optimized for completeness and fast runtime loading.
Blocked by: https://github.com/protocolbuffers/protobuf/issues/9495
Is it worth unblocking this by ignoring the rbs piece of this problem?
- Who is currently reading the generated code and expecting it to be readable?
- They need to understand how the DSL generates ruby classes
- Whereas if they just read the proto directly, they need to understand how it maps to ruby
- In either case, you need to understand some (very simple) mapping.
- Seems like the proto -> ruby should be the default here. People interacting with the protos need to know proto anyway...
Could RBS be considered orthogonal? Fixing the generated code now, while backsliding on readability (but, per above, is that actually negative?), provides real benefits (like being able to use get proto options, which has been an ask for 6.5 years...).
Readability can then be "fixed" later with RBS.
Edit: Also, fwiw, the DSL also describes / is much closer to the proto itself, not the ruby classes, where as RBS would describe ruby. So the difference between the proto and DSL is super minimal today (imo), meaning I'm not sure there is a readability backslide at all... reading the DSL today is basically reading the proto...
Fixed in https://github.com/protocolbuffers/protobuf/commit/bd52d0483987f1a5186fc3daa261d1d76a787bcf
Announced in: https://protobuf.dev/news/2023-04-20/