clickhouse-docs
clickhouse-docs copied to clipboard
ProtobufList documentation
Company or project name
No response
Describe the issue
G'day,
I'm not sure this is a issue, but maybe the documentation for protobuflist seems to be missing "repeated"?
https://clickhouse.com/docs/en/interfaces/formats#protobuflist protobuflist
syntax = "proto3";
message Envelope {
message MessageType {
string name = 1;
string surname = 2;
uint32 birthDate = 3;
repeated string phoneNumbers = 4;
};
MessageType row = 1; // <------------ NOT "repeated" ???
};
The original PR https://github.com/ClickHouse/ClickHouse/pull/35152 has "repeated". protobufListPR
// schemafile.proto
message Envelope {
message MessageType {
uint32 colA = 1;
string colB = 2;
}
repeated MessageType mt = 1; // <----------------- "repeated"
}
The original issue that started this also has "repeated". issue16436
It would also be amazing if the protobuflist docs linked to a working example.
Maybe this is an example, but it doesn't have "row"? https://github.com/ClickHouse/ClickHouse/blob/677b1cbfe76d0326df4b739bb0ad359af125ed87/tests/queries/0_stateless/format_schemas/02240_protobuflist_format_persons_syntax2.proto#L1
This is another example, and it does have "repeated". This example is also confusing, because it's unclear if the message type "Person" needs to be repetitively defined. To compile this protobuf, you do NOT need to define "Person" twice, but maybe you do for Clickhouse? https://github.com/ClickHouse/ClickHouse/blob/677b1cbfe76d0326df4b739bb0ad359af125ed87/tests/queries/0_stateless/format_schemas/02240_protobuflist1_format_persons.proto#L68
:) oh great. The integration test doesn't cover ProtobufList. Maybe this is why I'm struggling. Maybe it doesn't work anymore? https://github.com/ClickHouse/ClickHouse/blob/677b1cbfe76d0326df4b739bb0ad359af125ed87/tests/integration/test_storage_kafka/test_produce_http_interface.py#L137 https://github.com/ClickHouse/ClickHouse/blob/677b1cbfe76d0326df4b739bb0ad359af125ed87/tests/integration/test_storage_kafka/test_produce_http_interface.py#L172 ...
While I'm here the Protobuf docs reference to some random GEODE project, but then doesn't really answer the question about how to add the length delimiting
See also [how to read/write length-delimited protobuf messages in popular languages](https://cwiki.apache.org/confluence/display/GEODE/Delimiting+Protobuf+Messages).
It might improve the docs to link directly to the golang protodelim MarshalTo protodelim.MarshalTo https://pkg.go.dev/google.golang.org/[email protected]/encoding/protodelim#MarshalTo
Thanks
Additional context
No response
Hi @randomizedcoder, thanks for opening this issue! Examples in formats is something we recognise as needing improvement (in some cases there aren't even examples) that we will be tackling in the near future, so I will add this to the list of formats we need to look at.
@Blargian Thanks for the reply. Do you have any reason to believe ProtobufList works?
@Blargian Thanks for the reply. Do you have any reason to believe ProtobufList works?
@antaljanosbenjamin maybe you could help us out here?
I think @rschu1ze is the best to answer this, however I will try to answer it.
I think the fields in the Envelope message doesn't matter. We have tests that uses ProtobufList without any fields, just as a submessage.
I think it depends on the message is embedded in the message Envelope and it doesn't care about the fields. In the code we use the FindMessageTypeByName, which doesn't care about the fields at all.
Now let's answer the questions:
I'm not sure this is a issue, but maybe the documentation for protobuflist seems to be missing "repeated"?
No. The example should work, you can even delete the declared field as long as you keep the embedded message. However because of the extra field, it is super confusing, so it definitely needs improvements, thus thanks for bringing this up!
Maybe this is an example, but it doesn't have "row"?
Yes, that works as our serializer doesn't care about the fields.
To compile this protobuf, you do NOT need to define "Person" twice, but maybe you do for Clickhouse?
The message is only duplicated to make our test scripts simpler.
Do you have any reason to believe ProtobufList works?
Based on the tests I have high confidence in ProtobufList.
I wrote this code three years ago and I don't remember the specifics anymore, sorry.
Linked issue: https://github.com/ClickHouse/ClickHouse/issues/78746