clickhouse-docs icon indicating copy to clipboard operation
clickhouse-docs copied to clipboard

ProtobufList documentation

Open randomizedcoder opened this issue 10 months ago • 5 comments

Company or project name

No response

Describe the issue

G'day,

I'm not sure this is a issue, but maybe the documentation for protobuflist seems to be missing "repeated"?

https://clickhouse.com/docs/en/interfaces/formats#protobuflist protobuflist

syntax = "proto3";
message Envelope {
  message MessageType {
    string name = 1;
    string surname = 2;
    uint32 birthDate = 3;
    repeated string phoneNumbers = 4;
  };
  MessageType row = 1;                            // <------------ NOT "repeated" ???
};

The original PR https://github.com/ClickHouse/ClickHouse/pull/35152 has "repeated". protobufListPR

// schemafile.proto
message Envelope {
  message MessageType {
    uint32 colA = 1;
    string colB = 2;
  }
  repeated MessageType mt = 1;                 // <----------------- "repeated"
}

The original issue that started this also has "repeated". issue16436

It would also be amazing if the protobuflist docs linked to a working example.

Maybe this is an example, but it doesn't have "row"? https://github.com/ClickHouse/ClickHouse/blob/677b1cbfe76d0326df4b739bb0ad359af125ed87/tests/queries/0_stateless/format_schemas/02240_protobuflist_format_persons_syntax2.proto#L1

This is another example, and it does have "repeated". This example is also confusing, because it's unclear if the message type "Person" needs to be repetitively defined. To compile this protobuf, you do NOT need to define "Person" twice, but maybe you do for Clickhouse? https://github.com/ClickHouse/ClickHouse/blob/677b1cbfe76d0326df4b739bb0ad359af125ed87/tests/queries/0_stateless/format_schemas/02240_protobuflist1_format_persons.proto#L68

:) oh great. The integration test doesn't cover ProtobufList. Maybe this is why I'm struggling. Maybe it doesn't work anymore? https://github.com/ClickHouse/ClickHouse/blob/677b1cbfe76d0326df4b739bb0ad359af125ed87/tests/integration/test_storage_kafka/test_produce_http_interface.py#L137 https://github.com/ClickHouse/ClickHouse/blob/677b1cbfe76d0326df4b739bb0ad359af125ed87/tests/integration/test_storage_kafka/test_produce_http_interface.py#L172 ...

While I'm here the Protobuf docs reference to some random GEODE project, but then doesn't really answer the question about how to add the length delimiting

See also [how to read/write length-delimited protobuf messages in popular languages](https://cwiki.apache.org/confluence/display/GEODE/Delimiting+Protobuf+Messages).

It might improve the docs to link directly to the golang protodelim MarshalTo protodelim.MarshalTo https://pkg.go.dev/google.golang.org/[email protected]/encoding/protodelim#MarshalTo

Thanks

Additional context

No response

randomizedcoder avatar Jan 20 '25 01:01 randomizedcoder

Hi @randomizedcoder, thanks for opening this issue! Examples in formats is something we recognise as needing improvement (in some cases there aren't even examples) that we will be tackling in the near future, so I will add this to the list of formats we need to look at.

Blargian avatar Jan 22 '25 21:01 Blargian

@Blargian Thanks for the reply. Do you have any reason to believe ProtobufList works?

randomizedcoder avatar Jan 22 '25 21:01 randomizedcoder

@Blargian Thanks for the reply. Do you have any reason to believe ProtobufList works?

@antaljanosbenjamin maybe you could help us out here?

Blargian avatar Jan 23 '25 10:01 Blargian

I think @rschu1ze is the best to answer this, however I will try to answer it.

I think the fields in the Envelope message doesn't matter. We have tests that uses ProtobufList without any fields, just as a submessage.

I think it depends on the message is embedded in the message Envelope and it doesn't care about the fields. In the code we use the FindMessageTypeByName, which doesn't care about the fields at all.

Now let's answer the questions:

I'm not sure this is a issue, but maybe the documentation for protobuflist seems to be missing "repeated"?

No. The example should work, you can even delete the declared field as long as you keep the embedded message. However because of the extra field, it is super confusing, so it definitely needs improvements, thus thanks for bringing this up!

Maybe this is an example, but it doesn't have "row"?

Yes, that works as our serializer doesn't care about the fields.

To compile this protobuf, you do NOT need to define "Person" twice, but maybe you do for Clickhouse?

The message is only duplicated to make our test scripts simpler.

Do you have any reason to believe ProtobufList works?

Based on the tests I have high confidence in ProtobufList.

antaljanosbenjamin avatar Jan 23 '25 16:01 antaljanosbenjamin

I wrote this code three years ago and I don't remember the specifics anymore, sorry.

rschu1ze avatar Feb 09 '25 19:02 rschu1ze

Linked issue: https://github.com/ClickHouse/ClickHouse/issues/78746

randomizedcoder avatar Apr 07 '25 01:04 randomizedcoder