Add support for protobuf version 4.29.x
Describe the enhancement requested
Hi all, First of all let me start by saying Thank You 🙏
Please add support for protobuf major version 4 preferably 4.29.1
At December 2023 Protobuf java library added support for the Editions feature which resulted in major version change starting at version 4.26
I've noticed that a PR upgrading to version 4 was previously rejected https://github.com/apache/parquet-java/pull/1308.
The error below is a result of attempting to run version 1.15.0 with protobuf version 4.29.1 . the specific failure presented is the result of the change linked below. https://github.com/protocolbuffers/protobuf/commit/65c65c2d04b293225db150029d005056a9f078b8#diff-2228551d02c6661809ca7103db9512eef4c2d01f35556d42316543d92a89edef:~:text=Edition%20getEdition()%20%7B
java.lang.NoSuchMethodError: com.google.protobuf.Descriptors$FileDescriptor.getSyntax()Lcom/google/protobuf/Descriptors$FileDescriptor$Syntax;
at org.apache.parquet.proto.ProtoWriteSupport$MessageWriter.writeAllFields(ProtoWriteSupport.java:449)
at org.apache.parquet.proto.ProtoWriteSupport$MessageWriter.writeTopLevelMessage(ProtoWriteSupport.java:423)
at org.apache.parquet.proto.ProtoWriteSupport.write(ProtoWriteSupport.java:139)
at org.apache.parquet.proto.ProtoWriteSupport.write(ProtoWriteSupport.java:74)
at org.apache.parquet.hadoop.InternalParquetRecordWriter.write(InternalParquetRecordWriter.java:152)
at org.apache.parquet.hadoop.ParquetWriter.write(ParquetWriter.java:425
Many many thanks 🙏
Component(s)
Protobuf
@Fokko Is it safe to bump the major version of protobuf?
According to discussion on https://github.com/apache/parquet-java/pull/1308#issuecomment-2730336303 the parquet-java project would have to be upgraded to Java-11 before we could support protobuf version 4.x
Is there a ticket out there to update to Java 11?
We were able to upgrade to protobuf 4.32.1 and only had to replace ProtoWriteSupport.java with a patched version, using the patch below:
@@ -445,10 +445,14 @@
private void writeAllFields(MessageOrBuilder pb) {
Descriptor messageDescriptor = pb.getDescriptorForType();
- Descriptors.FileDescriptor.Syntax syntax =
- messageDescriptor.getFile().getSyntax();
+ String syntax =
+ messageDescriptor.getFile().toProto().getSyntax();
+ if ("editions".equals(syntax)) {
+ throw new UnsupportedOperationException("protocol buffers 'editions' not supported");
+ }
+ boolean isProto2 = !"proto3".equals(syntax);
- if (Descriptors.FileDescriptor.Syntax.PROTO2.equals(syntax)) {
+ if (isProto2) {
// Returns changed fields with values. Map is ordered by id.
Map<FieldDescriptor, Object> changedPbFields = pb.getAllFields();
@@ -464,7 +468,7 @@
int fieldIndex = fieldDescriptor.getIndex();
fieldWriters[fieldIndex].writeField(entry.getValue());
}
- } else if (Descriptors.FileDescriptor.Syntax.PROTO3.equals(syntax)) {
+ } else {
List<FieldDescriptor> fieldDescriptors = messageDescriptor.getFields();
for (FieldDescriptor fieldDescriptor : fieldDescriptors) {
FieldDescriptor.Type type = fieldDescriptor.getType();
@@ -797,4 +801,3 @@
throw new InvalidRecordException(exceptionMsg);
}
}
This code should actually also work with the protobuf 3 library. The problem was that in protobuf 4 the getFile().getSyntax() method is not there anymore, but getFile().toProto().getSyntax() works (returns a String instead of enum)
@uwemaurer Can you create a PR for this? Maybe they'll agree to merge your patch for the next release.
@uwemaurer That's great! Can you create a PR so you'll be credited for the work? We can easily test to make sure that it still works against Protobuf 3 by first doing the patch and then bumping protobuf in a separate PR.
yes I will prepare a PR