flink icon indicating copy to clipboard operation
flink copied to clipboard

[DRAFT][FLINK-34440][formats][protobuf-confluent] add support for protobuf-confluent

Open anupamaggarwal opened this issue 11 months ago • 7 comments

What is the purpose of the change

This PR adds the following

  • Support for debezium protobuf format
  • Option to specify schemaId to use for serialization/deserialisation explicitly
  • Fall back to inferring schema from encoded message (deserialization) / flink rowType (serialization) in case schemaId is not specified
  • Handling for message indexes

Brief change log

  • Handles debezium envelop by setting up the schema and delegating the actual ser/de to Proto(Se|Dese)rializationSchema
  • SchemaCoder to abstract away schema encoding / decoding logic.
  • Handles message indexes

In progress items

  • Cleaning todos
  • Adding wrapper module similar to flink-sql-avro-registry
  • Schema validation during initialization

Enhancements

  • Evolving schema in compatible way
  • Refactoring common code between avro-confluent and protobuf-confluent modules

Verifying this change

UTs corresponding to most of the functionality

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): (yes)
  • The public API, i.e., is any changed class annotated with @Public(Evolving): (TBD)
  • The serializers: (yes)
  • The runtime per-record code paths (performance sensitive): (yes)
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: (no)
  • The S3 file system connector: (no)

Documentation

  • Does this pull request introduce a new feature? (yes)
  • If yes, how is the feature documented? (tbd)

anupamaggarwal avatar Mar 11 '24 13:03 anupamaggarwal

CI report:

  • d0f410636fcf608fecbf530e8dbe6be3ef3e7d02 Azure: SUCCESS
Bot commands The @flinkbot bot supports the following commands:
  • @flinkbot run azure re-run the last Azure build

flinkbot avatar Mar 11 '24 14:03 flinkbot

👋 Thanks for working on this Anupam. Any updates on this PR?

klam-shop avatar May 07 '24 20:05 klam-shop

Hi @klam-shop, apologies, I missed your comment earlier. I am sorry for leaving this PR hanging in the middle, I had to context switch to focus on some other priorities. I might be able to look into this again in a couple of months.

anupamaggarwal avatar Jun 13 '24 12:06 anupamaggarwal

Thanks @anupamaggarwal, if we have capacity, would you also be open to others picking up this PR? Not sure if we can commit time to the work yet.. but wanted to check with you before we hijack your PR 😄

klam-shop avatar Jul 12 '24 20:07 klam-shop

Thanks @anupamaggarwal, if we have capacity, would you also be open to others picking up this PR? Not sure if we can commit time to the work yet.. but wanted to check with you before we hijack your PR 😄

sure @klam-shop 👍 , pls feel free to pick this up in case you get to this before me :)

anupamaggarwal avatar Jul 15 '24 18:07 anupamaggarwal

I started working on an alternative implementation over here: https://github.com/Shopify/flink/tree/protobuf-confluent-dynamic-deser

dmariassy avatar Jul 19 '24 18:07 dmariassy

~Draft~ PR for the alternative implementation: https://github.com/apache/flink/pull/25114

dmariassy avatar Jul 23 '24 19:07 dmariassy