[FLINK-28185][Connector/Kafka] Make TimestampOffsetsInitializer apply offset reset strategy and handle timestamps that do not map to an offset

Open mas-chen opened this issue 3 years ago • 2 comments

What is the purpose of the change

This change allows the TimestampOffsetsInitializer to be initialized with a configured offset reset strategy. The default behavior (LATEST) is preserved. It also fixes a bug that occurs when the timestamp does not correspond to an offset in Kafka, and clarifies the message of the exception that is thrown.

Brief change log

  • Handles EARLIEST/LATEST/NONE
  • If a timestamp does not correspond to an offset in Kafka and the initializer is configured with NONE, the initializer throws an explicit exception.
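
The fallback described in the change log can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code from this PR: the class `TimestampOffsetResetSketch`, the enum `ResetStrategy`, the method `resolve`, and the use of plain strings as partition keys are all hypothetical names chosen for the example. It relies on one known Kafka behavior: a timestamp lookup such as `KafkaConsumer#offsetsForTimes` yields `null` for a partition that has no record at or after the requested timestamp.

```java
import java.util.HashMap;
import java.util.Map;

public class TimestampOffsetResetSketch {

    /** Hypothetical stand-in for the configured offset reset strategy. */
    enum ResetStrategy { EARLIEST, LATEST, NONE }

    /**
     * Resolves a starting offset per partition. A null value in
     * offsetsForTimestamp means the timestamp did not map to any offset.
     */
    static Map<String, Long> resolve(
            Map<String, Long> offsetsForTimestamp,
            Map<String, Long> beginningOffsets,
            Map<String, Long> endOffsets,
            ResetStrategy strategy) {
        Map<String, Long> result = new HashMap<>();
        for (Map.Entry<String, Long> entry : offsetsForTimestamp.entrySet()) {
            String partition = entry.getKey();
            Long offset = entry.getValue();
            if (offset != null) {
                // The timestamp mapped to an offset: use it directly.
                result.put(partition, offset);
                continue;
            }
            // No offset for the timestamp: fall back to the reset strategy.
            switch (strategy) {
                case EARLIEST:
                    result.put(partition, beginningOffsets.get(partition));
                    break;
                case LATEST:
                    result.put(partition, endOffsets.get(partition));
                    break;
                default:
                    // NONE: fail with an explicit message instead of guessing.
                    throw new IllegalStateException(
                            "No offset found for the given timestamp in partition "
                                    + partition
                                    + " and the offset reset strategy is NONE");
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Long> lookup = new HashMap<>();
        lookup.put("topic-0", 42L);  // timestamp maps to offset 42
        lookup.put("topic-1", null); // timestamp is past the last record

        Map<String, Long> beginning = new HashMap<>();
        beginning.put("topic-0", 0L);
        beginning.put("topic-1", 5L);

        Map<String, Long> end = new HashMap<>();
        end.put("topic-0", 100L);
        end.put("topic-1", 17L);

        Map<String, Long> latest = resolve(lookup, beginning, end, ResetStrategy.LATEST);
        System.out.println(latest.get("topic-0")); // prints 42
        System.out.println(latest.get("topic-1")); // prints 17
    }
}
```

With `ResetStrategy.NONE`, the same input throws an `IllegalStateException` for `topic-1`, which mirrors the explicit exception mentioned in the second bullet.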

Verifying this change

This change added tests and can be verified as follows:

  • Added a unit test covering the various offset reset strategies and the edge case where the timestamp does not correspond to an offset in Kafka

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): no
  • The public API, i.e., is any changed class annotated with @Public(Evolving): yes
  • The serializers: no
  • The runtime per-record code paths (performance sensitive): no
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: no
  • The S3 file system connector: no

Documentation

  • Does this pull request introduce a new feature? yes
  • If yes, how is the feature documented? JavaDocs

mas-chen avatar Jul 27 '22 00:07 mas-chen

CI report:

  • 9ae1f5be5a48832c9703f087591602edd5c1b64b Azure: SUCCESS

Bot commands

The @flinkbot bot supports the following commands:

  • @flinkbot run azure: re-run the last Azure build

flinkbot avatar Jul 27 '22 00:07 flinkbot

@PatrickRen can you help review? I can't request reviewers unfortunately :)

mas-chen avatar Aug 02 '22 05:08 mas-chen

I also encountered this exception. I read Kafka data in batches. Therefore, we want to use latest when the timestamp is exceeded.

LinMingQiang avatar Aug 18 '22 13:08 LinMingQiang

@mas-chen There are a couple of holidays coming up, but I've asked @PatrickRen for a review

MartijnVisser avatar Sep 29 '22 07:09 MartijnVisser

What's the status of this PR?

MartijnVisser avatar Oct 20 '22 11:10 MartijnVisser

@MartijnVisser @PatrickRen sorry for the delay, feedback should be addressed. Will follow up with the public API change separately.

mas-chen avatar Oct 25 '22 09:10 mas-chen

@PatrickRen have a chance to take another look?

mas-chen avatar Nov 24 '22 06:11 mas-chen

@PatrickRen Thanks, the comments should be addressed! Regarding your earlier comment about the Public API change originally in this PR, I have filed https://issues.apache.org/jira/browse/FLINK-30200 and will follow up later

mas-chen avatar Nov 24 '22 20:11 mas-chen