[Feature] Ignore tag.creation-delay with a new spark-sql option when calling trigger_tag_automatic_creation

Open JackeyLee007 opened this issue 2 months ago • 0 comments

Search before asking

  • [x] I searched in the issues and found nothing similar.

Motivation

The procedure trigger_tag_automatic_creation is designed to force the creation of an auto-tag once the tag time has passed. For example, if it is called after 00:00 on a daily auto-tagged table that does not yet have a tag named for yesterday, that tag should be created.

But it does not work under some conditions:

  • No snapshot exists since the auto-tag time;
  • The calling time is before the delay time.

The reasons are:

  1. Unlike Flink, which can commit an empty message to produce a new snapshot, Spark does not, so when the procedure is triggered there may be no appropriate snapshot to tag.
  2. Even if there is a snapshot since the auto-tag time, TagAutoCreation may still refuse to create the tag, because it needs a snapshot after the delay time.

For example:

  1. T is a daily-tagged table with tag.creation-delay=10m.
  2. spark-sql calls this procedure at 2025-10-22T00:02:00.
  3. If there is no snapshot between 00:00:00 and 00:02:00, no auto-tag is created.
  4. If there is a snapshot at 00:01:00, it is still earlier than the delay time (00:10:00), so no auto-tag is created either.
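
The timeline above can be sketched as a small check. This is a simplified model of the behaviour described in this issue, not Paimon's actual TagAutoCreation code; the function name `can_create_auto_tag` and its parameters are hypothetical and exist only for illustration.

```python
from datetime import datetime, timedelta
from typing import Optional


def can_create_auto_tag(
    latest_snapshot_time: Optional[datetime],
    period_start: datetime,
    creation_delay: timedelta,
) -> bool:
    """Hypothetical model: a tag is created only if a snapshot exists
    at or after period_start + creation_delay."""
    # Case 3 in the example: no snapshot since the auto-tag time.
    if latest_snapshot_time is None or latest_snapshot_time < period_start:
        return False
    # Case 4 in the example: a snapshot exists but is earlier than the delay time.
    return latest_snapshot_time >= period_start + creation_delay


midnight = datetime(2025, 10, 22, 0, 0, 0)
delay = timedelta(minutes=10)

print(can_create_auto_tag(None, midnight, delay))                             # False
print(can_create_auto_tag(datetime(2025, 10, 22, 0, 1, 0), midnight, delay))  # False
```

Both calls return False, which matches the two failing cases when the procedure is called at 00:02:00.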

Solution

  1. Like Flink, Spark should also commit an empty message when the trigger procedure is called, to make sure a snapshot exists.
  2. Add a new spark-sql conf option, such as
set `spark.paimon.trigger-tag-auto-creation-ignore-delay` = true;

If the option is true, create an auto-tag even if the snapshot is earlier than the delay.
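
A rough sketch of how the two proposed changes might combine is shown below. It is only a model of the intended behaviour under the assumptions in this issue, not an implementation against Paimon's code; `trigger_tag_creation` and its parameters are hypothetical names, and the empty commit is modelled simply as a new snapshot at the call time.

```python
from datetime import datetime, timedelta
from typing import List


def trigger_tag_creation(
    snapshot_times: List[datetime],
    now: datetime,
    period_start: datetime,
    creation_delay: timedelta,
    ignore_delay: bool,
) -> bool:
    """Hypothetical model of the proposed trigger behaviour."""
    # Proposed step 1: commit an empty message so a snapshot always exists at call time.
    latest = max(snapshot_times + [now])
    # Proposed step 2: when ignore_delay is set, skip the creation-delay requirement.
    threshold = period_start if ignore_delay else period_start + creation_delay
    return latest >= threshold


midnight = datetime(2025, 10, 22, 0, 0, 0)
call_time = datetime(2025, 10, 22, 0, 2, 0)
delay = timedelta(minutes=10)

print(trigger_tag_creation([], call_time, midnight, delay, ignore_delay=False))  # False
print(trigger_tag_creation([], call_time, midnight, delay, ignore_delay=True))   # True
```

In this model the empty commit alone does not help at 00:02:00, because the new snapshot is still earlier than the delay time; the tag is created only when the option is also enabled.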

Anything else?

No response

Are you willing to submit a PR?

  • [x] I'm willing to submit a PR!

JackeyLee007 avatar Oct 22 '25 05:10 JackeyLee007