
[Feature] Support periodically refreshing existing dataFileMetas in CompactManager

Open · xiangyuf opened this issue 10 months ago · 4 comments

Search before asking

  • [x] I searched in the issues and found nothing similar.

Motivation

It is common for users to run a streaming job and a batch job that update different columns of the same Paimon table. To avoid conflicts, users can configure either the batch job or the streaming job as write-only.

Either way, the job that both writes and compacts does not see the new files generated by the write-only job. This leads to insufficient compaction in both MergeTreeCompactManager and BucketedAppendCompactManager.

This issue could be addressed by having the CompactManager periodically refresh its dataFileMetas.

This would remove the need for an extra, dedicated compaction job.

Solution

Use the same approach as refreshFiles in LocalTableQuery.
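To make the proposal concrete, here is a minimal, self-contained Java sketch of a periodic refresh. All names (FileMeta, RefreshingCompactManager, maybeRefresh, the snapshot-file supplier) are hypothetical and are not Paimon's API. The sketch assumes the manager can list the files of its bucket in the latest snapshot, and it only picks up newly added files, on the assumption that the other job is write-only and therefore never removes files.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Supplier;

// Hypothetical, simplified stand-in for Paimon's DataFileMeta.
record FileMeta(String fileName, long sizeBytes) {}

// Hypothetical sketch, not Paimon's API: a compact manager that periodically
// re-reads the bucket's file list and tracks files it has not seen yet.
class RefreshingCompactManager {
    // Assumed hook: returns the files of this bucket in the latest snapshot.
    private final Supplier<List<FileMeta>> latestSnapshotFiles;
    private final long refreshIntervalMillis;
    private final Set<String> knownFiles = new HashSet<>();
    private final List<FileMeta> compactionCandidates = new ArrayList<>();
    private long lastRefreshMillis;

    RefreshingCompactManager(
            Supplier<List<FileMeta>> latestSnapshotFiles,
            long refreshIntervalMillis,
            long nowMillis) {
        this.latestSnapshotFiles = latestSnapshotFiles;
        this.refreshIntervalMillis = refreshIntervalMillis;
        this.lastRefreshMillis = nowMillis;
    }

    // Registers a file written by this job itself.
    void addNewFile(FileMeta file) {
        if (knownFiles.add(file.fileName())) {
            compactionCandidates.add(file);
        }
    }

    // Refreshes only once the interval has elapsed; returns how many
    // previously unknown files (e.g. from a write-only job) were added.
    int maybeRefresh(long nowMillis) {
        if (nowMillis - lastRefreshMillis < refreshIntervalMillis) {
            return 0;
        }
        lastRefreshMillis = nowMillis;
        int added = 0;
        for (FileMeta file : latestSnapshotFiles.get()) {
            if (knownFiles.add(file.fileName())) {
                compactionCandidates.add(file);
                added++;
            }
        }
        return added;
    }

    List<FileMeta> candidates() {
        return List.copyOf(compactionCandidates);
    }
}
```

Keying on the file name keeps the refresh idempotent: files the manager already tracks, including the ones it wrote itself, are skipped, so only files committed by the other job become new compaction candidates.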

Anything else?

No response

Are you willing to submit a PR?

  • [x] I'm willing to submit a PR!

xiangyuf · Mar 05 '25 08:03

Hi, can I try it?

xiedeyantu · Mar 08 '25 03:03

@xiedeyantu thx for volunteering, I'm already working on this.

xiangyuf · Mar 08 '25 12:03

OK, thanks for the reply. Are there any easy issues I could try?

xiedeyantu · Mar 08 '25 13:03

Maybe you can try this. https://github.com/apache/paimon/issues/4244

xiangyuf · Mar 08 '25 14:03