paimon icon indicating copy to clipboard operation
paimon copied to clipboard

[Feature] Improve catalog lock for paimon

Open FangYongs opened this issue 1 year ago • 6 comments

Search before asking

  • [X] I searched in the issues and found nothing similar.

Motivation

Currently paimon only support hive lock for hive catalog, it should support unified lock for filesystem catalog

Solution

No response

Anything else?

No response

Are you willing to submit a PR?

  • [ ] I'm willing to submit a PR!

FangYongs avatar Jan 31 '24 02:01 FangYongs

Great!I want this feature, too.

chenxi0599 avatar Feb 01 '24 10:02 chenxi0599

+1 for this.

legendtkl avatar Feb 02 '24 03:02 legendtkl

+1 thanks @FangYongs for driving!

JingsongLi avatar Feb 21 '24 09:02 JingsongLi

Hi, after reviewing https://github.com/apache/paimon/pull/3076

I feel more description should be provided here, otherwise it may lead to incorrect design by contributors.

  • What is the overall options API? Do option keys require a lock prefix?
  • How does FileSystemCatalog build Lock? How should JdbcLock handle without a jdbc lock context? Do we need to make some modifications to JDBCLock and HiveLock?

I suggest having a clear design before creating subtasks. What do you think? @FangYongs

JingsongLi avatar Mar 27 '24 02:03 JingsongLi

Thanks @JingsongLi , and I completely agree with your opinion! I will create a PIP for this feature and start a discussion thread later.

FangYongs avatar Mar 27 '24 05:03 FangYongs

@FangYongs @JingsongLi

  1. I think that the parameters for LockContext to be open to the public should only be 'option', so there is no need to reference the connection implementation between jdbc and hive in the Filesystem Catalog, which is the future ClientPool

2.I think the parameters required for lock should be distinguished by adding a prefix of "lock." This will not cause any inconvenience to users. However, metastores like jdbc and hive, which come with their own Lock implementation, need to be compatible and do not require redeclaration of lock specific configuration parameters

  1. The core issue is the issue of connection. If we need to create a new connection every time and close it after use, it is inevitable to have too many connections, and it may not be possible to seize them during busy times. Therefore, it is recommended to use cache to cache and set the eviction time

If possible, I can refactor according to the above description first.

sunxiaojian avatar Mar 27 '24 15:03 sunxiaojian

+1 for this

rmotapar avatar Jul 02 '24 23:07 rmotapar