[HUDI-8090] Add new zookeeper based lock provider

Open Davis-Zhang-Onehouse opened this issue 1 year ago • 1 comments

Added a new lock provider that is identical to the existing zookeeper based one except for the way the partition key part is handled. Specifically:

  • It requires the table name, as indicated by "hoodie.table.name".
  • It requires the table base path, as indicated by "hoodie.base.path".
  • The zookeeper base path key (hoodie.write.lock.zookeeper.base_path) is automatically derived as /tmp/-. To see the exact value, check the relevant log output from class org.apache.hudi.client.transaction.lock.ZookeeperBasedImplicitBasePathLockProvider.
  • The zookeeper lock key config (hoodie.write.lock.zookeeper.lock_key) uses the hard-coded value "lock_key".
  • It ignores any values set for hoodie.write.lock.zookeeper.base_path and hoodie.write.lock.zookeeper.lock_key.
  • It was also found that "hoodie.write.lock.zookeeper.port" is not consumed by Hudi's zookeeper lock provider class; the actual usage of these properties implies that hoodie.write.lock.zookeeper.url must carry everything, in the form <IP1>:<Port1>,<IP2>:<Port2>,.... For more details, please refer to https://app.clickup.com/t/18029943/ENG-13154.

    The solution is to change the official doc so that it no longer mentions the "port" config, and to update the "url" config description explicitly with the format mentioned above. This has no implications for existing users: anyone who has successfully configured zookeeper must already have figured out the right format, because a "url" value that contains only <IP> errors out.
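As a rough illustration of the configuration described above, the following Python sketch assembles writer options for the new provider. The provider class name and the config keys come from this description; the helper name `build_lock_options`, the table name, the base path, and the zookeeper addresses are hypothetical placeholders, and the sketch does not claim to show every option a real writer needs.

```python
# Class name taken from this PR description; the rest is illustrative only.
IMPLICIT_BASE_PATH_PROVIDER = (
    "org.apache.hudi.client.transaction.lock."
    "ZookeeperBasedImplicitBasePathLockProvider"
)

# Keys the description says the new provider ignores, so they are left unset.
IGNORED_KEYS = (
    "hoodie.write.lock.zookeeper.base_path",
    "hoodie.write.lock.zookeeper.lock_key",
)


def build_lock_options(table_name, base_path, zk_connect_string):
    """Hypothetical helper: collect the options the new provider relies on."""
    if not table_name or not base_path:
        raise ValueError("hoodie.table.name and hoodie.base.path are both required")
    return {
        "hoodie.table.name": table_name,
        "hoodie.base.path": base_path,
        "hoodie.write.lock.provider": IMPLICIT_BASE_PATH_PROVIDER,
        # Full connect string in <IP>:<Port> form; no separate port option.
        "hoodie.write.lock.zookeeper.url": zk_connect_string,
    }


# Placeholder values, not taken from the PR.
options = build_lock_options(
    "trips", "s3://example-bucket/trips", "10.0.0.1:2181,10.0.0.2:2181"
)
```

The resulting dictionary could be passed to whatever writer API is in use; the base path and lock key are deliberately absent because the provider derives them itself.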

    Change Logs

    Add a new lock provider implementation. Code shared by the new and old lock providers is extracted to a base class.

    Unit test

    Impact

    None

    Risk level (write none, low medium or high below)

    None

    Documentation Update

    In https://hudi.apache.org/docs/concurrency_control/, we need to add description for the newly added lock provider.

    Remove hoodie.write.lock.zookeeper.port, since its only effect is to mislead doc readers into thinking that hudi passes it on to the rest of the system.

    Update the hoodie.write.lock.zookeeper.url description to state that the expected format is a comma-separated list of <IP>:<Port> strings, like <IP1>:<Port1>,<IP2>:<Port2>
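To make the expected "url" format concrete, here is a minimal Python sketch of a check that accepts only comma-separated <IP>:<Port> entries. The function name `parse_zk_connect_string` is hypothetical and is not part of Hudi; it only illustrates the format the doc update should describe.

```python
def parse_zk_connect_string(value):
    """Hypothetical check: split '<IP1>:<Port1>,<IP2>:<Port2>' into (host, port) pairs."""
    endpoints = []
    for entry in value.split(","):
        # Split on the last colon so the port is always the trailing part.
        host, sep, port = entry.strip().rpartition(":")
        if not sep or not host or not port.isdigit():
            raise ValueError(f"expected <IP>:<Port>, got {entry!r}")
        endpoints.append((host, int(port)))
    return endpoints
```

A value that contains only an address, such as "10.0.0.1", is rejected by this check, which mirrors the failure mode described for a "url" that lacks the port.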

    Contributor's checklist

    • [ ] Read through contributor's guide
    • [ ] Change Logs and Impact were stated clearly
    • [ ] Adequate tests were added if applicable
    • [ ] CI passed

Davis-Zhang-Onehouse avatar Aug 16 '24 21:08 Davis-Zhang-Onehouse

CI report:

  • 5a64bdc730a94007a8916d348de26345b41be207 UNKNOWN
  • 44dece3a83d8680b227c9a55a0b643defe7cc915 UNKNOWN
  • 2d3860184fc0d03fe388263237911bc73f9b5715 Azure: SUCCESS
Bot commands @hudi-bot supports the following commands:
  • @hudi-bot run azure re-run the last Azure build

hudi-bot avatar Sep 15 '24 00:09 hudi-bot