[HUDI-9528] Support database and table name for Glue/ Datahub catalog

Open vineethNaroju opened this issue 5 months ago • 2 comments

Change Logs

Added separate configs for glue and datahub to set database/table name in sync client.

Impact

Hudi database/table name can be configured for glue/datahub catalog separately.

Risk level (write none, low medium or high below)

If medium or high, explain what verification was done to mitigate the risks.

Documentation Update

hoodie.datasource.meta.sync.glue.database_name: "database"
hoodie.datasource.meta.sync.glue.table_name: "table"

hoodie.meta.sync.datahub.database.name: "database"
hoodie.meta.sync.datahub.table.name: "table"

Contributor's checklist

  • [ ] Read through contributor's guide
  • [ ] Change Logs and Impact were stated clearly
  • [ ] Adequate tests were added if applicable
  • [ ] CI passed

vineethNaroju avatar Jun 12 '25 13:06 vineethNaroju

https://issues.apache.org/jira/browse/HUDI-9528 @vineethNaroju Use this JIRA for your PR.

vinishjail97 avatar Jun 17 '25 05:06 vinishjail97

@hudi-bot run azure

vineethNaroju avatar Jun 25 '25 16:06 vineethNaroju

Added separate configs for glue and datahub to set database/table name in sync client.

@vineethNaroju can you explain why we need a new options key for the db/table name even though the existing options already work?

danny0405 avatar Jun 26 '25 01:06 danny0405

@danny0405 We already support catalog-specific database/table names for other catalogs/metastores, for example BigQuery: https://github.com/apache/hudi/blob/master/hudi-gcp/src/main/java/org/apache/hudi/gcp/bigquery/BigQuerySyncConfig.java#L83. The restriction for users right now is that for Glue/DataHub, the table always gets created with hoodie.table.name.

vinishjail97 avatar Jun 26 '25 02:06 vinishjail97

it always gets created with hoodie.table.name

@vinishjail97 I have no access to the link. I see there are already some options like hoodie.gcp.bigquery.sync.table_name in the BigQuerySyncConfig on master: https://github.com/apache/hudi/blob/f1faabe2f577d7f33fdb0194a490e7c18b22546c/hudi-gcp/src/main/java/org/apache/hudi/gcp/bigquery/BigQuerySyncConfig.java#L83

danny0405 avatar Jun 26 '25 03:06 danny0405

Yes, we want to have similar config for glue and datahub catalog.

vineethNaroju avatar Jun 26 '25 03:06 vineethNaroju

That's okay. Can we add similar inference logic directly in the config option, so that we only need to change the specific sync tool:

public static final ConfigProperty<String> BIGQUERY_SYNC_TABLE_NAME = ConfigProperty
      .key("hoodie.gcp.bigquery.sync.table_name")
      .noDefaultValue()
      .withInferFunction(cfg -> Option.ofNullable(cfg.getString(HOODIE_TABLE_NAME_KEY))
          .or(() -> Option.ofNullable(cfg.getString(HOODIE_WRITE_TABLE_NAME_KEY))))
      .markAdvanced()
      .withDocumentation("Name of the target table in BigQuery");

danny0405 avatar Jun 26 '25 07:06 danny0405

Yes, I agree. We should have inference logic. If catalog-specific db and table names are set, we take them from there. If not, we should fall back to the generic db and table names. I will work on addressing the feedback.

nsivabalan avatar Aug 26 '25 18:08 nsivabalan
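
The fallback agreed on above (use the catalog-specific name when it is set, otherwise fall back to the generic table name) can be sketched in plain Java. This is a minimal illustration only, not the code that landed in the patch: the class `SyncNameResolver` and its methods are hypothetical, it uses `java.util.Properties` rather than Hudi's `ConfigProperty`, and the string values and order of the two fallback keys are assumptions based on the constant names in the BigQuery snippet quoted earlier in the thread.

```java
import java.util.Optional;
import java.util.Properties;

// Hypothetical helper, for illustration only; not part of Hudi.
final class SyncNameResolver {
    // Glue-specific key, as listed in the PR description.
    static final String GLUE_TABLE_NAME = "hoodie.datasource.meta.sync.glue.table_name";
    // Assumed generic fallback keys, tried in this order.
    static final String HOODIE_TABLE_NAME = "hoodie.table.name";
    static final String HOODIE_WRITE_TABLE_NAME = "hoodie.datasource.write.table.name";

    // Returns the value of specificKey if set; otherwise the first fallback key that is set.
    static Optional<String> resolve(Properties props, String specificKey, String... fallbackKeys) {
        Optional<String> value = Optional.ofNullable(props.getProperty(specificKey));
        for (String key : fallbackKeys) {
            if (value.isPresent()) {
                break;
            }
            value = Optional.ofNullable(props.getProperty(key));
        }
        return value;
    }

    // Convenience wrapper for the Glue table name.
    static Optional<String> resolveGlueTableName(Properties props) {
        return resolve(props, GLUE_TABLE_NAME, HOODIE_TABLE_NAME, HOODIE_WRITE_TABLE_NAME);
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty(HOODIE_TABLE_NAME, "trips");
        // Only the generic name is set, so it is used for Glue as well.
        System.out.println(resolveGlueTableName(props).orElse("<unset>"));

        props.setProperty(GLUE_TABLE_NAME, "trips_glue");
        // The Glue-specific name now takes precedence.
        System.out.println(resolveGlueTableName(props).orElse("<unset>"));
    }
}
```

With this shape, a user who sets only the generic table name keeps today's behavior, and the catalog-specific key acts purely as an override.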

Hey @danny0405: the patch is ready for review.

nsivabalan avatar Aug 26 '25 21:08 nsivabalan

@danny0405: can you review this patch?

nsivabalan avatar Aug 27 '25 13:08 nsivabalan

CI report:

  • 680de0da1a0fff4b1313e71dc3a0462402837765 Azure: SUCCESS

Bot commands

@hudi-bot supports the following commands:

  • @hudi-bot run azure: re-run the last Azure build

hudi-bot avatar Aug 28 '25 12:08 hudi-bot

Landed as part of https://github.com/apache/hudi/pull/13785

nsivabalan avatar Aug 28 '25 22:08 nsivabalan