flink-connector-aws icon indicating copy to clipboard operation
flink-connector-aws copied to clipboard

[FLINK-29549]- Flink Glue Catalog integration

Open FranMorilloAWS opened this issue 8 months ago • 6 comments

Purpose of the change

For example: Implements the Table API for the Kinesis Source.

Verifying this change

Please make sure both new and modified tests in this PR follows the conventions defined in our code quality guide: https://flink.apache.org/contributing/code-style-and-quality-common.html#testing

(Please pick either of the following options)

This change is a trivial rework / code cleanup without any test coverage.

(or)

This change is already covered by existing tests, such as (please describe tests).

(or)

This change added tests and can be verified as follows:

(example:)

  • Added integration tests for end-to-end deployment
  • Added unit tests
  • Manually verified by running the Kinesis connector on a local Flink cluster.

Significant changes

(Please check any boxes [x] if the answer is "yes". You can first publish the PR and check them afterwards, for convenience.)

  • [ ] Dependencies have been added or upgraded
  • [ ] Public API has been changed (Public API is any class annotated with @Public(Evolving))
  • [ ] Serializers have been changed
  • [ ] New feature has been introduced
    • If yes, how is this documented? (not applicable / docs / JavaDocs / not documented)

FranMorilloAWS avatar May 07 '25 10:05 FranMorilloAWS

Thanks for the contribution. A few points to note:

It looks like most of the code has already been reviewed across multiple PRs.

Concerns:

1. The current implementation exposes low-level Glue details (e.g., support for lowercase table names) directly through the Flink Catalog. Is this acceptable? I recommend raising this topic with the community. In my opinion, it would be better to encapsulate such Glue-specific behavior and avoid exposing it directly via the Flink Catalog interface. The catalog's behavior should remain consistent with other catalogs, with differences controlled via configuration only.

2. Please consider implementing the basic configuration options defined in [FLIP-277](https://cwiki.apache.org/confluence/display/FLINK/FLIP-277). If that’s not feasible in this PR, a fast follow-up would be valuable, especially for users relying on different credential modes.

3. It seems that some `.idea` folder files have been committed. Please remove them from the PR.

Cheers, Samrat

Hey Samrat. How could we encapsulate this specific issues? From user perspective we are already limiting them creating tables and databases with uppercase. In regards of the schema, we encapsulate storing the actual original columnNames in column parameters in Glue, so even though user sees their schema in glue with lower case (default for glue). we actually leverage the original column name.

  1. Yes right after we get the first release delivered we will add the remainder authentication, and partitions support and alter support

  2. Fixed!

FranMorilloAWS avatar May 14 '25 15:05 FranMorilloAWS

Hey Samrat. How could we encapsulate this specific issues? From user perspective we are already limiting them creating tables and databases with uppercase. In regards of the schema, we encapsulate storing the actual original columnNames in column parameters in Glue, so even though user sees their schema in glue with lower case (default for glue). we actually leverage the original column name.

Thanks for the detailed response.

Can you help me understand the technical complexity to support CaseSensitivity from GlueCatalog?

This is a deviation. I believe understanding how other connectors or catalogs handle mismatches, there can be a couple of ways to handle it

  1. Surfacing the mismatch to the end user with constraints
  2. Handling it in FlinkCatalog to provide consistent behaviour.

We can start a thread in the community to discuss this approach.

I am fine with either of the approaches as long as the community agrees on it.

Samrat002 avatar May 16 '25 08:05 Samrat002

@Samrat002 @FranMorilloAWS about case sensitivity, these are the logical rules I suggest:

  1. the actual case should preserved for all supported objects and used consistently in Flink interface, e.g. inCREATE TABLE... but also SHOW TABLE, SHOW COLUMNS ... etc. From a user's perspective, the way object names are actually stored in Glue should not be a concern.
  2. The additional metadata should be stored in Glue in a consistent way for all object types. Parameters seems to be available for Database, Tables, and Columns in Glue. A problem arises for Functions which don't have any Parameters if I am not mistaken.
  3. The additional metadata should not leak into Flink user interface. The "originalName" should not pop up in a SHOW TABLES or SHOW COLUMNS
  4. The fact Glue UI is limited and only shows lowercase names is not a concern of this component's

I would add that the way the additional metadata (the original name and any additional info required) is stored in Glue should be clearly documented, in case a user want's to build their own external interface to extract or update information in the Glue Catalog.

nicusX avatar May 28 '25 07:05 nicusX

Thanks for the contribution. A few points to note:

It looks like most of the code has already been reviewed across multiple PRs.

Concerns:

1. The current implementation exposes low-level Glue details (e.g., support for lowercase table names) directly through the Flink Catalog. Is this acceptable? I recommend raising this topic with the community. In my opinion, it would be better to encapsulate such Glue-specific behavior and avoid exposing it directly via the Flink Catalog interface. The catalog's behavior should remain consistent with other catalogs, with differences controlled via configuration only.

2. Please consider implementing the basic configuration options defined in [FLIP-277](https://cwiki.apache.org/confluence/display/FLINK/FLIP-277). If that’s not feasible in this PR, a fast follow-up would be valuable, especially for users relying on different credential modes.

3. It seems that some `.idea` folder files have been committed. Please remove them from the PR.

Cheers, Samrat

Hello @Samrat002 @nicusX. Using Database Parameters and Table Parameters we are now able to use lower/upper case for defining the Database and Table Name. By using the Show Tables/Show Databases, Describe table commands, we will show the original Flink Definition, even though in Glue UI it will be all in lower case

FranMorilloAWS avatar Jun 03 '25 15:06 FranMorilloAWS

Thanks for incorporating the changes. I will review the pr in next couple of days .

Samrat002 avatar Jun 06 '25 09:06 Samrat002

Can we refer to Paimon's solution for integrating Glue? https://github.com/apache/paimon/pull/4283/files#diff-534e1f558f31481abfd234e3b9c56c17a0988acf360a2264b0a090fa3a6aedd5

https://issues.apache.org/jira/browse/FLINK-38457

@FranMorilloAWS

melin avatar Oct 09 '25 01:10 melin

@melin. To me knowledge Paimon integration with Glue Data Catalog is to store Paimon Tables in Glue, however it wouldnt work for storing Kinesis, MSK, and other streaming sources. They have different implementations.

FranMorilloAWS avatar Dec 11 '25 10:12 FranMorilloAWS