[FLINK-29549]- Flink Glue Catalog integration
Purpose of the change
For example: Implements the Table API for the Kinesis Source.
Verifying this change
Please make sure both new and modified tests in this PR follows the conventions defined in our code quality guide: https://flink.apache.org/contributing/code-style-and-quality-common.html#testing
(Please pick either of the following options)
This change is a trivial rework / code cleanup without any test coverage.
(or)
This change is already covered by existing tests, such as (please describe tests).
(or)
This change added tests and can be verified as follows:
(example:)
- Added integration tests for end-to-end deployment
- Added unit tests
- Manually verified by running the Kinesis connector on a local Flink cluster.
Significant changes
(Please check any boxes [x] if the answer is "yes". You can first publish the PR and check them afterwards, for convenience.)
- [ ] Dependencies have been added or upgraded
- [ ] Public API has been changed (Public API is any class annotated with
@Public(Evolving)) - [ ] Serializers have been changed
- [ ] New feature has been introduced
- If yes, how is this documented? (not applicable / docs / JavaDocs / not documented)
Thanks for the contribution. A few points to note:
It looks like most of the code has already been reviewed across multiple PRs.
Concerns:
1. The current implementation exposes low-level Glue details (e.g., support for lowercase table names) directly through the Flink Catalog. Is this acceptable? I recommend raising this topic with the community. In my opinion, it would be better to encapsulate such Glue-specific behavior and avoid exposing it directly via the Flink Catalog interface. The catalog's behavior should remain consistent with other catalogs, with differences controlled via configuration only. 2. Please consider implementing the basic configuration options defined in [FLIP-277](https://cwiki.apache.org/confluence/display/FLINK/FLIP-277). If that’s not feasible in this PR, a fast follow-up would be valuable, especially for users relying on different credential modes. 3. It seems that some `.idea` folder files have been committed. Please remove them from the PR.Cheers, Samrat
Hey Samrat. How could we encapsulate this specific issues? From user perspective we are already limiting them creating tables and databases with uppercase. In regards of the schema, we encapsulate storing the actual original columnNames in column parameters in Glue, so even though user sees their schema in glue with lower case (default for glue). we actually leverage the original column name.
-
Yes right after we get the first release delivered we will add the remainder authentication, and partitions support and alter support
-
Fixed!
Hey Samrat. How could we encapsulate this specific issues? From user perspective we are already limiting them creating tables and databases with uppercase. In regards of the schema, we encapsulate storing the actual original columnNames in column parameters in Glue, so even though user sees their schema in glue with lower case (default for glue). we actually leverage the original column name.
Thanks for the detailed response.
Can you help me understand the technical complexity to support CaseSensitivity from GlueCatalog?
This is a deviation. I believe understanding how other connectors or catalogs handle mismatches, there can be a couple of ways to handle it
- Surfacing the mismatch to the end user with constraints
- Handling it in FlinkCatalog to provide consistent behaviour.
We can start a thread in the community to discuss this approach.
I am fine with either of the approaches as long as the community agrees on it.
@Samrat002 @FranMorilloAWS about case sensitivity, these are the logical rules I suggest:
- the actual case should preserved for all supported objects and used consistently in Flink interface, e.g. in
CREATE TABLE...but alsoSHOW TABLE,SHOW COLUMNS ...etc. From a user's perspective, the way object names are actually stored in Glue should not be a concern. - The additional metadata should be stored in Glue in a consistent way for all object types.
Parametersseems to be available for Database, Tables, and Columns in Glue. A problem arises for Functions which don't have anyParametersif I am not mistaken. - The additional metadata should not leak into Flink user interface. The "originalName" should not pop up in a
SHOW TABLESorSHOW COLUMNS - The fact Glue UI is limited and only shows lowercase names is not a concern of this component's
I would add that the way the additional metadata (the original name and any additional info required) is stored in Glue should be clearly documented, in case a user want's to build their own external interface to extract or update information in the Glue Catalog.
Thanks for the contribution. A few points to note:
It looks like most of the code has already been reviewed across multiple PRs.
Concerns:
1. The current implementation exposes low-level Glue details (e.g., support for lowercase table names) directly through the Flink Catalog. Is this acceptable? I recommend raising this topic with the community. In my opinion, it would be better to encapsulate such Glue-specific behavior and avoid exposing it directly via the Flink Catalog interface. The catalog's behavior should remain consistent with other catalogs, with differences controlled via configuration only. 2. Please consider implementing the basic configuration options defined in [FLIP-277](https://cwiki.apache.org/confluence/display/FLINK/FLIP-277). If that’s not feasible in this PR, a fast follow-up would be valuable, especially for users relying on different credential modes. 3. It seems that some `.idea` folder files have been committed. Please remove them from the PR.Cheers, Samrat
Hello @Samrat002 @nicusX. Using Database Parameters and Table Parameters we are now able to use lower/upper case for defining the Database and Table Name. By using the Show Tables/Show Databases, Describe table commands, we will show the original Flink Definition, even though in Glue UI it will be all in lower case
Thanks for incorporating the changes. I will review the pr in next couple of days .
Can we refer to Paimon's solution for integrating Glue? https://github.com/apache/paimon/pull/4283/files#diff-534e1f558f31481abfd234e3b9c56c17a0988acf360a2264b0a090fa3a6aedd5
https://issues.apache.org/jira/browse/FLINK-38457
@FranMorilloAWS
@melin. To me knowledge Paimon integration with Glue Data Catalog is to store Paimon Tables in Glue, however it wouldnt work for storing Kinesis, MSK, and other streaming sources. They have different implementations.