FLINK-35641 ParquetSchemaConverter supports required fields
What is the purpose of the change
The purpose of this change is to fix two scenarios in which Flink produces incorrect Parquet files.
Scenario 1: Map Flink nullability to Parquet repetition, so that nullable Flink types become optional Parquet types and non-nullable Flink types become required Parquet types. Right now, Flink configures all Parquet types as optional, regardless of whether they are nullable or not.
Scenario 2: Ensure that the converter does not create invalid Parquet files with optional map keys. According to the Parquet specification, map keys are required.
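To illustrate Scenario 1, here is a minimal, dependency-free sketch of the intended mapping from nullability to Parquet repetition. The class `RepetitionSketch` and its helpers `repetitionFor` and `describe` are hypothetical names used only for illustration. They are not the actual `ParquetSchemaConverter` code and do not use the Flink or Parquet APIs.

```java
import java.util.Locale;

public class RepetitionSketch {
    /** The two Parquet repetition levels relevant to this change (REPEATED is left out). */
    public enum Repetition { REQUIRED, OPTIONAL }

    /** Hypothetical helper: nullable types become OPTIONAL, non-nullable types become REQUIRED. */
    public static Repetition repetitionFor(boolean nullable) {
        return nullable ? Repetition.OPTIONAL : Repetition.REQUIRED;
    }

    /** Renders a field roughly the way a Parquet schema is printed. */
    public static String describe(String name, String primitive, boolean nullable) {
        String repetition = repetitionFor(nullable).name().toLowerCase(Locale.ROOT);
        return repetition + " " + primitive + " " + name + ";";
    }

    public static void main(String[] args) {
        // A field declared NOT NULL on the Flink side.
        System.out.println(describe("id", "int64", false));      // prints "required int64 id;"
        // A nullable field on the Flink side.
        System.out.println(describe("comment", "binary", true)); // prints "optional binary comment;"
    }
}
```

Before this change, both fields in the sketch would have been written as optional.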
Brief change log
- Configure non-nullable Flink types as required Parquet types.
- Ensure that Flink map key type is non-nullable / required.
- Ensure that Flink multiset element type is non-nullable / required.
- Adjust existing tests and add new ones to cover the change in behavior.
- Mention the nullable key limitation in Parquet format docs.
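The map key and multiset element checks in the change log could look roughly like the sketch below. This is an assumption about the shape of the check, not the code in this patch: the class `MapKeyCheckSketch`, the method `checkRequired`, and the choice of `IllegalArgumentException` are all hypothetical, and the sketch rejects nullable types rather than silently coercing them.

```java
public class MapKeyCheckSketch {
    /**
     * Hypothetical validation: rejects a nullable type in a position where
     * Parquet needs a required field (map key or multiset element).
     *
     * @param role     human-readable position, e.g. "map key"
     * @param typeName name of the Flink-side type, used only in the message
     * @param nullable whether the Flink-side type is nullable
     */
    public static void checkRequired(String role, String typeName, boolean nullable) {
        if (nullable) {
            throw new IllegalArgumentException(
                    "Parquet does not support a nullable " + role + ": " + typeName);
        }
    }

    public static void main(String[] args) {
        // Accepted: the key type is declared NOT NULL.
        checkRequired("map key", "STRING NOT NULL", false);

        // Rejected: a nullable key would map to an optional Parquet field.
        try {
            checkRequired("map key", "STRING", true);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Failing at schema conversion time keeps the problem visible to the user instead of producing a file that other Parquet readers may reject.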
Verifying this change
- Adjusted existing tests.
- Added new test cases to cover invalid types.
Does this pull request potentially affect one of the following parts:
- Dependencies (does it add or upgrade a dependency): no
- The public API, i.e., is any changed class annotated with @Public(Evolving): no
- The serializers: no
- The runtime per-record code paths (performance sensitive): no
- Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: no
- The S3 file system connector: yes? This change affects the behavior of the file system connector when it writes Parquet files.
Documentation
- Does this pull request introduce a new feature? no
- If yes, how is the feature documented? Mentioned the unsupported nullable map keys in docs
CI report:
- 80b0bec76ee06bf8e48227d3a9f6c71ac0d3a8e6 Azure: SUCCESS
Bot commands
The @flinkbot bot supports the following commands:
- @flinkbot run azure: re-run the last Azure build
Test failures are caused by this patch - https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=60364&view=logs&j=2e8cb2f7-b2d3-5c62-9c05-cd756d33a819&t=2dd510a3-5041-5201-6dc3-54d310f68906. I will adjust the test cases on Thursday.