Support case-insensitive id assignment for applyNameMapping when reading Parquet
Reads fail when the case of a field name stored in the name mapping does not match the case of the corresponding field name in the Parquet file schema.
When that mismatch happens, Iceberg does not assign an id to the mismatched column in the Parquet file schema, so the column is pruned later on.
This diff takes caseSensitive, which is already present in ReadConf, into account and passes it down into applyNameMapping to support case-insensitive id assignment.
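To illustrate the behavior described above, here is a minimal, self-contained sketch of id assignment with and without case sensitivity. It does not use the Iceberg API: the class `NameMappingSketch`, the method `assignIds`, and the plain `Map` standing in for the name mapping are all hypothetical names chosen for this example, and rejecting names that collide after case folding is an assumption of the sketch, not something the PR states.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical illustration only; not the Iceberg implementation.
final class NameMappingSketch {
  private NameMappingSketch() {}

  // Returns file column name -> field id for every column the mapping resolves.
  static Map<String, Integer> assignIds(
      List<String> fileColumns, Map<String, Integer> nameToId, boolean caseSensitive) {
    // Build the lookup table; fold keys to lower case for case-insensitive matching.
    Map<String, Integer> lookup = new LinkedHashMap<>();
    for (Map.Entry<String, Integer> entry : nameToId.entrySet()) {
      String key = caseSensitive ? entry.getKey() : entry.getKey().toLowerCase(Locale.ROOT);
      Integer previous = lookup.put(key, entry.getValue());
      if (previous != null && !previous.equals(entry.getValue())) {
        // Assumption of this sketch: names that collide after folding are rejected.
        throw new IllegalArgumentException("Ambiguous mapping for name: " + key);
      }
    }

    Map<String, Integer> assigned = new LinkedHashMap<>();
    for (String column : fileColumns) {
      String key = caseSensitive ? column : column.toLowerCase(Locale.ROOT);
      Integer id = lookup.get(key);
      if (id != null) {
        assigned.put(column, id);
      }
      // Columns without an id are left out, mirroring how unmapped columns are pruned.
    }
    return assigned;
  }
}
```

With a mapping of userId to 1 and name to 2, and file columns userid and name, the case-sensitive call resolves only name, while the case-insensitive call resolves both columns.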
Thank you for your contributions!
I have a question regarding the overall solution for the issue addressed in this PR. Please forgive me if this is a silly one. I recall that the name mapping JSON object allows for specifying multiple names for a field. Would it be possible to solve the case sensitivity issue by specifying alternate field names in the name mapping? Are there any real-world scenarios that would prevent us from modifying the name mapping and necessitate the solution presented in this PR? I look forward to hearing your insights on this matter.
The issue we encountered happens when building an Iceberg table from a Hive table whose Parquet files already have field names that differ in case from the table schema. Your solution should work, but overall I feel it adds more complexity, given that case-insensitivity support already exists elsewhere in Iceberg.
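For comparison, the alternate-names approach raised in the question can be sketched as follows. This is a hypothetical model, not the Iceberg API: `AliasMappingSketch`, `indexByName`, and the `Map<Integer, List<String>>` representation (field id to the list of names mapped to it) are assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not the Iceberg implementation.
final class AliasMappingSketch {
  private AliasMappingSketch() {}

  // Flattens field id -> names into an exact-match (case-sensitive) name -> id index.
  static Map<String, Integer> indexByName(Map<Integer, List<String>> namesById) {
    Map<String, Integer> index = new HashMap<>();
    for (Map.Entry<Integer, List<String>> entry : namesById.entrySet()) {
      for (String name : entry.getValue()) {
        Integer previous = index.put(name, entry.getKey());
        if (previous != null && !previous.equals(entry.getKey())) {
          throw new IllegalArgumentException("Name mapped to two ids: " + name);
        }
      }
    }
    return index;
  }
}
```

In this sketch, listing both userId and userid for field 1 lets a file column named userid resolve under exact matching, but a variant that was not listed, such as USERID, still resolves to nothing. Every case variant present in the files would have to be enumerated in the mapping.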
This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the project mailing list. Thank you for your contributions.
This pull request has been closed due to lack of activity. This is not a judgement on the merit of the PR in any way. It is just a way of keeping the PR queue manageable. If you think that is incorrect, or the pull request requires review, you can revive the PR at any time.