[SPARK-39997][SQL] Fix ParquetSchemaConverter fails match schema by id
Currently, matching a Parquet schema by field ID fails in certain cases.
no
A new unit test was added.
What changes were proposed in this pull request?
This PR fixes cases where ParquetSchemaConverter fails to match a schema by ID. When converting a Parquet schema, the Spark type is preferred over the converted Parquet type. The Spark type may carry a field name of its own, which then fails the same-type validation later, when running the vectorized nested column read.
Why are the changes needed?
When converting a Parquet schema, the Spark type is preferred over the converted Parquet type. The Spark type may carry a field name of its own, which then fails the same-type validation later, when running the vectorized nested column read.
Does this PR introduce any user-facing change?
No.
How was this patch tested?
Added a new unit test.
Can one of the admins verify this patch?
cc @sunchao
sunchao commented: The Spark type is intended for maintaining type precision, but here the Spark type is carrying the name c1, which is renamed from c0, so carrying it forward to later phases will cause same-type validation errors.
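The failure mode described in this thread can be illustrated with a small, self-contained model. This is only a sketch in plain Python, not Spark's ParquetSchemaConverter code: the classes Field and Struct, the function convert, and the flag prefer_requested_names are all hypothetical names invented for the illustration, and the strict equality check stands in for the same-type validation mentioned above. The flag only toggles which name is carried forward; it is not a claim about how the PR resolves the issue.

```python
from dataclasses import dataclass
from typing import Optional, Tuple, Union


@dataclass(frozen=True)
class Field:
    # Hypothetical stand-in for a schema field with an optional field ID.
    name: str
    dtype: "DataType"
    field_id: Optional[int] = None


@dataclass(frozen=True)
class Struct:
    # Hypothetical stand-in for a struct type: an ordered tuple of fields.
    fields: Tuple[Field, ...]


DataType = Union[str, Struct]


def find_by_id(parquet: Struct, field_id: Optional[int]) -> Optional[Field]:
    """Return the file-side field whose ID matches, ignoring its name."""
    for f in parquet.fields:
        if f.field_id is not None and f.field_id == field_id:
            return f
    return None


def convert(parquet: Struct, requested: Struct, prefer_requested_names: bool) -> Struct:
    """Resolve the requested schema against the file schema by field ID.

    Primitive types are taken from the requested schema (to keep precision).
    Names come from the requested schema or from the file, depending on the flag.
    """
    out = []
    for want in requested.fields:
        found = find_by_id(parquet, want.field_id)
        if found is None:
            continue  # missing columns are simply skipped in this sketch
        if isinstance(found.dtype, Struct) and isinstance(want.dtype, Struct):
            dtype = convert(found.dtype, want.dtype, prefer_requested_names)
        else:
            dtype = want.dtype
        name = want.name if prefer_requested_names else found.name
        out.append(Field(name, dtype, found.field_id))
    return Struct(tuple(out))


# The file stores nested field c0 (ID 2); the read schema calls the same ID c1.
file_schema = Struct((Field("s", Struct((Field("c0", "int", 2),)), 1),))
read_schema = Struct((Field("s", Struct((Field("c1", "int", 2),)), 1),))

carried = convert(file_schema, read_schema, prefer_requested_names=True)
aligned = convert(file_schema, read_schema, prefer_requested_names=False)

# A strict structural comparison, standing in for a later same-type check.
print("renamed name carried forward, check passes:", carried == file_schema)
print("file name kept, check passes:", aligned == file_schema)
```

In this model both conversions find the nested field by ID, but only the one that keeps the file-side name c0 compares equal to the file schema. The conversion that carries c1 forward fails the comparison, which mirrors the mismatch described in the comment above.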
We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable. If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!