spark [SPARK-39997][SQL] Fix ParquetSchemaConverter fails match schema by id

Open zinking opened this issue 2 years ago • 3 comments

Currently, matching a Parquet schema by field ID fails in certain cases.

A new unit test is added.

What changes were proposed in this pull request?

This PR fixes cases where ParquetSchemaConverter fails to match the schema by ID. When converting a Parquet schema, the SparkType is preferred over the converted Parquet type. The SparkType may carry a field name that differs from the one in the file, which then fails the same-type validation later, when running the vectorized nested column read.

Why are the changes needed?

When converting a Parquet schema, the SparkType is preferred over the converted Parquet type. The SparkType may carry a field name that differs from the one in the file, which then fails the same-type validation later, when running the vectorized nested column read.
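The failure mode can be illustrated with a small, self-contained sketch. This is not Spark code: `Field`, `Struct`, and `find_by_id` are hypothetical stand-ins for the real schema classes, and the field names and IDs are made up for illustration. It only shows why a lookup by ID can succeed while a later strict type comparison fails because of a renamed nested field.

```python
from dataclasses import dataclass
from typing import Optional, Tuple, Union


@dataclass(frozen=True)
class Field:
    # Hypothetical stand-in for a schema field; not a Spark or Parquet class.
    name: str
    dtype: Union[str, "Struct"]
    field_id: Optional[int] = None


@dataclass(frozen=True)
class Struct:
    fields: Tuple[Field, ...]


def find_by_id(struct: Struct, field_id: int) -> Optional[Field]:
    """Return the first field whose ID matches, ignoring its name."""
    for field in struct.fields:
        if field.field_id == field_id:
            return field
    return None


# Schema as written in the file: struct "s" (id 1) with nested "c0" (id 2).
file_schema = Struct((Field("s", Struct((Field("c0", "int", 2),)), 1),))

# Schema requested by the reader: same IDs, nested field renamed to "c1".
requested = Struct((Field("s", Struct((Field("c1", "int", 2),)), 1),))

# Matching by ID succeeds even though the nested name differs.
matched = find_by_id(file_schema, 1)

# If the requested type is carried forward instead of the converted file
# type, a strict equality check that includes names no longer passes.
preferred_spark_type = requested.fields[0].dtype
converted_parquet_type = matched.dtype
same_type = preferred_spark_type == converted_parquet_type
```

Here `same_type` is `False` only because of the nested name, which mirrors the mismatch described above.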

Does this PR introduce any user-facing change?

How was this patch tested?

Added a new unit test.

Aug 06 '22 13:08 zinking

Can one of the admins verify this patch?

Aug 06 '22 16:08 AmplabJenkins

cc @sunchao

Aug 07 '22 07:08 dongjoon-hyun

sunchao commented that the Spark type is intended for maintaining type precision. Here, however, the Spark type carries the name c1, which was renamed from c0, so carrying it forward to later phases causes same-type validation errors.
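To make the trade-off concrete, here is a minimal sketch of one conceivable way to keep the Spark type's precision while taking field names from the file. It is an assumption for illustration only and not necessarily what this PR does; `Field`, `Struct`, and `keep_type_use_file_names` are hypothetical names, and the `smallint`/`int` pairing is an invented example of a precision difference.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Field:
    # Hypothetical stand-in for a schema field; not a Spark class.
    name: str
    dtype: Union[str, "Struct"]


@dataclass(frozen=True)
class Struct:
    fields: Tuple[Field, ...]


def keep_type_use_file_names(spark_type, file_type):
    """Keep leaf types from spark_type but field names from file_type.

    Assumes the two structs were already matched by ID and list their
    fields in the same order; zip() silently stops at the shorter one.
    """
    if isinstance(spark_type, Struct) and isinstance(file_type, Struct):
        return Struct(tuple(
            Field(f.name, keep_type_use_file_names(s.dtype, f.dtype))
            for s, f in zip(spark_type.fields, file_type.fields)
        ))
    # Leaf: the Spark type wins so that its precision is preserved.
    return spark_type


spark_type = Struct((Field("c1", "smallint"),))  # renamed from c0
file_type = Struct((Field("c0", "int"),))        # name stored in the file

reconciled = keep_type_use_file_names(spark_type, file_type)
```

In this sketch `reconciled` holds a single field named `c0` with type `smallint`, so the name agrees with the file while the narrower type is retained.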

Aug 07 '22 10:08 zinking

We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable. If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!

Dec 08 '22 00:12 github-actions[bot]
