
[BUG] Schema is not getting merged while reading multiple files with different schema

Open Manasa81 opened this issue 2 years ago • 3 comments

Is there an existing issue for this?

  • [X] I have searched the existing issues

Current Behavior

I am trying to read multiple files inside a folder. When I print the schema, the DataFrame only shows the schema of the first Excel file. With CSVs, the mergeSchema option merges the schemas of all files. Is there any option to merge the schemas of multiple files with 'excel'?

[image] This is how I am reading a folder.

Expected Behavior

The schemas of the files inside a folder should be merged.
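A possible workaround, not confirmed by the maintainers in this thread, is to read each file on its own and union the results by column name. The sketch below is a minimal illustration under assumptions: the helper names `union_by_name` and `read_excel_files` are hypothetical, the `"excel"` format name and the `header` / `inferSchema` options are assumed to match the spark-excel version in use, and `unionByName(..., allowMissingColumns=True)` requires Spark 3.1 or later.

```python
from functools import reduce


def union_by_name(frames):
    """Union DataFrames by column name; columns missing on one side become null.

    Hypothetical helper. Relies on DataFrame.unionByName with
    allowMissingColumns=True, which is available from Spark 3.1.
    """
    frames = list(frames)
    if not frames:
        raise ValueError("at least one DataFrame is required")
    return reduce(
        lambda left, right: left.unionByName(right, allowMissingColumns=True),
        frames,
    )


def read_excel_files(spark, paths):
    """Read each Excel file separately, then merge the per-file schemas.

    Hypothetical helper. The format name and option names are assumptions
    and may differ between spark-excel versions.
    """
    frames = [
        spark.read.format("excel")         # assumed format name
        .option("header", "true")          # assumed option: first row is the header
        .option("inferSchema", "true")     # assumed option: infer column types
        .load(path)
        for path in paths
    ]
    return union_by_name(frames)
```

With an existing `spark` session, a call could look like `read_excel_files(spark, ["/data/in/a.xlsx", "/data/in/b.xlsx"]).printSchema()`, where both paths are placeholders. Note that columns sharing a name must still have compatible types across files, so inferred types that differ from file to file may need an explicit cast before the union.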

Steps To Reproduce

Keep two files with different columns inside a folder and try reading the folder.

Environment

- Spark version: 3.3.0
- Spark-Excel version: com.crealytics:spark-excel_2.12:0.14.0
- OS: Windows
- Cluster environment: 11.3 LTS

Anything else?

No response

Manasa81 • May 23 '23 11:05

Does that combination of Spark and spark-excel even work? Please always try the newest version of spark-excel when posting issues. I don't think that solves the problem in this case, but it makes the maintainers' lives much easier when we know that an issue is present in the newest version.

nightscape • May 23 '23 15:05

I see the same issue with the versions below. inferSchema is true and there are 2 date partitions; one of them has an extra column, and that column does not show up in the schema. @nightscape

spark-excel_2.12_0.18.5
spark 3.2.2

gaya3dk2490 • Jul 20 '23 07:07

Same results with:

spark-excel_2.13_0.18.7
spark 3.3.1

gaya3dk2490 • Jul 20 '23 08:07