spark-excel Data is getting changed when a column has multiple datatypes

Open shanmukha-albanero opened this issue 1 year ago • 1 comments

Is there an existing issue for this?

[X] I have searched the existing issues

Current Behavior

I am trying to read data from a column that contains multiple data types, both integers and decimals. When I pass inferSchema as false, the decimal values are rounded to fewer digits (e.g., 284.259235897532 becomes 284.2592359). If I pass inferSchema as true, the integers are converted into decimals.

df: DataFrame = (
    self.spark.read.format("com.crealytics.spark.excel")
    .option("dataAddress", data_address)
    .option("header", "true")
    .option("inferSchema", "false")  #! important
    .option("usePlainNumberFormat", "false")  #! important
    .option("maxRowsInMemory", "50")
    .load(f"s3a://{self.bucket}/{self.excel_file}")
)
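For the inferSchema=true case, one possible workaround is to read the column as a double and then render each value back to text, so that whole numbers lose the trailing ".0". The snippet below is only a minimal sketch of that idea: `plain_number` is a hypothetical helper, not part of spark-excel, and it assumes the column values arrive as Python floats (or None for empty cells).

```python
from typing import Optional


def plain_number(value: Optional[float]) -> Optional[str]:
    """Hypothetical helper: render a double as text without a spurious '.0'."""
    if value is None:
        # Keep empty cells empty.
        return None
    value = float(value)
    if value.is_integer():
        # 284.0 -> "284"; is_integer() is False for inf and nan.
        return str(int(value))
    # repr() yields the shortest string that round-trips the double.
    return repr(value)


print(plain_number(284.0))
print(plain_number(284.259235897532))
```

A function like this could be wrapped with pyspark.sql.functions.udf and applied to the column after loading. Note that very small or very large non-integer values would come out in scientific notation from repr(), so this may not suit every sheet.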

Expected Behavior

When we read data using spark-excel, the values should not change even if a column contains multiple data types.

Steps To Reproduce

No response

Environment

- Spark version: 2.4.7
- Spark-Excel version: 2.11
- OS: ubuntu
- Cluster environment

Anything else?

No response

Mar 16 '23 09:03 shanmukha-albanero

We're using the getNumericCellValue method to read data from a cell using POI. Unfortunately, I'm not aware of any other way to read numeric values into a higher-precision format...
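As a rough illustration of the precision involved: getNumericCellValue returns a Java double, which is the same IEEE 754 64-bit format as a Python float. The sketch below checks that the value from the report survives a round trip through a double, and that formatting it to 10 significant digits reproduces the rounded value from the report. That the rounding comes from string formatting rather than from the double itself is an assumption here; nothing in this thread confirms it.

```python
raw = "284.259235897532"  # value quoted in the report

# A Python float is an IEEE 754 double, like Java's double.
as_double = float(raw)

# The double keeps all 15 significant digits of the original text.
round_trip = repr(as_double)

# Formatting to 10 significant digits gives the rounded value from the report.
ten_sig = format(as_double, ".10g")

print(round_trip)
print(ten_sig)
```

If the assumption holds, the lost digits would still be present in the double read from the cell and would only disappear when the value is turned into a string.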

Mar 16 '23 23:03 nightscape
