spark-excel
"Driver is up but is not responsive, likely due to GC" when writing a DataFrame to a large Excel file on blob storage
I am trying to write a Spark DataFrame to an Excel file on blob storage.
df.repartition(1).write.format("com.crealytics.spark.excel").mode("overwrite").option("header", "true").option("maxRowsInMemory", 1000).save("/mnt/IngestExelFiles/output_fulldf.xlsx")
When the DataFrame has more than 200,000 rows, Databricks reports "Driver is up but is not responsive, likely due to GC".
Environment: Databricks Runtime 8.4 (includes Apache Spark 3.1.2, Scala 2.12). Driver type: 56 GB memory, 8 cores.
I can read the large Excel file from blob storage, but writing the same table back does not work.
Is there any clue?
Thanks
Hi @sabrishami, could you help us prepare the df (generated data is fine) so we can reproduce the issue on our side? I don't have ready access to Databricks, but I can run it on a local machine and observe the resource usage.
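A synthetic DataFrame of roughly the reported size could be generated along these lines. This is only a sketch: the column count, the column names, and the helpers `column_exprs` and `make_repro_df` are assumptions for illustration, not details from the report, and `spark` is assumed to be an existing SparkSession.

```python
def column_exprs(n_cols):
    """Build SQL expressions that derive n_cols string columns from `id`."""
    return [
        f"concat('value_', cast(id + {i} as string)) as col_{i}"
        for i in range(n_cols)
    ]


def make_repro_df(spark, n_rows=200_000, n_cols=10):
    """Return a synthetic DataFrame with n_rows rows and n_cols string columns.

    `spark` is an existing SparkSession; spark.range() yields a single
    `id` column, from which the other columns are derived.
    """
    return spark.range(n_rows).selectExpr(*column_exprs(n_cols))
```

Calling `make_repro_df(spark)` and then running the write from the report against it should show whether the row count alone is enough to trigger the problem.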
This should be fixed by 0.18.0 - the excel v2 data source now supports the maxRowsInMemory setting, which lowers the memory overhead.
@sabrishami can you check if 0.18.0 with .format("excel").option("maxRowsInMemory") fixes the issue?
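For reference, the original write adapted to the v2 data source might look like the sketch below. The helper name `write_excel_v2` is hypothetical, the `header` option and `repartition(1)` are simply carried over from the original call, and whether v2 treats them the same way as v1 is an assumption.

```python
def write_excel_v2(df, path, max_rows_in_memory=1000):
    """Write df with the spark-excel v2 data source (format "excel", 0.18.0+).

    Mirrors the write from the report, switching only the format name
    from "com.crealytics.spark.excel" to "excel".
    """
    (
        df.repartition(1)
        .write.format("excel")
        .mode("overwrite")
        .option("header", "true")
        .option("maxRowsInMemory", max_rows_in_memory)
        .save(path)
    )
```

Usage would be `write_excel_v2(df, "/mnt/IngestExelFiles/output_fulldf.xlsx")`, with spark-excel 0.18.0 or later attached to the cluster.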
I'll close the issue for now; if it does not work, please post a comment and I'll reopen it.