spark-excel icon indicating copy to clipboard operation
spark-excel copied to clipboard

Writing a large Dataset into an Excel file causes `java.lang.OutOfMemoryError: GC overhead limit exceeded`

Open kansuke231 opened this issue 6 years ago • 5 comments

Hi All,

I am trying to write a relatively large Spark Dataset (> 50000 rows) into an Excel file using spark-excel as follows:

DataFrameWriter<Row> writer = dataset.write()
                .format("com.crealytics.spark.excel")
                .option("sheetName", sheets.get(0)) // Only first sheet
                .option("useHeader", useHeader)
                .mode("overwrite");

This works fine if the Dataset contains around 20000 rows, but from approximately 35000 rows, the code yields java.lang.OutOfMemoryError: GC overhead limit exceeded. It seems that the easiest temporary solution would be bumping up the allocated memory for JVM, but I would like to avoid that since bigger Dataset may come in and cause the same issue. Any thoughts on this? Thanks.

kansuke231 avatar Sep 06 '18 17:09 kansuke231

Currently we're using Spoiwo to write Excel files. There is an open issue to use SXSSF but that would require major changes. So the options would be to create a PR in Spoiwo or replace Spoiwo itself with something else that is capable of streaming. I don't have time to do this myself, but I'd try to be helpful if you want to give it a try.

nightscape avatar Sep 06 '18 20:09 nightscape

Thanks for the comment @nightscape.

We decided that implementing the simplest version of Excel writer with SXSSF ourselves is the best option due to the time constraint. So I am going to close this issue for now.

kansuke231 avatar Sep 06 '18 21:09 kansuke231

Hello, I am facing the same issue. How can I do that?. I mean, use the SXSSF. We are trying to write 500000 rows into xlsx and it is impossible. Thanks in advance

AntonioNieto92 avatar Mar 24 '21 17:03 AntonioNieto92

It seems that Spoiwo integrated the SXSSF writing. Unfortunately, I don't have time to look into how to actually use it, but if someone wants to create a PR for it, I'd be happy to review.

nightscape avatar Mar 30 '21 07:03 nightscape

Hi @AntonioNieto92 and @kansuke231 If is it possible, please help share your excel file (after removing sensitive data)? Or steps to generate (column set, number of rows) an excel file that can reproduce this issue? Sincerely,

quanghgx avatar Aug 24 '21 15:08 quanghgx