Gluten shuffle data size is twice that of vanilla Spark, with Celeborn as the remote shuffle service
Description
vanilla spark
gluten
shuffle from aggregate after data union
Gluten version
None
Did you try spark.gluten.sql.columnar.shuffle.celeborn.useRssSort=false?
Our Gluten version is 1.4, which does not have this conf yet.
Could you try this case with version 1.5.0? It looks like there is a fix for issue https://github.com/apache/incubator-gluten/issues/9163. Could you check whether it works for you?
Anyway, we will upgrade our Gluten version later, and I will try this job on the new version.
I will first test with spark.celeborn.client.spark.shuffle.writer set to rss_sort.
After setting spark.celeborn.client.spark.shuffle.writer to sort, the shuffle data is much smaller and performance improves by 10%, but the job is still slower than vanilla Spark.
This was also observed by another customer. Can you port the PR and test with spark.gluten.sql.columnar.shuffle.celeborn.useRssSort=false?
cc @kerwin-zk
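For anyone who wants to reproduce the comparison, here is a minimal sketch that renders the two settings discussed in this thread as spark-submit arguments. The config keys and values are the ones quoted above. The helper name `to_submit_args` and the dict `CANDIDATE_CONFS` are made up for illustration, and whether these settings close the gap with vanilla Spark for a given job is not verified here.

```python
# Settings quoted in this thread. Per the thread, the useRssSort key is
# not available in Gluten 1.4, so it only applies after upgrading or
# porting the PR.
CANDIDATE_CONFS = {
    "spark.celeborn.client.spark.shuffle.writer": "sort",
    "spark.gluten.sql.columnar.shuffle.celeborn.useRssSort": "false",
}


def to_submit_args(confs):
    """Render a dict of Spark settings as spark-submit --conf arguments.

    Hypothetical helper for illustration; keys are emitted in sorted
    order so the resulting command line is stable between runs.
    """
    args = []
    for key, value in sorted(confs.items()):
        args.extend(["--conf", f"{key}={value}"])
    return args


if __name__ == "__main__":
    # Print the flags so they can be pasted after `spark-submit`.
    print(" ".join(to_submit_args(CANDIDATE_CONFS)))
```

Running the job once with these flags and once without, then comparing the shuffle write size in the Spark UI, should show whether the size difference comes from the shuffle writer.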