[VL] Rss use row-based sort
Thanks for opening a pull request!
Could you open an issue for this pull request on Github Issues?
https://github.com/apache/incubator-gluten/issues
Then could you also rename commit message and pull request title in the following format?
[GLUTEN-${ISSUES_ID}][COMPONENT]feat/fix: ${detailed message}
Run Gluten Clickhouse CI
@kerwin-zk Could you help check whether this patch works? I also noticed that it requires some extra Celeborn configurations. Here's what I found; please check whether anything else is needed to fully enable sort-based shuffle with Celeborn. Thanks!
--conf spark.shuffle.manager=celeborn \
--conf spark.celeborn.client.spark.shuffle.writer=sort \
--conf spark.celeborn.client.shuffle.compression.codec=zstd
cc: @FelixYBW
@marin-ma I'll try to test it first.
Are any more configs needed? I'm testing it in a customer case.
--conf spark.shuffle.manager=celeborn \
--conf spark.celeborn.client.spark.shuffle.writer=sort
@FelixYBW @marin-ma I haven't tested row-based sort + Celeborn yet. For columnar-based sort + Celeborn, the following settings are needed:
spark.shuffle.manager: org.apache.spark.shuffle.gluten.celeborn.CelebornShuffleManager
spark.celeborn.master.endpoints: master-1-1:9097
spark.shuffle.service.enabled: false
spark.celeborn.client.spark.push.sort.memory.threshold: 128m
spark.celeborn.client.spark.shuffle.writer: sort
spark.celeborn.client.shuffle.compression.codec: none
spark.sql.adaptive.localShuffleReader.enabled: false
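For anyone trying these out: the settings above are written as `key: value` pairs, while the earlier comments in this thread pass them as `--conf key=value` flags. A minimal Python sketch of that conversion follows. The helper name `to_conf_args` is hypothetical, the values are copied from the list above, and the master endpoint is specific to that environment, so treat it as a placeholder.

```python
# Settings copied from the columnar-based sort + Celeborn list above.
# The endpoint value is environment-specific; replace it with your own.
COLUMNAR_SORT_CELEBORN = {
    "spark.shuffle.manager": "org.apache.spark.shuffle.gluten.celeborn.CelebornShuffleManager",
    "spark.celeborn.master.endpoints": "master-1-1:9097",
    "spark.shuffle.service.enabled": "false",
    "spark.celeborn.client.spark.push.sort.memory.threshold": "128m",
    "spark.celeborn.client.spark.shuffle.writer": "sort",
    "spark.celeborn.client.shuffle.compression.codec": "none",
    "spark.sql.adaptive.localShuffleReader.enabled": "false",
}


def to_conf_args(settings):
    """Render a settings dict as a flat list of spark-submit arguments.

    Hypothetical helper for illustration; it only formats strings and
    does not validate keys against Spark, Gluten, or Celeborn.
    """
    args = []
    for key, value in settings.items():
        args.extend(["--conf", f"{key}={value}"])
    return args


if __name__ == "__main__":
    # Print one "--conf key=value" pair per line, joined with shell
    # line continuations so the output can be pasted after spark-submit.
    pairs = [f"--conf {k}={v}" for k, v in COLUMNAR_SORT_CELEBORN.items()]
    print(" \\\n".join(pairs))
```

The list form returned by `to_conf_args` can be passed directly to `subprocess.run(["spark-submit", *args, ...])`, which avoids shell quoting issues.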
@floesing_pins You may want to take a look at this. You can set spark.gluten.sql.columnar.shuffle.sort.partitions.threshold=4000
@jingyuanzhang_pins FYI.
The field isSort of VeloxCelebornColumnarShuffleWriter needs to be updated.
Run Gluten Clickhouse CI on x86
This PR is stale because it has been open 45 days with no activity. Remove stale label or comment or this will be closed in 10 days.
This PR was auto-closed because it has been stalled for 10 days with no activity. Please feel free to reopen if it is still valid. Thanks.
cc @kerwin-zk
@marin-ma Since our internal version depends on GLUTEN_RSS_SORT_SHUFFLE_WRITER, I suggest adding a configuration to control whether to use GLUTEN_SORT_SHUFFLE_WRITER or GLUTEN_RSS_SORT_SHUFFLE_WRITER.
Is your internal version the same as upstream? We picked this up because one customer hit an OOM issue using upstream. Did you try GLUTEN_SORT_SHUFFLE_WRITER? Is there a performance gap?
@FelixYBW The internal version is somewhat different from upstream, and ShuffleRead may have memory issues (#9069). I previously tested GLUTEN_SORT_SHUFFLE_WRITER, and in some cases it failed to process very large datasets. Therefore, I think it would be more appropriate to add a parameter that controls whether GLUTEN_SORT_SHUFFLE_WRITER or GLUTEN_RSS_SORT_SHUFFLE_WRITER is used.
@marin-ma Let's add a config to do the switch.
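One possible shape for such a switch, sketched in Python rather than Gluten's own code. Everything here is an assumption for discussion: the key `spark.gluten.celeborn.useRssSortShuffleWriter` is hypothetical and not an existing Gluten option, the default value is a guess, and the two strings only stand in for the constants named in this thread.

```python
# Stand-ins for the two writer-type constants discussed in this thread.
GLUTEN_SORT_SHUFFLE_WRITER = "GLUTEN_SORT_SHUFFLE_WRITER"
GLUTEN_RSS_SORT_SHUFFLE_WRITER = "GLUTEN_RSS_SORT_SHUFFLE_WRITER"

# Hypothetical config key, invented for this sketch only.
SWITCH_KEY = "spark.gluten.celeborn.useRssSortShuffleWriter"


def select_sort_writer(conf):
    """Pick the sort shuffle writer type from a dict of Spark settings.

    Assumption: when the key is absent, fall back to
    GLUTEN_RSS_SORT_SHUFFLE_WRITER. The real default would be decided
    in the PR, not here.
    """
    raw = conf.get(SWITCH_KEY, "true")
    use_rss_writer = str(raw).strip().lower() == "true"
    if use_rss_writer:
        return GLUTEN_RSS_SORT_SHUFFLE_WRITER
    return GLUTEN_SORT_SHUFFLE_WRITER
```

With this shape, a deployment that depends on GLUTEN_RSS_SORT_SHUFFLE_WRITER needs no config change, and others can opt into GLUTEN_SORT_SHUFFLE_WRITER by setting the key to `false`.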
@kerwin-zk The OOM issue is fixed in https://github.com/apache/incubator-gluten/pull/9221
This PR was auto-closed because it has been stalled for 10 days with no activity. Please feel free to reopen if it is still valid. Thanks.