[VL] Use conf to control C2R occupied memory
What changes were proposed in this pull request?
In the current design, the Column2Row (C2R) operation is completed in one pass, which consumes a lot of memory and can cause the program to OOM. In this commit, I split the C2R operation into multiple smaller operations, which greatly reduces the peak memory of C2R. In addition, there should be some performance benefit from memory reuse.
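To illustrate the idea, here is a minimal, self-contained sketch of chunked column-to-row conversion. All names (`RowBatch`, `columnsToRowsChunked`, `maxRowsPerBatch`) are hypothetical and not Gluten's actual API; the point is only that rows are materialized a bounded chunk at a time instead of all at once.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Illustrative only: a "row" here is one value per column.
struct RowBatch {
  std::vector<std::vector<int64_t>> rows;
};

// Convert column-major data to row batches of at most maxRowsPerBatch rows
// each, so peak memory is bounded by the chunk size rather than the input.
std::vector<RowBatch> columnsToRowsChunked(
    const std::vector<std::vector<int64_t>>& columns,
    std::size_t maxRowsPerBatch) {
  std::vector<RowBatch> out;
  if (columns.empty() || maxRowsPerBatch == 0) return out;
  const std::size_t numRows = columns[0].size();
  for (std::size_t start = 0; start < numRows; start += maxRowsPerBatch) {
    const std::size_t end = std::min(start + maxRowsPerBatch, numRows);
    RowBatch batch;
    batch.rows.reserve(end - start);
    for (std::size_t r = start; r < end; ++r) {
      std::vector<int64_t> row;
      row.reserve(columns.size());
      for (const auto& col : columns) row.push_back(col[r]);
      batch.rows.push_back(std::move(row));
    }
    out.push_back(std::move(batch));
  }
  return out;
}
```

In a real implementation the chunk buffer would also be reused across iterations, which is where the memory-reuse speedup mentioned above would come from.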
How was this patch tested?
Thanks for opening a pull request!
Could you open an issue for this pull request on Github Issues?
https://github.com/apache/incubator-gluten/issues
Then could you also rename the commit message and pull request title to the following format?
[GLUTEN-${ISSUES_ID}][COMPONENT]feat/fix: ${detailed message}
Run Gluten Clickhouse CI
@XinShuoWang Can you update?
@XinShuoWang can you fix the UT?
@XinShuoWang The UT fails because the default buffer size is 64MB, which is too large when task memory is small. You may need to change the threshold value for these UTs.
@XinShuoWang can you fix the UT? The bug caused many queries to fail in our tests.
@XinShuoWang Would you continue to fix the issue? If not, we can pick it up and fix it. The bug is blocking our adoption.
Looks like the generator operator already limits the output RowVector size to the one we set.
The OOM isn't caused by C2R, but by the R2C after the Parquet scan, because the struct column is too large. The way to solve the OOM is to decrease the batch size.
Another operator which may output a large batch is hash join, so maybe the PR is still useful.
I am thinking about using a similar mechanism to the one added in https://github.com/apache/incubator-gluten/pull/6009 as a general solution, so we don't have to add this kind of pre-processor for operators one by one. @FelixYBW
And the operator may need to be added in Velox to fit into Velox's task processing. We don't want the operator in the query plan to break the whole-stage transformer.
It should be more efficient to break or combine batches in the next operator, instead of implementing a dedicated one. A dedicated operator must do a memcpy.
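The "break batches in the next operator" idea can be sketched as zero-copy slicing: the consumer iterates over offset/length views of the incoming batch instead of a dedicated operator copying rows into new buffers. This is an illustrative sketch, not Velox's actual API; `Batch`, `BatchSlice`, and `sliceBatch` are made-up names.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Illustrative column-major batch.
struct Batch {
  std::vector<std::vector<int64_t>> columns;
  std::size_t numRows() const { return columns.empty() ? 0 : columns[0].size(); }
};

// A slice is just an offset/length view over the parent batch's buffers,
// so splitting a large batch needs no memcpy.
struct BatchSlice {
  const Batch* parent;
  std::size_t offset;
  std::size_t length;
  int64_t at(std::size_t col, std::size_t row) const {
    return parent->columns[col][offset + row];  // indexing only, no copy
  }
};

// Yield views of at most maxRows rows each over the same underlying data.
std::vector<BatchSlice> sliceBatch(const Batch& b, std::size_t maxRows) {
  std::vector<BatchSlice> slices;
  if (maxRows == 0) return slices;
  for (std::size_t off = 0; off < b.numRows(); off += maxRows) {
    slices.push_back({&b, off, std::min(maxRows, b.numRows() - off)});
  }
  return slices;
}
```

The trade-off noted above: a standalone re-batching operator has to materialize (memcpy) its output, while a consumer working on views pays only for the indexing.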
Velox should always honor the max batch size, or limit the batch by memory size.
@XinShuoWang what's your case for having a large batch size? My issue can be fixed by reducing spark.gluten.sql.columnar.maxBatchSize and spark.sql.parquet.columnarReaderBatchSize.
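For reference, the two confs mentioned above can be lowered at submit time. The values and the application jar name below are illustrative; tune the sizes to your task memory.

```shell
# Smaller batches mean less memory materialized per C2R/R2C conversion.
# 1024 is an example value, not a recommendation.
spark-submit \
  --conf spark.gluten.sql.columnar.maxBatchSize=1024 \
  --conf spark.sql.parquet.columnarReaderBatchSize=1024 \
  your_app.jar
```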
It's caused by a bug when calculating the row size.