velox icon indicating copy to clipboard operation
velox copied to clipboard

Erase the previous processed WindowPartition in RowStreamingWindowBuild

Open JkSelf opened this issue 1 year ago • 7 comments

In the core case, if each row corresponds to a separate WindowPartition, this could result in an excessive number of WindowPartition instances being held in memory, potentially leading to OOM . To fix this issue, this PR proposes replacing the std::vector with a deque data structure, which will allow for more efficient release of WindowPartition resources.

JkSelf avatar Sep 24 '24 02:09 JkSelf

Deploy Preview for meta-velox canceled.

Name Link
Latest commit 552df8cb888afaf4ee2b3fb6a4785da1cebdb5e7
Latest deploy log https://app.netlify.com/sites/meta-velox/deploys/67bd1cf4c341f80008f80ffd

netlify[bot] avatar Sep 24 '24 02:09 netlify[bot]

In Velox we track the memory by plan node memory pool or global spill memory pool. In gluten the memory pool for plan node is counted into offheap size, the global spill memory pool is counted into overhead. But there are much memory allocation in Velox which isn't tracked. The common issue is to allocate a potential per row std::vector which can be very large. Because each partition can have like 1G row, then even a std::vector<rowVector*> can take 8G memory.

The solution is 1) try best to avoid the per row std::vector 2) if we have to use the per row std::vector, define it as std::vector<X, memory::StlAllocator<char*>>.

To this issue, it's common case that each partition has one row, so eventually it's a per row std::vector.

FelixYBW avatar Sep 24 '24 19:09 FelixYBW

In Velox we track the memory by plan node memory pool or global spill memory pool. In gluten the memory pool for plan node is counted into offheap size, the global spill memory pool is counted into overhead. But there are much memory allocation in Velox which isn't tracked. The common issue is to allocate a potential per row std::vector which can be very large. Because each partition can have like 1G row, then even a std::vector<rowVector*> can take 8G memory.

The solution is 1) try best to avoid the per row std::vector 2) if we have to use the per row std::vector, define it as std::vector<X, memory::StlAllocator<char*>>.

To this issue, it's common case that each partition has one row, so eventually it's a per row std::vector.

Thanks @FelixYBW. So you are saying the memory consumption comes from https://github.com/facebookincubator/velox/blob/main/velox/exec/WindowPartition.h#L239.

The fix is fine imo.

But maybe we should also define rows_ with StlAllocator.

aditi-pandit avatar Sep 25 '24 06:09 aditi-pandit

In Velox we track the memory by plan node memory pool or global spill memory pool. In gluten the memory pool for plan node is counted into offheap size, the global spill memory pool is counted into overhead. But there are much memory allocation in Velox which isn't tracked. The common issue is to allocate a potential per row std::vector which can be very large. Because each partition can have like 1G row, then even a std::vector<rowVector*> can take 8G memory. The solution is 1) try best to avoid the per row std::vector 2) if we have to use the per row std::vector, define it as std::vector<X, memory::StlAllocator<char*>>. To this issue, it's common case that each partition has one row, so eventually it's a per row std::vector.

Thanks @FelixYBW. So you are saying the memory consumption comes from https://github.com/facebookincubator/velox/blob/main/velox/exec/WindowPartition.h#L239.

The fix is fine imo.

But maybe we should also define rows_ with StlAllocator.

@aditi-pandit I will file another pr to add the StlAllocator after this pr merged. Thanks.

JkSelf avatar Sep 25 '24 06:09 JkSelf

@xiaoxmeng Can you help to merge this PR? Thanks.

JkSelf avatar Sep 26 '24 07:09 JkSelf

@xiaoxmeng Can you help to merge this PR? Thanks.

Hold on please. I haven't verified.

FelixYBW avatar Sep 26 '24 18:09 FelixYBW

@aditi-pandit can you take a look again? It's an important fix for common OOM issue from customer side.

@xiaoxmeng can you review?

FelixYBW avatar Oct 25 '24 07:10 FelixYBW

@JkSelf : Can we have a test for this code specifically ? Would be great if you can add something.

aditi-pandit avatar Oct 27 '24 22:10 aditi-pandit

@JkSelf : Can we have a test for this code specifically ? Would be great if you can add something.

@aditi-pandit The existing unit tests here can verify this changes.

JkSelf avatar Oct 29 '24 06:10 JkSelf

@JkSelf : Can we have a test for this code specifically ? Would be great if you can add something.

@aditi-pandit The existing unit tests here can verify this changes.

@JkSelf : Agree. But I would've expected this situation to be caught by a test like https://github.com/facebookincubator/velox/blob/main/velox/exec/tests/WindowTest.cpp#L154

It would be great to write a test with the previously failing input data/window so that we are assured that this code is working.

aditi-pandit avatar Oct 31 '24 06:10 aditi-pandit

@aditi-pandit @FelixYBW Resolved all your comments. Can you help to review again? Thanks. I updated the pool capacity from 8MB to 4MB here and the oom will occur without this PR.

JkSelf avatar Nov 19 '24 07:11 JkSelf

@aditi-pandit @FelixYBW Resolved all your comments. Can you help to review again? Thanks. I updated the pool capacity from 8MB to 4MB here and the oom will occur without this PR.

Thank you! Ke. Did you also add test to have a large window partition? It's used to test that input and output partition are the same one.

FelixYBW avatar Nov 19 '24 21:11 FelixYBW

@aditi-pandit @FelixYBW Resolved all your comments. Can you help to review again? Thanks. I updated the pool capacity from 8MB to 4MB here and the oom will occur without this PR.

Thank you! Ke. Did you also add test to have a large window partition? It's used to test that input and output partition are the same one.

@FelixYBW I added the test in https://github.com/facebookincubator/velox/pull/11077/files#diff-9f41a5e1a8da742fc53f54ba85588678cdb81ffb05b38eefebe21ad5138def8eR221 to evaluate huge window partitions. Based on previous experience https://github.com/facebookincubator/velox/pull/10883, this test may cause issues with the meta CI when dealing with large window partitions. My suggestion is to use the existing rowBasedStreamingWindowOOM test and reduce the memory capacity from 8MB to 4MB.

JkSelf avatar Nov 25 '24 05:11 JkSelf

@aditi-pandit Any more comments?

FelixYBW avatar Dec 03 '24 23:12 FelixYBW

The current PR can't fix the window OOM issue any more. Let me check.

FelixYBW avatar Dec 06 '24 09:12 FelixYBW

The current PR can't fix the window OOM issue any more. Let me check.

I have a test case where running on window results in an OOM error. After applying this PR, the OOM issue no longer occurs.

liujiayi771 avatar Feb 19 '25 08:02 liujiayi771

@xiaoxmeng @aditi-pandit @FelixYBW I attempted to increase the number of rows while reducing the memory capacity. However, this approach failed when creating the RowVector with lower memory capacity during unit testing. As a result, it is challenging to include an OOM case in the unit tests. We have submitted PR in Gluten to test the unit tests, and @liujiayi771 has also helped verify that this PR can resolve the window OOM issue in their environment. Given these circumstances, could we proceed with merging this PR without the unit tests?

JkSelf avatar Feb 20 '25 03:02 JkSelf

@xiaoxmeng has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

facebook-github-bot avatar Feb 21 '25 01:02 facebook-github-bot

@xiaoxmeng Resolved all your comments. Please help to review again. Thanks.

JkSelf avatar Feb 24 '25 05:02 JkSelf

@xiaoxmeng has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

facebook-github-bot avatar Feb 24 '25 05:02 facebook-github-bot

@xiaoxmeng has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

facebook-github-bot avatar Feb 25 '25 02:02 facebook-github-bot

@xiaoxmeng merged this pull request in facebookincubator/velox@84c78e2846fb5ed73a7476c9eb533849a0118d54.

facebook-github-bot avatar Feb 25 '25 06:02 facebook-github-bot