incubator-uniffle icon indicating copy to clipboard operation
incubator-uniffle copied to clipboard

[#2061]fix(server): Fix netty memory leak when removeBuffer and cacheShuffle…

Open lwllvyb opened this issue 1 year ago • 2 comments

What changes were proposed in this pull request?

Fix netty memory leak when removeBuffer and cacheShuffleData happen concurrent

Why are the changes needed?

(Please clarify why the changes are needed. For instance,

  1. If you propose a new API, clarify the use case for a new API.
  2. If you fix a bug, describe the bug.)

Fix: https://github.com/apache/incubator-uniffle/issues/2061

Does this PR introduce any user-facing change?

(Please list the user-facing changes introduced by your change, including

  1. Change in user-facing APIs.
  2. Addition or removal of property keys.)

No.

How was this patch tested?

(Please test your changes, and provide instructions on how to test it:

  1. If you add a feature or fix a bug, add a test to cover your changes.
  2. If you fix a flaky test, repeat it for many times to prove it works.)

lwllvyb avatar Aug 19 '24 07:08 lwllvyb

Test Results

 2 895 files   - 31   2 895 suites   - 31   5h 48m 30s :stopwatch: - 14m 54s  1 041 tests ± 0   1 038 :white_check_mark:  -  1   2 :zzz: ±0  1 :x: +1  12 943 runs   - 60  12 912 :white_check_mark:  - 61  30 :zzz: ±0  1 :x: +1 

For more details on these failures, see this check.

Results for commit 89267227. ± Comparison against base commit 78fe934e.

:recycle: This comment has been updated with latest results.

github-actions[bot] avatar Aug 19 '24 08:08 github-actions[bot]

Could you help describe more why this(remove/cache buffer) will happen at the same time? When i test the configuration "rss.server.huge-partition.size.hard.limit", the test failed because the partition was oversized. (in chronological order)

  1. In function ShuffleBufferManager@cacheShuffleData, getShuffleBufferEntry be called and return success.
  2. Function removeResources be triggered and appId be removed from bufferPool.
  3. In function ShuffleBufferManager@cacheShuffleData, buffer.append be called and the ByteBuf be added to the buffer. This ByteBuf will never be released.

lwllvyb avatar Aug 26 '24 04:08 lwllvyb

image The failed test case is not related to the PR.

lwllvyb avatar Oct 10 '24 08:10 lwllvyb

Ping @jerqi @rickyma @zhengchenyu Could you please take some time to review this PR? If it doesn't need to be merged, please let me know.

lwllvyb avatar Oct 15 '24 03:10 lwllvyb

BTW, If I missed somthing, feel free to ping me @lwllvyb or by wechat

zuston avatar Oct 15 '24 03:10 zuston