EdgeLockPool leaks slowly

Open uniqueZt opened this issue 2 years ago • 8 comments

image

I found a slow leak in EdgeLockPool and I don't know what is causing it. There are no ERROR entries in master.log.

uniqueZt avatar Sep 19 '22 12:09 uniqueZt

InodeLockPool looks normal.

uniqueZt avatar Sep 19 '22 12:09 uniqueZt

@uniqueZt do you mind providing the Alluxio version and the configuration of your cluster?

HelloHorizon avatar Sep 19 '22 18:09 HelloHorizon

version 2.7

uniqueZt avatar Sep 20 '22 05:09 uniqueZt

I have two clusters that use the same configuration. One of them is normal and the other one shows the problem.

uniqueZt avatar Sep 20 '22 05:09 uniqueZt

Can you point me to some PRs related to this, so that I can analyze the problem?

uniqueZt avatar Sep 20 '22 05:09 uniqueZt

I only found the PR "https://github.com/Alluxio/alluxio/pull/14320", but I cannot find any ERROR entries in my master.log.

uniqueZt avatar Sep 20 '22 05:09 uniqueZt

I tried several operations, such as "create file, delete -R, create dir, list, getStatus, rename", but could not reproduce the problem. I suspect that some exception may lead to it.

uniqueZt avatar Sep 20 '22 05:09 uniqueZt

image: if a Throwable is thrown here, it may cause a lock leak.

uniqueZt avatar Sep 20 '22 09:09 uniqueZt

@uniqueZt

Hi, I looked into the code you pasted. First, if an error is thrown, the lock list will not hold a reference to the lock, so the lock will be garbage collected later. This try..catch does not cause a leak.
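To make the first point concrete, here is a minimal Java sketch. It is not Alluxio's actual code: `LockListSketch`, `lockList`, and `lockAndRecord` are hypothetical names, and it assumes the catch block releases the lock before rethrowing. It shows that when the failure happens before the lock is recorded, the list never holds a reference to the lock and the lock is not left held.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stand-in for a lock list; not Alluxio's real class.
class LockListSketch {
    static final List<ReentrantReadWriteLock> lockList = new ArrayList<>();

    // Takes the write lock, then records it in the list.
    // If anything fails before the lock is recorded, release it and rethrow.
    static void lockAndRecord(ReentrantReadWriteLock lock, boolean failAfterLock) {
        lock.writeLock().lock();
        try {
            if (failAfterLock) {
                throw new IllegalStateException("simulated failure");
            }
            lockList.add(lock);
        } catch (RuntimeException | Error e) {
            // The list never saw this lock, so only the caller still references it.
            lock.writeLock().unlock();
            throw e;
        }
    }

    public static void main(String[] args) {
        ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
        try {
            lockAndRecord(lock, true);
        } catch (IllegalStateException e) {
            // Expected: the simulated failure propagates to the caller.
        }
        System.out.println("recorded locks: " + lockList.size());
        System.out.println("still write-locked: " + lock.isWriteLocked());
    }
}
```

Once the caller drops its own reference, nothing else keeps the lock object reachable, which is why it can be garbage collected.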

Second, if these edge locks were really leaked, it would mean that they are in the InodeList and have been acquired by threads that never release them. In that case you would very likely see a deadlock as soon as another thread tries to read or write the same edge. According to your description, the Alluxio cluster seems to be working normally so far.
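The deadlock symptom can be illustrated with plain `java.util.concurrent` locks. This is only a sketch with hypothetical names (`LeakedLockSketch`, `leakWriteLock`, `canRead`), not Alluxio code: one thread takes a write lock and exits without releasing it, and any later reader then fails to acquire the lock within its timeout.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical demo of a leaked (never released) write lock.
class LeakedLockSketch {
    // Acquires the write lock on a helper thread that exits without unlocking.
    static void leakWriteLock(ReentrantReadWriteLock lock) {
        Thread t = new Thread(() -> lock.writeLock().lock());
        t.start();
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Returns true if a read lock can be taken within the timeout.
    static boolean canRead(ReentrantReadWriteLock lock, long timeoutMs) {
        try {
            boolean acquired = lock.readLock().tryLock(timeoutMs, TimeUnit.MILLISECONDS);
            if (acquired) {
                lock.readLock().unlock();
            }
            return acquired;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
        System.out.println("readable before leak: " + canRead(lock, 100));
        leakWriteLock(lock);
        System.out.println("readable after leak: " + canRead(lock, 100));
    }
}
```

A real request would usually block without a timeout, so the operation on that edge would hang rather than return false.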

Third, according to the metric dashboard, the number of edge locks is mostly stable and grows very slowly. This makes me wonder whether the growth comes from organic traffic growth rather than a leak. You can check the QPS to the file system master to see whether it grows at a similar pace.
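One rough way to do that comparison is to compute the relative growth of both series over the same time window. The sketch below is only an illustration: the class and method names are hypothetical, the tolerance is an arbitrary assumption, and the sample numbers are made up rather than taken from this cluster.

```java
// Hypothetical helper for comparing lock-count growth with traffic growth.
class GrowthCompareSketch {
    // Relative growth between the first and last sample, e.g. 0.10 for +10%.
    static double relativeGrowth(double[] samples) {
        return (samples[samples.length - 1] - samples[0]) / samples[0];
    }

    // True when the lock count grows faster than traffic by more than the tolerance.
    static boolean locksOutpaceTraffic(double[] lockCounts, double[] qps, double tolerance) {
        return relativeGrowth(lockCounts) - relativeGrowth(qps) > tolerance;
    }

    public static void main(String[] args) {
        // Made-up samples over the same window, for illustration only.
        double[] lockCounts = {1000, 1040, 1100};
        double[] qps = {200, 209, 220};
        System.out.println("suspicious growth: " + locksOutpaceTraffic(lockCounts, qps, 0.05));
    }
}
```

If the lock count clearly outpaces the request rate over a long window, organic traffic growth becomes a less likely explanation.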

As reported, this issue is very general and we cannot investigate further without more details. Happy to help dig in if you can share more. Thanks!

elega avatar Sep 23 '22 09:09 elega

I found a bug and can reproduce it.

uniqueZt avatar Oct 10 '22 01:10 uniqueZt

image: this change can fix the lock leak.

uniqueZt avatar Nov 14 '22 11:11 uniqueZt

The code is in the lock pool.

uniqueZt avatar Nov 14 '22 11:11 uniqueZt

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in two weeks if no further activity occurs. Thank you for your contributions.

github-actions[bot] avatar Feb 05 '23 15:02 github-actions[bot]