alluxio icon indicating copy to clipboard operation
alluxio copied to clipboard

Use concurrent map for BlockLockManager

Open dbw9580 opened this issue 2 years ago • 3 comments

What changes are proposed in this pull request?

Use IndexedSet which uses ConcurrentHashMap internally to replace the shared metadata lock to improve performance under heavy lock contention.

Why are the changes needed?

When lock contention is heavy, the sharedMetadataLock becomes a bottleneck of the throughput of lock acquisition and releasing, defeating the purpose of the individual block locks which are meant to be a fine-grained lock pool.

See also #16022 for details and motivation for this change.

Does this PR introduce any user facing changes?

No.

dbw9580 avatar Aug 22 '22 16:08 dbw9580

I'm not sure about if the implementation is sound after removing the shared lock...

https://github.com/Alluxio/alluxio/blob/940fb4642030ea99d481ea461546baed61ad555d/core/server/worker/src/main/java/alluxio/worker/block/BlockLockManager.java#L163-L171

The above check is outside of any synchronization section, so even if the sessionHoldsLock check returns true, before the current thread attempts to actually lock the lock, another thread could have acquired and locked the same lock in the meantime.

@beinan @jja725 can you please help take another closer look here?

dbw9580 avatar Sep 14 '22 12:09 dbw9580

I'm not sure about if the implementation is sound after removing the shared lock...

https://github.com/Alluxio/alluxio/blob/940fb4642030ea99d481ea461546baed61ad555d/core/server/worker/src/main/java/alluxio/worker/block/BlockLockManager.java#L163-L171

The above check is outside of any synchronization section, so even if the sessionHoldsLock check returns true, before the current thread attempts to actually lock the lock, another thread could have acquired and locked the same lock in the meantime.

@beinan @jja725 can you please help take another closer look here?

I just post a comment for this issue, using the commputeIfAbsent might help, but I'm not sure

beinan avatar Sep 15 '22 03:09 beinan

Fuse unit tests keep failing.

rror: 7.843 [ERROR] alluxio.fuse.AlluxioJniFuseFileSystemTest.chownWithoutValidGid  Time elapsed: 0.383 s  <<< FAILURE!
Wanted but not invoked:
fileSystem.setAttribute(
    /t/root/foo/bar,
    owner: "1001"

);
-> at alluxio.fuse.AlluxioJniFuseFileSystemTest.chownWithoutValidGid(AlluxioJniFuseFileSystemTest.java:160)

dbw9580 avatar Sep 22 '22 17:09 dbw9580

alluxio-bot, merge this please.

beinan avatar Sep 25 '22 00:09 beinan