
Refine: add a new type option for mPendingPaths in InodeSyncStream

Open adol001 opened this pull request 2 years ago • 3 comments

What changes are proposed in this pull request?

In the current implementation of InodeSyncStream, mPendingPaths is a queue, so synchronizing a large directory behaves like a breadth-first traversal. The ufsStatusCache gets loaded far ahead of the paths actually being processed, which wastes a lot of memory and can even push the JVM into continuous full GCs once memory runs out.
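To make the frontier growth concrete, here is a minimal toy sketch (not the actual InodeSyncStream code; the class name, the listChildren helper, and the tree shape are invented for illustration) of how a FIFO pending-path deque explores a directory tree level by level:

    import java.util.ArrayDeque;
    import java.util.ArrayList;
    import java.util.Deque;
    import java.util.List;

    // Toy model of a FIFO mPendingPaths; illustration only, not Alluxio code.
    public class PendingPathsDemo {
      static final int FAN_OUT = 10;  // subdirectories per directory
      static final int DEPTH = 4;     // levels below the root

      // Hypothetical child listing for a synthetic tree.
      static List<String> listChildren(String path) {
        int depth = path.isEmpty() ? 0 : path.length() - path.replace("/", "").length();
        List<String> children = new ArrayList<>();
        if (depth < DEPTH) {
          for (int i = 0; i < FAN_OUT; i++) {
            children.add(path + "/d" + i);
          }
        }
        return children;
      }

      public static void main(String[] args) {
        Deque<String> pendingPaths = new ArrayDeque<>();
        pendingPaths.offerLast("");                 // root
        int maxPending = 0;
        while (!pendingPaths.isEmpty()) {
          String path = pendingPaths.pollFirst();   // FIFO order -> breadth-first traversal
          for (String child : listChildren(path)) {
            pendingPaths.offerLast(child);
          }
          maxPending = Math.max(maxPending, pendingPaths.size());
        }
        // With FIFO order the pending set grows to the width of an entire level
        // (FAN_OUT^DEPTH = 10,000 paths here), and a prefetched UFS status can be
        // held in memory for every one of them before any leaf is finished.
        System.out.println("peak pending paths with a queue: " + maxPending);
      }
    }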

Why are the changes needed?

I added a new option for mPendingPaths that makes it a stack, so the sync process becomes a depth-first traversal. Because leaf nodes are reached early, the ufsStatusCache entries can be recycled early, saving a lot of memory.
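For comparison, here is the stack variant as a drop-in method for the toy PendingPathsDemo sketch above (again an illustration, not the PR's actual code): using the same deque in LIFO order reaches leaves after only a handful of steps, so the pending set, and with it the prefetched UFS statuses, stays small.

      // Same traversal as above, but with the deque used as a stack (LIFO).
      static int peakPendingDepthFirst() {
        Deque<String> pendingPaths = new ArrayDeque<>();
        pendingPaths.push("");                      // root
        int maxPending = 0;
        while (!pendingPaths.isEmpty()) {
          String path = pendingPaths.pop();         // LIFO order -> depth-first traversal
          for (String child : listChildren(path)) {
            pendingPaths.push(child);
          }
          maxPending = Math.max(maxPending, pendingPaths.size());
        }
        // Peaks at only a few dozen pending paths here, versus ~10,000 with the queue,
        // and each subtree's cached UFS statuses can be dropped as soon as its leaves
        // are processed.
        return maxPending;
      }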

Does this PR introduce any user facing changes?

Adds a new property, alluxio.master.metadata.sync.pending.path.type, which can be QUEUE (the default) or STACK.
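For example, a user who wants the depth-first behavior could set the new property in alluxio-site.properties (leaving it unset keeps the default QUEUE behavior):

    alluxio.master.metadata.sync.pending.path.type=STACK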

adol001 avatar Sep 15 '22 06:09 adol001

Automated checks report:

  • Commits associated with Github account: PASS
  • PR title follows the conventions: FAIL
    • The title of the PR does not pass all the checks. Please fix the following issues:
      • Title must not end with whitespace

Some checks failed. Please fix the reported issues and reply 'alluxio-bot, check this please' to re-run checks.

alluxio-bot avatar Sep 15 '22 06:09 alluxio-bot

Automated checks report:

  • Commits associated with Github account: PASS
  • PR title follows the conventions: PASS

All checks passed!

alluxio-bot avatar Sep 15 '22 06:09 alluxio-bot

This pull request is very effective for synchronizing directories with more than 100 million items. For example, with a directory tree of the form /2021/days/items, the current implementation needs a large amount of memory to finish the sync and spends a lot of time in GC, while the STACK option finishes quickly with a small memory footprint.

adol001 avatar Sep 16 '22 04:09 adol001

LGTM with a minor comment on the property key. Good to merge after that is resolved. Thanks so much!

Fixed. Thank you for your review!

adol001 avatar Sep 28 '22 11:09 adol001

@adol001 thanks for the fix. Have you tested this change with a deep directory structure and compared the two options to see the memory usage and performance difference? I would be curious to see how they compare. This would allow us to provide guidance to users so they can set this configuration option correctly.

yuzhu avatar Sep 28 '22 12:09 yuzhu

@yuzhu

7 layers of directories, with 10 subdirectories at each layer and 10 files in each deepest directory, for a total of 11,111,111 items to synchronize. The UFS is a local SSD disk; Java 8.
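One reading of this layout that reproduces the reported totals: the tree is 7 layers deep below the root, with the first 6 layers being directories (fan-out 10) and the 7th layer being the 10 files in each deepest directory, i.e. 1 root + 1,111,110 directories + 10,000,000 files = 11,111,111 inodes (one more directory layer gives the 111,111,111 figure further below). A sketch of how such a tree could be generated on the local UFS with plain java.nio.file (the root path is hypothetical):

    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;

    // Sketch only: build the synthetic benchmark tree on the local UFS directory.
    public class BuildTestTree {
      static final int LAYERS = 7;    // the deepest layer holds the files
      static final int FAN_OUT = 10;  // 10 subdirectories per layer, 10 files per leaf directory

      static void build(Path dir, int layer) throws IOException {
        if (layer == LAYERS) {
          for (int f = 0; f < FAN_OUT; f++) {
            Files.createFile(dir.resolve("f" + f));   // 10 files in each deepest directory
          }
          return;
        }
        for (int i = 0; i < FAN_OUT; i++) {
          Path child = dir.resolve("d" + i);
          Files.createDirectory(child);
          build(child, layer + 1);
        }
      }

      public static void main(String[] args) throws IOException {
        Path root = Paths.get("/mnt/ssd/ufs-root");   // hypothetical local UFS path
        Files.createDirectories(root);
        build(root, 1);
      }
    }

The master was configured with the following properties: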

      // effectively disable journal checkpointing during the benchmark
      .setProperty(PropertyKey.MASTER_JOURNAL_CHECKPOINT_PERIOD_ENTRIES, 1000000000)
      // no on-heap inode cache in front of the metastore
      .setProperty(PropertyKey.MASTER_METASTORE_INODE_CACHE_MAX_SIZE, 0)
      // threads executing metadata sync
      .setProperty(PropertyKey.MASTER_METADATA_SYNC_EXECUTOR_POOL_SIZE, 24)
      // maximum concurrency for a single sync operation
      .setProperty(PropertyKey.MASTER_METADATA_SYNC_CONCURRENCY_LEVEL, 100)
      // threads prefetching file statuses from the UFS
      .setProperty(PropertyKey.MASTER_METADATA_SYNC_UFS_PREFETCH_POOL_SIZE, 32)

option                   time    old gen used (jstat OU)   total GC time (jstat GCT)
default (QUEUE)          1450s   14g                       122s
DFS + disable prefetch   1584s   3g                        28s

For DFS + disable prefetch, going from 7 layers to 8 layers (111,111,111 items in total), the time per item did not change significantly; the total time grew roughly in proportion to the number of items.

layers     time     old gen used (jstat OU)   total GC time (jstat GCT)
7 layers   1584s    3g                        28s
8 layers   15543s   8g                        --

adol001 avatar Sep 29 '22 07:09 adol001

@yuzhu not 1,111,111; the total number is 11,111,111. Updated.

adol001 avatar Sep 29 '22 16:09 adol001

alluxio-bot, merge this please

yuzhu avatar Sep 30 '22 03:09 yuzhu