
Refine: add a new type option for mPendingPaths in InodeSyncStream

Open adol001 opened this pull request 2 years ago • 3 comments

What changes are proposed in this pull request?

In the current implementation of InodeSyncStream, mPendingPaths is a queue, so synchronizing a large directory behaves like a breadth-first traversal. The ufsStatusCache gets loaded far ahead of the paths actually being processed, which wastes a lot of memory and can even push the JVM into continuous full GCs once memory runs out.
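To make the frontier growth concrete, here is a minimal toy sketch (not the actual InodeSyncStream code; the class name, the listChildren helper, and the tree shape are invented for illustration) of how a FIFO pending-path deque explores a directory tree level by level:

    import java.util.ArrayDeque;
    import java.util.ArrayList;
    import java.util.Deque;
    import java.util.List;

    // Toy model of a FIFO mPendingPaths; illustration only, not Alluxio code.
    public class PendingPathsDemo {
      static final int FAN_OUT = 10;  // subdirectories per directory
      static final int DEPTH = 4;     // levels below the root

      // Hypothetical child listing for a synthetic tree.
      static List<String> listChildren(String path) {
        int depth = path.isEmpty() ? 0 : path.length() - path.replace("/", "").length();
        List<String> children = new ArrayList<>();
        if (depth < DEPTH) {
          for (int i = 0; i < FAN_OUT; i++) {
            children.add(path + "/d" + i);
          }
        }
        return children;
      }

      public static void main(String[] args) {
        Deque<String> pendingPaths = new ArrayDeque<>();
        pendingPaths.offerLast("");                 // root
        int maxPending = 0;
        while (!pendingPaths.isEmpty()) {
          String path = pendingPaths.pollFirst();   // FIFO order -> breadth-first traversal
          for (String child : listChildren(path)) {
            pendingPaths.offerLast(child);
          }
          maxPending = Math.max(maxPending, pendingPaths.size());
        }
        // With FIFO order the pending set grows to the width of an entire level
        // (FAN_OUT^DEPTH = 10,000 paths here), and a prefetched UFS status can be
        // held in memory for every one of them before any leaf is finished.
        System.out.println("peak pending paths with a queue: " + maxPending);
      }
    }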

Why are the changes needed?

I added a new option for mPendingPaths that makes it a stack, so the sync process becomes a depth-first traversal. Because leaf nodes are reached early, the ufsStatusCache entries can be recycled early, saving a lot of memory.
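For comparison, here is the stack variant as a drop-in method for the toy PendingPathsDemo sketch above (again an illustration, not the PR's actual code): using the same deque in LIFO order reaches leaves after only a handful of steps, so the pending set, and with it the prefetched UFS statuses, stays small.

      // Same traversal as above, but with the deque used as a stack (LIFO).
      static int peakPendingDepthFirst() {
        Deque<String> pendingPaths = new ArrayDeque<>();
        pendingPaths.push("");                      // root
        int maxPending = 0;
        while (!pendingPaths.isEmpty()) {
          String path = pendingPaths.pop();         // LIFO order -> depth-first traversal
          for (String child : listChildren(path)) {
            pendingPaths.push(child);
          }
          maxPending = Math.max(maxPending, pendingPaths.size());
        }
        // Peaks at only a few dozen pending paths here, versus ~10,000 with the queue,
        // and each subtree's cached UFS statuses can be dropped as soon as its leaves
        // are processed.
        return maxPending;
      }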

Does this PR introduce any user facing changes?

Adds a new property, alluxio.master.metadata.sync.pending.path.type, which can be QUEUE (the default) or STACK.
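For example, a user who wants the depth-first behavior could set the new property in alluxio-site.properties (leaving it unset keeps the default QUEUE behavior):

    alluxio.master.metadata.sync.pending.path.type=STACK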

adol001 avatar Sep 15 '22 06:09 adol001

Automated checks report:

  • Commits associated with Github account: PASS
  • PR title follows the conventions: FAIL
    • The title of the PR does not pass all the checks. Please fix the following issues:
      • Title must not end with whitespace

Some checks failed. Please fix the reported issues and reply 'alluxio-bot, check this please' to re-run checks.

alluxio-bot avatar Sep 15 '22 06:09 alluxio-bot

Automated checks report:

  • Commits associated with Github account: PASS
  • PR title follows the conventions: PASS

All checks passed!

alluxio-bot avatar Sep 15 '22 06:09 alluxio-bot

This pull request is very effective for synchronizing directories with more than 100 million items. For example, with a directory tree of the form /2021/days/items, the current implementation needs a large amount of memory to finish the sync and spends a lot of time in GC, while the STACK option finishes quickly with a small memory footprint.

adol001 avatar Sep 16 '22 04:09 adol001

LGTM with a minor comment on the property key. Good to merge after that is resolved. Thanks so much!

Fixed. Thank you for your review!

adol001 avatar Sep 28 '22 11:09 adol001

@adol001 thanks for the fix. Have you tested this change with a deep directory structure and compared the two options to see the memory usage and performance difference? I would be curious to see how they compare. This would allow us to provide guidance to users so they can set this configuration option correctly.

yuzhu avatar Sep 28 '22 12:09 yuzhu

@yuzhu

7 layers of directories, with 10 subdirectories at each layer and 10 files in each deepest directory, for a total of 11,111,111 items to synchronize. The UFS is a local SSD disk; Java 8.
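One reading of this layout that reproduces the reported totals: the tree is 7 layers deep below the root, with the first 6 layers being directories (fan-out 10) and the 7th layer being the 10 files in each deepest directory, i.e. 1 root + 1,111,110 directories + 10,000,000 files = 11,111,111 inodes (one more directory layer gives the 111,111,111 figure further below). A sketch of how such a tree could be generated on the local UFS with plain java.nio.file (the root path is hypothetical):

    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;

    // Sketch only: build the synthetic benchmark tree on the local UFS directory.
    public class BuildTestTree {
      static final int LAYERS = 7;    // the deepest layer holds the files
      static final int FAN_OUT = 10;  // 10 subdirectories per layer, 10 files per leaf directory

      static void build(Path dir, int layer) throws IOException {
        if (layer == LAYERS) {
          for (int f = 0; f < FAN_OUT; f++) {
            Files.createFile(dir.resolve("f" + f));   // 10 files in each deepest directory
          }
          return;
        }
        for (int i = 0; i < FAN_OUT; i++) {
          Path child = dir.resolve("d" + i);
          Files.createDirectory(child);
          build(child, layer + 1);
        }
      }

      public static void main(String[] args) throws IOException {
        Path root = Paths.get("/mnt/ssd/ufs-root");   // hypothetical local UFS path
        Files.createDirectories(root);
        build(root, 1);
      }
    }

The master was configured with the following properties: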

      // effectively disable journal checkpointing during the benchmark
      .setProperty(PropertyKey.MASTER_JOURNAL_CHECKPOINT_PERIOD_ENTRIES, 1000000000)
      // no on-heap inode cache in front of the metastore
      .setProperty(PropertyKey.MASTER_METASTORE_INODE_CACHE_MAX_SIZE, 0)
      // threads executing metadata sync
      .setProperty(PropertyKey.MASTER_METADATA_SYNC_EXECUTOR_POOL_SIZE, 24)
      // maximum concurrency for a single sync operation
      .setProperty(PropertyKey.MASTER_METADATA_SYNC_CONCURRENCY_LEVEL, 100)
      // threads prefetching file statuses from the UFS
      .setProperty(PropertyKey.MASTER_METADATA_SYNC_UFS_PREFETCH_POOL_SIZE, 32)

option                   time    old gen used (jstat OU)   total GC time (jstat GCT)
default (QUEUE)          1450s   14g                       122s
DFS + disable prefetch   1584s   3g                        28s

For DFS + disable prefetch, going from 7 layers to 8 layers (111,111,111 items in total), the time per item did not change significantly; the total time grew roughly in proportion to the number of items.

layers     time     old gen used (jstat OU)   total GC time (jstat GCT)
7 layers   1584s    3g                        28s
8 layers   15543s   8g                        --

adol001 avatar Sep 29 '22 07:09 adol001

@yuzhu not 1,111,111; the total number is 11,111,111. Updated.

adol001 avatar Sep 29 '22 16:09 adol001

alluxio-bot, merge this please

yuzhu avatar Sep 30 '22 03:09 yuzhu