Add a new mPendingPaths type option in InodeSyncStream
What changes are proposed in this pull request?
In the current implementation of InodeSyncStream, mPendingPaths is a queue, so synchronizing a large directory behaves like a breadth-first traversal. The ufsStatusCache is populated far ahead of where the sync is actually working, which wastes a lot of memory and can even drive the JVM into continuous full GCs once memory runs out.
Why are the changes needed?
I added a new option for mPendingPaths that lets it be a stack, which turns the sync into a depth-first traversal. Because leaf nodes are reached early, entries in the ufsStatusCache can be recycled early, saving a lot of memory.
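To illustrate the idea (a minimal, standalone sketch using java.util.ArrayDeque, not the actual InodeSyncStream code), the only difference between the two modes is whether pending paths are taken from the head of the container (FIFO, breadth-first) or from its tail (LIFO, depth-first):

```java
import java.io.File;
import java.util.ArrayDeque;
import java.util.Deque;

public final class PendingPathsSketch {
  /**
   * Walks a directory tree. With lifo == false the pending paths behave like a
   * queue (breadth-first: whole levels are expanded before any leaf is reached,
   * so per-directory state piles up). With lifo == true they behave like a
   * stack (depth-first: leaves are reached early, so per-directory state can be
   * released early).
   */
  static void walk(File root, boolean lifo) {
    Deque<File> pendingPaths = new ArrayDeque<>();
    pendingPaths.addLast(root);
    while (!pendingPaths.isEmpty()) {
      File current = lifo ? pendingPaths.pollLast() : pendingPaths.pollFirst();
      File[] children = current.listFiles();
      if (children == null) {
        // A plain file (or unreadable path): in the real sync this is roughly the
        // point where cached UFS status for the enclosing subtree stops being needed.
        continue;
      }
      for (File child : children) {
        pendingPaths.addLast(child);
      }
    }
  }

  public static void main(String[] args) {
    walk(new File(args.length > 0 ? args[0] : "."), true); // depth-first
  }
}
```

With the LIFO order a subtree is fully finished before its siblings are expanded, so cached UFS status for that subtree becomes eligible for eviction much sooner.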
Does this PR introduce any user facing changes?
A new property, alluxio.master.metadata.sync.pending.path.type, which can be QUEUE (the default) or STACK.
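For example, to switch the sync to the depth-first behavior, the property could be set in alluxio-site.properties (a sketch; the property name and values are those described in this PR):

```properties
# Use a stack (depth-first traversal) for pending sync paths instead of the
# default queue (breadth-first traversal).
alluxio.master.metadata.sync.pending.path.type=STACK
```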
Automated checks report:
- Commits associated with Github account: PASS
- PR title follows the conventions: FAIL
- The title of the PR does not pass all the checks. Please fix the following issues:
- Title must not end with whitespace
Some checks failed. Please fix the reported issues and reply 'alluxio-bot, check this please' to re-run checks.
Automated checks report:
- Commits associated with Github account: PASS
- PR title follows the conventions: PASS
All checks passed!
This pull request is very effective for synchronizing folders with more than 100 million items. For example, when synchronizing a folder with a /2021/days/items structure, the current implementation needs a large amount of memory to finish and spends a lot of time in GC, while the STACK option is very fast and completes with only a small amount of memory.
LGTM with a minor comment on the property key. Good to merge after that is resolved. Thanks so much!
Fixed. Thank you for your review!
@adol001 thanks for the fix. Have you tested this change with a deep directory structure and compared the two options to see the memory usage and performance difference? I would be curious to see how they compare. This will allow us to provide guidance to users so they can set this configuration option correctly.
@yuzhu
7 layers of directories, 10 subdirectories in each layer, 10 files in the deepest directory, for a total of 11,111,111 items to be synchronized. The UFS is a local SSD disk; the JVM is Java 8.
.setProperty(PropertyKey.MASTER_JOURNAL_CHECKPOINT_PERIOD_ENTRIES, 1000000000)
.setProperty(PropertyKey.MASTER_METASTORE_INODE_CACHE_MAX_SIZE, 0)
.setProperty(PropertyKey.MASTER_METADATA_SYNC_EXECUTOR_POOL_SIZE, 24)
.setProperty(PropertyKey.MASTER_METADATA_SYNC_CONCURRENCY_LEVEL, 100)
.setProperty(PropertyKey.MASTER_METADATA_SYNC_UFS_PREFETCH_POOL_SIZE, 32)
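For reference, a directory tree with the shape described above could be generated on a local UFS with a small helper like the following (a hypothetical test utility, not part of the PR; the class name and the /tmp/sync-test-root path are made up):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public final class TestTreeGenerator {
  /**
   * Creates `depth` layers of directories with `fanout` subdirectories per
   * layer and `filesPerLeaf` empty files in each deepest directory.
   */
  static void generate(Path dir, int depth, int fanout, int filesPerLeaf) throws IOException {
    if (depth == 0) {
      for (int i = 0; i < filesPerLeaf; i++) {
        Files.createFile(dir.resolve("file-" + i));
      }
      return;
    }
    for (int i = 0; i < fanout; i++) {
      Path child = Files.createDirectory(dir.resolve("dir-" + i));
      generate(child, depth - 1, fanout, filesPerLeaf);
    }
  }

  public static void main(String[] args) throws IOException {
    Path root = Files.createDirectories(Paths.get("/tmp/sync-test-root"));
    generate(root, 7, 10, 10); // 7 layers, fanout 10, 10 files per deepest directory
  }
}
```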
| configuration | time | JVM OU (jstat) | GCT (jstat) |
|---|---|---|---|
| default | 1450s | 14g | 122s |
| DFS + disable prefetch | 1584s | 3g | 28s |
For DFS + disable prefetch, going from 7 layers to 8 layers (a total of 111,111,111 items), the time consumption grows roughly in proportion to the number of items, so the per-item cost does not change significantly.
| layers | time | JVM OU (jstat) | GCT (jstat) |
|---|---|---|---|
| 7 layers | 1584s | 3g | 28s |
| 8 layers | 15543s | 8g | -- |
@yuzhu Not 1,111,111; the total number is 11,111,111. Updated.
alluxio-bot, merge this please