Refine checkpoint by Parallel zip compression and decompression
What changes are proposed in this pull request?
Use parallel zip compression and decompression for RocksInodeStore checkpoints.
Why are the changes needed?
Checkpointing is too slow for a RocksInodeStore holding more than 1 billion entries.
Does this PR introduce any user facing changes?
Yes. The feature can be enabled with `alluxio.master.parallel.backup.rocksdb`, and the degree of parallelism is controlled by `alluxio.master.parallel.backup.rocksdb.thread.pool.size`.
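The idea behind the change can be sketched as follows. This is a minimal, hypothetical illustration of parallel compression (not the actual Alluxio implementation): each chunk of checkpoint data is deflated independently on a fixed-size thread pool, and the results are collected in input order so they can be written out sequentially. The class and method names are made up for this sketch.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.zip.Deflater;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.Inflater;
import java.util.zip.InflaterOutputStream;

public class ParallelCompressSketch {
  // Compress each chunk independently on a thread pool; the results can then
  // be written out sequentially (e.g. one archive entry per RocksDB SST file).
  static List<byte[]> compressAll(List<byte[]> chunks, int threads) throws Exception {
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    try {
      List<Future<byte[]>> futures = new ArrayList<>();
      for (byte[] chunk : chunks) {
        futures.add(pool.submit(() -> deflate(chunk)));
      }
      List<byte[]> out = new ArrayList<>();
      for (Future<byte[]> f : futures) {
        out.add(f.get()); // blocks per task; preserves input order
      }
      return out;
    } finally {
      pool.shutdown();
    }
  }

  static byte[] deflate(byte[] data) throws Exception {
    ByteArrayOutputStream bos = new ByteArrayOutputStream();
    // Compression level 6, matching the benchmark numbers quoted in this PR.
    try (DeflaterOutputStream dos = new DeflaterOutputStream(bos, new Deflater(6))) {
      dos.write(data);
    }
    return bos.toByteArray();
  }

  static byte[] inflate(byte[] data) throws Exception {
    ByteArrayOutputStream bos = new ByteArrayOutputStream();
    try (InflaterOutputStream ios = new InflaterOutputStream(bos, new Inflater())) {
      ios.write(data);
    }
    return bos.toByteArray();
  }

  public static void main(String[] args) throws Exception {
    List<byte[]> chunks = new ArrayList<>();
    for (int i = 0; i < 8; i++) {
      chunks.add(("inode-data-" + i).repeat(100).getBytes(StandardCharsets.UTF_8));
    }
    List<byte[]> compressed = compressAll(chunks, 4);
    // Round-trip check: each chunk decompresses back to the original bytes.
    for (int i = 0; i < chunks.size(); i++) {
      if (!java.util.Arrays.equals(inflate(compressed.get(i)), chunks.get(i))) {
        throw new AssertionError("chunk " + i + " did not round-trip");
      }
    }
    System.out.println("round-trip ok: " + chunks.size() + " chunks");
  }
}
```

Because each chunk is an independent deflate stream, decompression can be parallelized the same way, which is why the benchmark below shows gains on both sides.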
Automated checks report:
- Commits associated with Github account: PASS
- PR title follows the conventions: FAIL
- The title of the PR does not pass all the checks. Please fix the following issues:
- Title is too long (83 characters). Must be at most 72 characters.
Some checks failed. Please fix the reported issues and reply 'alluxio-bot, check this please' to re-run checks.
Automated checks report:
- Commits associated with Github account: PASS
- PR title follows the conventions: PASS
All checks passed!
I do not have permission to add or remove reviewers in the Alluxio repo. When I requested a re-review from @jiacheliu3, @tcrain disappeared from the reviewer list, and GitHub shows that I removed tcrain as a reviewer. I don't know why this happened.
Can you also check whether the journal tool needs handling here as well? Otherwise looks good, thanks: https://github.com/Alluxio/alluxio/blob/master/core/server/master/src/main/java/alluxio/master/journal/tool/AbstractJournalDumper.java#L88
https://github.com/Alluxio/alluxio/blob/master/core/server/master/src/main/java/alluxio/master/journal/tool/AbstractJournalDumper.java#L88
@adol001 See above, I think that class needs to be updated to catch your new checkpoint type. If you feel it will take much code to handle that, I'm fine if you do it in a separate PR.
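For context, the concern above is that the journal dumper dispatches on a checkpoint's type to pick the right reader, so a new checkpoint format needs a new branch there. A hypothetical sketch of that dispatch pattern (the enum constants and return values here are illustrative, not Alluxio's actual API):

```java
public class CheckpointDispatchSketch {
  // Illustrative only: stand-in for the checkpoint type tag written in the
  // checkpoint header.
  enum CheckpointType { TARGZ, PARALLEL_ZIP }

  // An unhandled type should fail loudly rather than be silently skipped.
  static String readerFor(CheckpointType type) {
    switch (type) {
      case TARGZ:
        return "sequential tar.gz reader";
      case PARALLEL_ZIP:
        return "parallel zip reader"; // branch the new checkpoint type would need
      default:
        throw new IllegalArgumentException("unhandled checkpoint type: " + type);
    }
  }

  public static void main(String[] args) {
    System.out.println(readerFor(CheckpointType.PARALLEL_ZIP));
  }
}
```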
@jiacheliu3 I will do it in a separate PR. Everything else has been fixed.
alluxio-bot, merge this please
@adol001 Hi adol, when compressing and decompressing a large amount of inode info in parallel, can you provide timing results compared to sequential compression/decompression? Thanks.
@liuyongqing 100 million files, on an NVMe SSD:
|  | compress | decompress |
| --- | --- | --- |
| targz | 448521 ms | 56096 ms |
| zip (5 threads, compression level 6) | 95603 ms | 19019 ms |
If you have a lot of CPU cores, you can increase the number of threads.
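Using the property names from the PR description, enabling this in `alluxio-site.properties` would look something like the following (the values here are examples, not recommendations):

```properties
# Enable parallel zip compression/decompression for RocksDB checkpoints
alluxio.master.parallel.backup.rocksdb=true
# Example: 5 threads, matching the benchmark above; raise this on machines
# with more CPU cores
alluxio.master.parallel.backup.rocksdb.thread.pool.size=5
```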
@adol001 Thanks for your kind answer. The result is very good and reduces Alluxio's unavailability time in case of failure. I will try to reproduce it in my test environment.