Refine checkpoint by Parallel zip compression and decompression

Open adol001 opened this issue 2 years ago • 3 comments

What changes are proposed in this pull request?

Parallel zip compression and decompression can be used for RocksInodeStore.
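
For readers unfamiliar with the idea, here is a minimal sketch of compressing independent chunks concurrently with a fixed thread pool. It is not the code in this PR: it uses only the JDK's java.util.zip.Deflater and Inflater, and the class name ParallelDeflateSketch, the helper names, and the in-memory chunks are hypothetical stand-ins for checkpoint files. The thread count (5) and compression level (6) are borrowed from the benchmark later in this thread.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical illustration only; not the RocksInodeStore checkpoint code.
class ParallelDeflateSketch {

  /** Deflates one chunk at the given compression level (0-9). */
  static byte[] deflate(byte[] input, int level) {
    Deflater deflater = new Deflater(level);
    try {
      deflater.setInput(input);
      deflater.finish();
      ByteArrayOutputStream out = new ByteArrayOutputStream();
      byte[] buf = new byte[8192];
      while (!deflater.finished()) {
        out.write(buf, 0, deflater.deflate(buf));
      }
      return out.toByteArray();
    } finally {
      deflater.end(); // release native zlib memory
    }
  }

  /** Inflates one chunk that was produced by deflate(). */
  static byte[] inflate(byte[] input) {
    Inflater inflater = new Inflater();
    try {
      inflater.setInput(input);
      ByteArrayOutputStream out = new ByteArrayOutputStream();
      byte[] buf = new byte[8192];
      while (!inflater.finished()) {
        int n = inflater.inflate(buf);
        if (n == 0 && !inflater.finished()) {
          throw new IllegalStateException("truncated or invalid deflate stream");
        }
        out.write(buf, 0, n);
      }
      return out.toByteArray();
    } catch (DataFormatException e) {
      throw new IllegalStateException(e);
    } finally {
      inflater.end();
    }
  }

  /** Applies fn to every chunk on a fixed-size pool, preserving input order. */
  static List<byte[]> mapInParallel(List<byte[]> chunks, int threads,
      Function<byte[], byte[]> fn) {
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    try {
      List<Future<byte[]>> futures = new ArrayList<>();
      for (byte[] chunk : chunks) {
        Callable<byte[]> task = () -> fn.apply(chunk);
        futures.add(pool.submit(task));
      }
      List<byte[]> results = new ArrayList<>();
      for (Future<byte[]> future : futures) {
        results.add(future.get()); // blocks until that chunk is done
      }
      return results;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    } catch (ExecutionException e) {
      throw new IllegalStateException(e.getCause());
    } finally {
      pool.shutdownNow();
    }
  }

  static List<byte[]> compressAll(List<byte[]> chunks, int threads, int level) {
    return mapInParallel(chunks, threads, c -> deflate(c, level));
  }

  static List<byte[]> decompressAll(List<byte[]> chunks, int threads) {
    return mapInParallel(chunks, threads, ParallelDeflateSketch::inflate);
  }

  public static void main(String[] args) {
    List<byte[]> chunks = new ArrayList<>();
    for (int i = 0; i < 16; i++) {
      String text = ("inode-" + i + ";").repeat(10_000);
      chunks.add(text.getBytes(StandardCharsets.UTF_8));
    }
    List<byte[]> compressed = compressAll(chunks, 5, 6);
    List<byte[]> restored = decompressAll(compressed, 5);
    for (int i = 0; i < chunks.size(); i++) {
      if (!Arrays.equals(chunks.get(i), restored.get(i))) {
        throw new AssertionError("round trip failed for chunk " + i);
      }
    }
    System.out.println("round trip ok for " + chunks.size() + " chunks");
  }
}
```

Because each chunk is compressed independently, the work scales with the number of threads until CPU or disk becomes the bottleneck. A real zip archive also needs its entries written into a single file in a consistent layout, which this sketch leaves out.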

Why are the changes needed?

Checkpointing is too slow for RocksInodeStore when it holds more than 1 billion entries.

Does this PR introduce any user facing changes?

Yes. The feature is enabled with alluxio.master.parallel.backup.rocksdb, and the degree of parallelism is set with alluxio.master.parallel.backup.rocksdb.thread.pool.size.
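
As a sketch of how these two properties might be set, assuming they go in conf/alluxio-site.properties like other master properties, that the first is a boolean, and that the second takes an integer thread count (the values below are illustrative, not recommended defaults):

```properties
# Assumed boolean switch for parallel zip checkpoints of the RocksDB metastore
alluxio.master.parallel.backup.rocksdb=true
# Assumed integer: number of threads used for compression and decompression
alluxio.master.parallel.backup.rocksdb.thread.pool.size=5
```

The master would most likely need a restart for the change to take effect.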

adol001 avatar Sep 05 '22 07:09 adol001

Automated checks report:

  • Commits associated with Github account: PASS
  • PR title follows the conventions: FAIL
    • The title of the PR does not pass all the checks. Please fix the following issues:
      • Title is too long (83 characters). Must be at most 72 characters.

Some checks failed. Please fix the reported issues and reply 'alluxio-bot, check this please' to re-run checks.

alluxio-bot avatar Sep 05 '22 07:09 alluxio-bot

Automated checks report:

  • Commits associated with Github account: PASS
  • PR title follows the conventions: PASS

All checks passed!

alluxio-bot avatar Sep 05 '22 07:09 alluxio-bot

I do not have permission to add or remove reviewers in the Alluxio repo. When I requested a re-review from @jiacheliu3, @tcrain disappeared from the reviewer list, yet GitHub shows that I removed tcrain as a reviewer. I don't know why this happened.

adol001 avatar Sep 13 '22 09:09 adol001

Can you also check whether the journal tool needs a check for the new checkpoint type here? Otherwise looks good, thanks: https://github.com/Alluxio/alluxio/blob/master/core/server/master/src/main/java/alluxio/master/journal/tool/AbstractJournalDumper.java#L88

tcrain avatar Oct 03 '22 16:10 tcrain

https://github.com/Alluxio/alluxio/blob/master/core/server/master/src/main/java/alluxio/master/journal/tool/AbstractJournalDumper.java#L88

@adol001 See above. I think that class needs to be updated to handle your new checkpoint type. If you feel that will take a lot of code, I'm fine with you doing it in a separate PR.

jiacheliu3 avatar Oct 04 '22 03:10 jiacheliu3

@jiacheliu3 I will do it in a separate PR. Everything else has been fixed.

adol001 avatar Oct 06 '22 06:10 adol001

alluxio-bot, merge this please

jiacheliu3 avatar Oct 07 '22 02:10 jiacheliu3

@adol001 Hi adol, when compressing and decompressing a large amount of inode data in parallel, can you provide timing results compared with sequential compression/decompression? Thanks.

liuyongqing avatar Nov 06 '22 10:11 liuyongqing

@liuyongqing Tested with 100 million files on an NVMe SSD:

format                             compress     decompress
targz                              448521 ms    56096 ms
zip (5 threads, compress level 6)  95603 ms     19019 ms

If you have plenty of CPU cores, you can increase the number of threads.

adol001 avatar Nov 07 '22 02:11 adol001

@adol001 Thanks for your kind answer. The result is very good: it reduces Alluxio's unavailability time in case of failure. I will try it in my test environment.

liuyongqing avatar Nov 07 '22 08:11 liuyongqing