spark [SPARK-38909][CORE][YARN] Encapsulate `LevelDB` used to store remote/external shuffle state as `DB`

What changes were proposed in this pull request?

ExternalShuffleBlockResolver, YarnShuffleService and RemoteBlockPushResolver use LevelDB directly, this is not conducive to extending the use of RocksDB in this scenario. This pr is encapsulated for expansibility. It will be the pre-work of SPARK-38888, the main change as follows:

Introduce DB interface and implementation of corresponding LevelDB now, RocksDB implementation will be involved in the future
Introduce DBBackend enum and new config SHUFFLE_SERVICE_DB_BACKEND to specify a disk-based store used in shuffle service local db, only LEVELDB is supported now, RocksDB implementation will be involved in the future
Introduce DBIterator to traverse the items in the DB, only LevelDBIterator is supported now, RocksDBIterator implementation will be involved in the future
Introduce DBProvider to initialization DB, it is only used to generate LevelDB now, RocksDB implementation will be involved in the future
Change StoreVersion to top-level class, it will be reused by RocksDB implementation
Replace the directly use of LevelDB in ExternalShuffleBlockResolver, YarnShuffleService and RemoteBlockPushResolver with DB
fix corresponding UTs and test tools

Why are the changes needed?

This is pre work of SPARK-38888

Does this PR introduce any user-facing change?

Add new config SHUFFLE_SERVICE_DB_BACKEND(spark.shuffle.service.db.backend), the new config use to use to specify a disk-based store used in shuffle service local db. Only LEVELDB is supported after this pr.

How was this patch tested?

Pass GA

Apr 14 '22 13:04 LuciferYang

Although I couldn't take a look at this, you can ask YARN committers' help if you want to make this proceed, @LuciferYang .

Aug 02 '22 00:08 dongjoon-hyun

thanks @dongjoon-hyun

Aug 02 '22 02:08 LuciferYang

I found RemoteBlockPushResolver begin to use LevelDB, more refactor work is required before reopen this pr

Aug 02 '22 06:08 LuciferYang

Got it. That's bad.

Aug 02 '22 06:08 dongjoon-hyun

This is a recent change, to add missing functionality for preserving merge data across restarts. Please tag me when you have an updated PR @LuciferYang, will help review it.

Aug 02 '22 06:08 mridulm

Merge with master and re-trigger GA

Aug 04 '22 03:08 LuciferYang

This is a recent change, to add missing functionality for preserving merge data across restarts. Please tag me when you have an updated PR @LuciferYang, will help review it.

@dongjoon-hyun @mridulm Conflict is resolved and GA passed. But I think we should make sure that SPARK-38888 is worth to do before reviewing, because this pr is the pre-work of SPARK-38888.

Aug 04 '22 12:08 LuciferYang

@dongjoon-hyun @mridulm Conflict is resolved and GA passed. But I think we should make sure that SPARK-38888 is worth to do before reviewing, because this pr is the pre-work of SPARK-38888.

I agree with @dongjoon-hyun's assessment in the jira, and I dont have strong opinion. If rest of the codebase is moving towards an abstraction for db, I would prefer we did the same for yarn shuffle as well though from a consistency point of view.

Aug 04 '22 19:08 mridulm

Actually, instead of providing wrapper layer, the final goal is the removal of LevelDB in favor of RocksDB.

Aug 04 '22 20:08 dongjoon-hyun

@dongjoon-hyun

Actually, instead of providing wrapper layer, the final goal is the removal of LevelDB in favor of RocksDB.

I am sure in future, we might have another SPIP to remove RocksDB in favor of something else :-) If we can abstract it away, the future evolution might be easier !

Aug 09 '22 04:08 mridulm

nit: Btw, from a completeness point of view, DB.get makes sense to include :-)

Aug 09 '22 05:08 mridulm

nit: Btw, from a completeness point of view, DB.get makes sense to include :-)

OK ~

Aug 09 '22 05:08 LuciferYang

Also cc @tgravescs do you have time to help review this pr? Thanks ~

Aug 15 '22 04:08 LuciferYang

changes look fine to me

Aug 18 '22 14:08 tgravescs

Merged to master. Thanks for working on this @LuciferYang ! Thanks for reviews @dongjoon-hyun, @zhouyejoe and @tgravescs :-)

Aug 18 '22 15:08 mridulm

Thanks @mridulm @tgravescs @zhouyejoe @dongjoon-hyun !!! I will do the RocksDB related work next week due to some other work needs to be completed this week ~

Aug 18 '22 15:08 LuciferYang

spark spark copied to clipboard

[SPARK-38909][CORE][YARN] Encapsulate `LevelDB` used to store remote/external shuffle state as `DB`

What changes were proposed in this pull request?

Why are the changes needed?

Does this PR introduce any user-facing change?

How was this patch tested?

spark
spark copied to clipboard