
[VL] Support file cache spill in Gluten

Open yma11 opened this issue 8 months ago • 8 comments

Description

The Velox backend provides a two-level file cache (AsyncDataCache and SsdCache), which we have enabled in PR, using a dedicated MMapAllocator initialized with the configured capacity. This memory is counted as neither execution memory nor storage memory, and it is not managed by Spark's UnifiedMemoryManager. This ticket proposes closing that gap with the following design:

  • Add a NativeStorageMemory segment within vanilla StorageMemory. A new configuration, spark.memory.native.storageFraction, defines its size. The resulting size, offheap.memory * spark.memory.storageFraction * spark.memory.native.storageFraction, is used to initialize AsyncDataCache.
  • Add a configuration, spark.memory.storage.preferSpillNative, that determines whether the RDD cache or the native file cache is spilled first when storage memory must shrink. For example, when queries mostly run against the same data sources, we prefer to keep the native file cache.
  • Introduce NativeMemoryStore, which provides interfaces similar to the vanilla MemoryStore and calls AsyncDataCache::shrink when eviction is needed.
  • Introduce NativeStorageMemoryAllocator, a memory allocator used to create AsyncDataCache. It is wrapped with a ReservationListener to track memory usage in the native cache.
  • VeloxBackend initialization will be done without creating the cache. We will call VeloxBackend::setAsyncDatacache when the memory pools are initialized.
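
To make the sizing formula and the spill preference concrete, here is a small Python sketch. It is only an illustration, not Gluten or Spark code: the names StorageLayout and plan_shrink are hypothetical, and it assumes that setting spark.memory.storage.preferSpillNative to true means the native file cache is shrunk before the RDD cache, which the proposal does not spell out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StorageLayout:
    """Hypothetical model of the proposed storage memory split."""
    offheap_bytes: int              # total off-heap memory ("offheap.memory" above)
    storage_fraction: float         # spark.memory.storageFraction
    native_storage_fraction: float  # proposed spark.memory.native.storageFraction

    @property
    def storage_bytes(self) -> int:
        # Size of the whole storage region.
        return int(self.offheap_bytes * self.storage_fraction)

    @property
    def native_cache_bytes(self) -> int:
        # Capacity that would be used to initialize AsyncDataCache.
        return int(
            self.offheap_bytes
            * self.storage_fraction
            * self.native_storage_fraction
        )


def plan_shrink(needed: int, rdd_cached: int, native_cached: int,
                prefer_spill_native: bool) -> dict:
    """Decide how many bytes to evict from each cache.

    Assumption: prefer_spill_native=True shrinks the native file cache
    first and only then touches the RDD cache; False reverses the order.
    """
    if prefer_spill_native:
        from_native = min(needed, native_cached)
        from_rdd = min(needed - from_native, rdd_cached)
    else:
        from_rdd = min(needed, rdd_cached)
        from_native = min(needed - from_rdd, native_cached)
    return {"rdd": from_rdd, "native": from_native}


GIB = 1024 ** 3
layout = StorageLayout(offheap_bytes=8 * GIB,
                       storage_fraction=0.5,
                       native_storage_fraction=0.5)
```

With 8 GiB of off-heap memory and both fractions at 0.5, the storage region is 4 GiB and the native cache gets 2 GiB. In the real design, the "native" share returned by plan_shrink would correspond to the amount NativeMemoryStore asks AsyncDataCache::shrink to release.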

The key code path looks like the following: [image]

yma11 · May 27 '24 13:05