v6d

Implements spill and checkpoint functionalities in Vineyard

Open sighingnow opened this issue 2 years ago • 1 comments

Describe your problem

Vineyard is an in-memory data manager for big data computation workflows. Vineyard shares distributed datasets (e.g., tensors, dataframes, graphs) across many machines to enable zero-copy data sharing between distributed compute engines. In some real-world cases, the data may exceed the available memory, and can be swapped to disk (or remote storage such as OSS or S3) to temporarily release memory for other jobs, then swapped back into memory when it is required again. This is the so-called "spill" process.

In this task, the candidate is responsible for implementing such spill functionality in Vineyard, with a reasonable and smart policy (co-designed with the mentor) that selects proper objects to spill; e.g., it makes no sense to spill an object that is currently in use. Based on the support for spill and reload, we can achieve the checkpoint functionality, which could temporarily dump the whole data in Vineyard and reload it some time later. The checkpoint functionality is the basis for fault tolerance.
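To make the policy requirement concrete, here is a minimal sketch of one possible policy: spill the least recently used objects first and never touch objects that are in use. It is only an illustration, not Vineyard's implementation. The class `SpillableStore`, its `pin`/`unpin` methods, and the plain dict standing in for disk or remote storage are all hypothetical names chosen for this example.

```python
from collections import OrderedDict


class SpillableStore:
    """Toy blob store that spills cold, unpinned blobs (hypothetical, not Vineyard's API)."""

    def __init__(self, capacity_bytes):
        self.capacity_bytes = capacity_bytes
        self.in_memory = OrderedDict()  # object id -> bytes, least recently used first
        self.spilled = {}               # object id -> bytes, stands in for disk or S3
        self.pinned = set()             # ids currently in use; these are never spilled

    def used_bytes(self):
        return sum(len(blob) for blob in self.in_memory.values())

    def put(self, object_id, blob):
        self._make_room(len(blob))
        self.in_memory[object_id] = blob

    def get(self, object_id):
        if object_id in self.spilled:
            # Reload on demand: free memory first, then bring the blob back.
            blob = self.spilled[object_id]
            self._make_room(len(blob))
            del self.spilled[object_id]
            self.in_memory[object_id] = blob
        self.in_memory.move_to_end(object_id)  # mark as most recently used
        return self.in_memory[object_id]

    def pin(self, object_id):
        self.get(object_id)  # make sure the blob is resident before marking it in use
        self.pinned.add(object_id)

    def unpin(self, object_id):
        self.pinned.discard(object_id)

    def _make_room(self, incoming_bytes):
        # Walk from least to most recently used, skipping objects that are in use.
        for object_id in list(self.in_memory):
            if self.used_bytes() + incoming_bytes <= self.capacity_bytes:
                return
            if object_id in self.pinned:
                continue
            self.spilled[object_id] = self.in_memory.pop(object_id)
        if self.used_bytes() + incoming_bytes > self.capacity_bytes:
            raise MemoryError("not enough unpinned memory to spill")
```

With a 10-byte store holding two 4-byte blobs `a` and `b`, pinning `a` and then putting a third 4-byte blob spills `b` rather than `a`, even though `a` was inserted first. A real policy would also have to weigh object size, reload cost, and the distributed setting, which this sketch ignores.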

SubTasks

  • [x] A warmup task to get familiar with blob store:
    • [x] #739
  • [x] Implement support for spill in Vineyard; both local disks and remote object storage must be supported via a unified and extensible interface.
    • [x] #740
  • [ ] Implement support for checkpoint in Vineyard by reusing parts of the spill functionality.
    • [ ] #243
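As a rough illustration of what "a unified and extensible interface" and "reusing some functionalities in spill" could look like, the sketch below defines one abstract storage target and builds a checkpoint on top of it. Everything here is an assumption made for the example: `SpillTarget`, `LocalDiskTarget`, `InMemoryRemoteTarget`, `checkpoint`, and `restore` are hypothetical names, and the "remote" target is an in-memory stand-in rather than a real OSS or S3 client.

```python
import os
import tempfile
from abc import ABC, abstractmethod


class SpillTarget(ABC):
    """Hypothetical common interface for any place blobs can be spilled to."""

    @abstractmethod
    def write(self, object_id, blob): ...

    @abstractmethod
    def read(self, object_id): ...

    @abstractmethod
    def delete(self, object_id): ...


class LocalDiskTarget(SpillTarget):
    """Stores each blob as one file in a directory."""

    def __init__(self, directory):
        self.directory = directory
        os.makedirs(directory, exist_ok=True)

    def _path(self, object_id):
        return os.path.join(self.directory, object_id + ".blob")

    def write(self, object_id, blob):
        with open(self._path(object_id), "wb") as f:
            f.write(blob)

    def read(self, object_id):
        with open(self._path(object_id), "rb") as f:
            return f.read()

    def delete(self, object_id):
        os.remove(self._path(object_id))


class InMemoryRemoteTarget(SpillTarget):
    """Stand-in for remote object storage; a real one would call a storage client."""

    def __init__(self):
        self.objects = {}

    def write(self, object_id, blob):
        self.objects[object_id] = bytes(blob)

    def read(self, object_id):
        return self.objects[object_id]

    def delete(self, object_id):
        del self.objects[object_id]


def checkpoint(blobs, target):
    """Dump every blob through the same interface that spilling uses."""
    for object_id, blob in blobs.items():
        target.write(object_id, blob)
    return sorted(blobs)  # manifest of the object ids that were written


def restore(manifest, target):
    """Reload all blobs listed in a checkpoint manifest."""
    return {object_id: target.read(object_id) for object_id in manifest}


# Usage: the same checkpoint code works against either target.
blobs = {"tensor-1": b"\x00\x01", "dataframe-2": b"abc"}
disk = LocalDiskTarget(tempfile.mkdtemp())
manifest = checkpoint(blobs, disk)
recovered = restore(manifest, disk)
```

Because spill and checkpoint only talk to `SpillTarget`, adding a new backend means writing one more subclass; neither the spill policy nor the checkpoint logic has to change.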

Additional context

This issue is part of our OSPP 2022

sighingnow • May 10 '22 01:05

PR #815 fixes #740.

mengke-mk • Jul 25 '22 07:07

Closing, as the spill functionality has been implemented in recent Vineyard releases.

Thanks a lot for your effort! @ZjuYTW

sighingnow • Sep 14 '22 07:09