v6d
Implement spill and checkpoint functionality in Vineyard
Describe your problem
Vineyard is an in-memory data manager for big-data computation workflows. Vineyard shares distributed datasets (e.g., tensors, dataframes, graphs) across many machines to enable zero-copy data sharing between distributed compute engines. In some real-world cases, the data may exceed the available memory size; it can then be swapped to disk (or to remote storage such as OSS or S3) to temporarily release memory for other jobs, and swapped back into memory when it is required again. This is the so-called "spill" process.
In this task, the candidate is responsible for implementing such spill functionality in Vineyard, with a reasonably smart policy (co-designed with the mentor) that selects proper objects to spill; e.g., it makes no sense to spill an object that is currently in use. Based on the support for spill and reload, we can achieve the checkpoint functionality, which temporarily dumps the whole data in vineyard and reloads it some time later. The checkpoint functionality is the basis for fault tolerance.
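To make the "do not spill an object in use" requirement concrete, here is a minimal Python sketch of one possible policy: least-recently-used selection that skips any object with a non-zero reference count. The class and method names (`SpillPolicy`, `pin`, `unpin`, `select_victims`) are hypothetical and are not part of vineyard's API; the actual policy is to be co-designed with the mentor.

```python
from collections import OrderedDict
from dataclasses import dataclass


@dataclass
class Entry:
    size: int           # bytes the object occupies in memory
    ref_count: int = 0  # number of clients currently using the object


class SpillPolicy:
    """Hypothetical LRU policy: spill cold objects that nobody is using."""

    def __init__(self):
        # Insertion order doubles as access order: oldest access first.
        self._entries = OrderedDict()

    def add(self, object_id, size):
        self._entries[object_id] = Entry(size)

    def touch(self, object_id):
        # Mark the object as most recently used.
        self._entries.move_to_end(object_id)

    def pin(self, object_id):
        # A client starts using the object; it must not be spilled.
        self._entries[object_id].ref_count += 1
        self.touch(object_id)

    def unpin(self, object_id):
        self._entries[object_id].ref_count -= 1

    def select_victims(self, bytes_needed):
        """Return ids to spill, coldest first, until enough bytes are freed."""
        victims, freed = [], 0
        for object_id, entry in self._entries.items():
            if freed >= bytes_needed:
                break
            if entry.ref_count > 0:
                continue  # in use: never a spill candidate
            victims.append(object_id)
            freed += entry.size
        return victims
```

If every candidate is pinned, `select_victims` returns fewer bytes than requested (possibly an empty list), and the caller has to decide whether to wait or to fail the allocation.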
SubTasks
- [x] A warmup task to get familiar with blob store:
- [x] #739
- [x] Implement support for spill in vineyard; both local disks and remote object storage must be supported via a unified and extensible interface.
- [x] #740
- [ ] Implement support for checkpoint in vineyard by reusing some of the spill functionality.
- [ ] #243
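The subtasks above can be sketched as follows. This is only an illustration under assumptions, not vineyard's actual design: `SpillBackend`, `LocalDiskBackend`, and `BlobStore` are hypothetical names, and a remote object-storage backend would be another subclass implementing the same three methods. The checkpoint method reuses the backend's `write` path, and in this simplified form it only covers objects currently resident in memory.

```python
import abc
import pathlib


class SpillBackend(abc.ABC):
    """Hypothetical unified interface for spill targets."""

    @abc.abstractmethod
    def write(self, object_id: str, payload: bytes) -> None: ...

    @abc.abstractmethod
    def read(self, object_id: str) -> bytes: ...

    @abc.abstractmethod
    def delete(self, object_id: str) -> None: ...


class LocalDiskBackend(SpillBackend):
    """Stores each blob as one file under a directory."""

    def __init__(self, directory):
        self._dir = pathlib.Path(directory)
        self._dir.mkdir(parents=True, exist_ok=True)

    def _path(self, object_id):
        return self._dir / f"{object_id}.blob"

    def write(self, object_id, payload):
        self._path(object_id).write_bytes(payload)

    def read(self, object_id):
        return self._path(object_id).read_bytes()

    def delete(self, object_id):
        self._path(object_id).unlink()


class BlobStore:
    """Toy in-memory store that can spill to, and reload from, a backend."""

    def __init__(self, backend: SpillBackend):
        self._backend = backend
        self._memory = {}
        self._spilled = set()

    def put(self, object_id, payload):
        self._memory[object_id] = payload

    def spill(self, object_id):
        # Move the payload out of memory and onto the backend.
        self._backend.write(object_id, self._memory.pop(object_id))
        self._spilled.add(object_id)

    def get(self, object_id):
        # Transparently reload a spilled object on access.
        if object_id in self._spilled:
            self._memory[object_id] = self._backend.read(object_id)
            self._backend.delete(object_id)
            self._spilled.discard(object_id)
        return self._memory[object_id]

    def checkpoint(self, backend: SpillBackend):
        # Reuse the spill write path, but keep the objects in memory.
        for object_id, payload in self._memory.items():
            backend.write(object_id, payload)
        return sorted(self._memory)
```

Keeping the store unaware of where bytes end up is what makes the interface extensible: supporting a new storage target means adding a backend class, with no change to the spill or checkpoint logic.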
Additional context
This issue is part of our OSPP 2022 program.
PR #815 fixes #740.
Closing as the spill functionality has been implemented in recent vineyard releases.
Thanks a lot for your effort! @ZjuYTW