dagstore
Automated Watermark-Based GC and Transient Quota Allocation
This is a meta-issue to track the work of introducing an automated, watermark-based LRU GC of transients, along with a quota reservation mechanism that allows downloading transients whose size we do not know upfront.
The work is spread across multiple PRs.
High-level overview
- The dagstore now performs automated high->low watermark-based GC for transient files.
- Users who want to use this feature will have to configure a maximum size for the transients directory, and the dagstore guarantees that the size of the transients directory will never exceed that limit.
- Users will also have to configure a high and a low watermark for the transients directory. The dagstore will kickstart an automated GC when it detects that the size of the transients directory has crossed the high watermark, and will attempt to bring the directory size below the low watermark threshold.
- Users will have to configure a GC strategy that recommends the order in which reclaimable shards should be GC'd by the automated GC mechanism. The dagstore comes with a built-in LRU GC strategy, but users are free to implement their own. See the documentation of GarbageCollectionStrategy for more details.
- A quota reservation mechanism has been introduced for downloading transients whose size we do not know upfront. To download such a CAR, the downloader first gets a reservation from the dagstore for a preconfigured number of bytes, downloads that many bytes, and goes back to the allocator for another reservation if it hasn't finished downloading the transient. At the end, it releases any unused reserved bytes back to the allocator.
- The existing manual GC mechanism works as before; no changes have been made to it.
Known Edge Case
There is an unhandled known edge case in the code.
If a group of concurrent transient downloads ends up reserving all the available space in the transients directory, but not enough to satisfy their individual downloads, then all of them will end up retrying with backoff together, waiting for more space to become available. However, no space will become available until one of them exhausts its backoff-retry attempts, fails the download and releases its reserved space. Thus, the dagstore will not make any progress with new downloads until one of the downloads fails and releases its reservation.
However, this edge case should be mitigated by:
- Rate limiting the number of concurrent transient fetches
- Giving higher reservations to older downloads than to newer downloads.
PRs
- Upgrader should reserve and release allocations if transient size is unknown. #130
- Dagstore event loop does automated watermark-based GC and handles quota allocations and reservations. #131
- Interface for extensible GC with a default LRU implementation. #132
- Config for automated GC and tests for the entire feature. #133
> Users will also have to configure a high and low watermark for the transients directory. The dagstore will kickstart an automated GC when it detects that the size of the transients directory has crossed the high watermark and will attempt to bring down the directory size below the low watermark threshold.
With two-watermark systems, the goal tends to be to keep the value between the watermarks. What's described here seems to be more of a trigger/target system? ("When value is above
> Known Edge Case
The edge case seems pretty dangerous. Is it possible to identify this livelock situation in the garbage collector, and interrupt transient downloads to vacate more space?
Note that there are new edge cases that emerge from such situations, e.g. a malicious user forcing the system to download a huge transient to DoS all other active downloads.
Which protocols are unable to report a shard size in your use case? Having unknown shard sizes is acceptable for trusted scenarios, but definitely a no-go for untrusted/adversarial scenarios. An attacker may exploit the system by forcing it to (1) download a shard with unknown size from themselves, and (2) send infinite garbage (cheap to do).