
IPFS Garbage Collection v2

Open gammazero opened this issue 4 years ago • 14 comments

gammazero avatar Feb 16 '21 19:02 gammazero

Associating a solution proposal: https://hackmd.io/bgU7ti6_THK5gAGcrAhoEw?both

BigLep avatar Feb 23 '21 21:02 BigLep

I wrote up some thoughts on other approaches to GC (n.b., first draft warning): https://hackmd.io/0T8z0ex8RWmI8BmFNPSafQ

The draft idea in that document is a GC approach, but not a reference counting one.

I'm fairly dubious of reference counting approaches for a disk-backed GC. Reference counting requires a strong degree of consistency, and a great deal of mutexing, which makes it both relatively fragile and high cost at runtime. Metrics evaluated on a full implementation would of course shed more light than my bets, but I'm just not optimistic on that whole angle of approach.

warpfork avatar Feb 24 '21 21:02 warpfork

I like that this issue also touches on the user story of wanting to actively remove specific content. However, I wonder if that's directly connected to GC. I think we could provide direct remedies for that user story without involving GC at all.

(Although on second thought, active targeted removal would actually be connected to GC, and somewhat more difficult, in a reference counting world -- because active removal would have to trigger walks to make sure orphaned subgraphs aren't left behind. Orphaned subgraphs with incorrectly undecremented refcounts would become effectively unremediable in a refcount world.)

warpfork avatar Feb 24 '21 21:02 warpfork

@warpfork I read your thoughts on an approach to GC, linked them in the pitch doc, and suggested we at least look at a prototype for comparison.

I generally agree with your feelings about reference-count resource management, but an advantage of reference counting is that it should work especially well for Pinata's use case: they only need GC for removing content that has become unpinned. The reference-counted implementation allows GC to walk the DAG for only the unpinned content and remove the associated blocks without loading all the other pinned content.

gammazero avatar Feb 25 '21 02:02 gammazero

This is very interesting, as it touches on some points I noted a while back here.

But in addition to "just" counting references, my note includes tracking accesses and access time in a compressed, combined metric for stored blocks which are not pinned.

Throwing all unpinned blocks away makes sense for a storage provider, but not on end-user clients, gateways, etc., which can use past access patterns to anticipate, to a degree, which blocks might be useful in the future.

I haven't had time yet to work on it, sadly, because of pandemic reasons - but that's still the plan.

RubenKelevra avatar Mar 02 '21 03:03 RubenKelevra

@RubenKelevra I agree it would be very useful to be more discriminating about what unpinned content gets removed. That would provide cache management behavior. There are a number of complexities in efficiently maintaining and accessing LRU/LFU data, and the reference counting approach is intended to be a hopefully simpler solution that addresses a specific need: a way for GC to determine if a block is pinned or in MFS without having to load and search a set of all pinned blocks.

The immediate goal is to provide faster bulk GC and, more importantly, very fast removal of specific content (remove blocks from an unpinned DAG). The key feature of ref-counting is that it gives O(blocks-in-dag) performance without having to load other content into memory. Ref-counting also does not incur any overhead for most data access, since it is only used during pinning/unpinning, MFS add/remove, and GC.
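
To make the O(blocks-in-dag) claim concrete, here is a minimal sketch of how such a removal walk could look. All names (`refStore`, `dagLinks`, `deleteBlock`) are illustrative stand-ins I made up, not go-ipfs APIs:

```go
// Hypothetical sketch only: refStore, dagLinks, and deleteBlock are
// illustrative stand-ins, not go-ipfs APIs.
package gcsketch

type CID string

// refStore maps each block's CID to the number of pins/parents that
// reference it.
var refStore = map[CID]int{}

// dagLinks stands in for loading a block and decoding its child links.
func dagLinks(c CID) []CID { return nil }

// Unpin decrements refcounts down the DAG, deleting any block whose
// count reaches zero. The walk touches only the unpinned DAG, so cost
// is O(blocks-in-dag); no other pinned content is loaded or searched.
func Unpin(root CID, deleteBlock func(CID)) {
	stack := []CID{root}
	for len(stack) > 0 {
		c := stack[len(stack)-1]
		stack = stack[:len(stack)-1]

		refStore[c]--
		if refStore[c] > 0 {
			continue // still referenced by another pin or parent block
		}
		delete(refStore, c)
		deleteBlock(c)
		// The children each lose one reference too, so visit them next.
		stack = append(stack, dagLinks(c)...)
	}
}
```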

More sophisticated data eviction/retention strategies are not precluded by reference counting and should be explored.

gammazero avatar Mar 02 '21 07:03 gammazero

> @RubenKelevra I agree it would be very useful to be more discriminating about what unpinned content gets removed. That would provide cache management behavior. There are a number of complexities in efficiently maintaining and accessing LRU/LFU data, and the reference counting approach is intended to be a hopefully simpler solution that addresses a specific need: a way for GC to determine if a block is pinned or in MFS without having to load and search a set of all pinned blocks.

I'm just referencing my note, since I planned to do both in one approach - it's basically the same problem, with (see the sketch after this list):

  • the need to store an intent log of pinning/unpinning operations
  • storing an in-memory database with a value for each block
  • writing the database to disk in larger chunks
  • background analysis of pinning/unpinning operations
  • time-based locking, to avoid slowing down any operation by blocking it
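
As an illustration of the first and last points, an intent log could buffer pin/unpin operations in memory and flush them in chunks. This is only a sketch under my own assumptions; none of these names exist in go-ipfs:

```go
// Sketch of an intent log per the list above; all names are my own
// invention, not existing go-ipfs structures.
package gcsketch

import (
	"sync"
	"time"
)

// Op records one pinning/unpinning intention.
type Op struct {
	Pin  bool // true = pin, false = unpin
	CID  string
	When time.Time
}

// IntentLog buffers operations in memory and hands them to a sink in
// larger chunks, so individual pin calls never wait on disk I/O.
type IntentLog struct {
	mu    sync.Mutex
	buf   []Op
	limit int        // flush threshold
	flush func([]Op) // e.g. disk writer or background analyzer
}

// Record appends an operation and flushes a full chunk outside the
// lock, keeping the time any single operation can be blocked short.
func (l *IntentLog) Record(pin bool, cid string) {
	l.mu.Lock()
	l.buf = append(l.buf, Op{Pin: pin, CID: cid, When: time.Now()})
	var chunk []Op
	if len(l.buf) >= l.limit {
		chunk, l.buf = l.buf, nil // swap buffers while holding the lock
	}
	l.mu.Unlock()
	if chunk != nil {
		l.flush(chunk) // chunked write, not under the mutex
	}
}
```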

The advantages of combining a cache retention strategy with reference counting are:

  • lower total complexity
  • deadlocks/data loss become less likely
  • more code can be shared without refactoring
  • much lower memory consumption

> The immediate goal is to provide faster bulk GC and, more importantly, very fast removal of specific content (remove blocks from an unpinned DAG). The key feature of ref-counting is that it gives O(blocks-in-dag) performance without having to load other content into memory. Ref-counting also does not incur any overhead for most data access, since it is only used during pinning/unpinning, MFS add/remove, and GC.

I don't think bulk GC is the way to go; instead there should be a tight limit on the amount of data stored in the unpinned area, like "use only 1% of the free storage for unpinned content".

This allows for easier locking, since we can make some assumptions that help us avoid having to lock the whole datastore - roughly:

  • If a block gets added, we put it on a list with a timestamp.
  • If a pinning operation starts, we put it on a list of active operations with a timestamp.
  • If a block on the first list has a timestamp older than the oldest running pinning operation, move it to the cache area, since it can be safely discarded.

If a pinning operation calls for a block in the cache area, just add a reference and put it on the "pinned" list - there was never any guarantee that a block stays in the datastore when it's not pinned. These assumptions are safe and avoid any locks on the datastore.

So the cleanup would run continuously as a background task, flagging the oldest and least-accessed blocks for removal, rather than as a bulk cleanup.
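
A minimal sketch of the timestamp rule above (all names are hypothetical): a block added before every currently running pinning operation started cannot be claimed by any of them, so it can be moved to the discardable cache area without locking the whole datastore:

```go
// Sketch of the timestamp rule: blocks older than the oldest active
// pin cannot be claimed by any in-flight pin, so they are safe to move
// to the cache area. Names are hypothetical.
package gcsketch

import "time"

type pendingBlock struct {
	cid   string
	added time.Time
}

// sweep moves safely-discardable blocks to the cache area and returns
// the blocks that must stay pending because a pin might still claim them.
func sweep(pending []pendingBlock, activePinStarts []time.Time,
	moveToCache func(cid string)) []pendingBlock {

	// Oldest start time among running pins; if none are running,
	// "now" works: everything already added is unclaimable.
	oldest := time.Now()
	for _, t := range activePinStarts {
		if t.Before(oldest) {
			oldest = t
		}
	}

	var still []pendingBlock
	for _, b := range pending {
		if b.added.Before(oldest) {
			moveToCache(b.cid) // no in-flight pin can still want it
		} else {
			still = append(still, b)
		}
	}
	return still
}
```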

RubenKelevra avatar Mar 02 '21 08:03 RubenKelevra

On behalf of the ipfs.io gateways, I implore this endeavour to consider an LRU eviction strategy as the default for the IPFS repo, and to frame this work as a "cache retention strategy", as @RubenKelevra suggests, rather than as a "garbage collection improvement". Unpinned blocks in IPFS are not unreachable objects in the traditional GC sense of things; they are much closer to "what files should my CDN keep hot". We have limited bandwidth, so we should avoid scope creep, but that is also why it's even more important to get the framing of this project right. If we make GC less painful, it will be a while before we get time to focus on making it what we need.

A relevant paper that is worth a read came up from the first ResNetLab event: "Caching the Internet: A View from a Global Multi-tenant CDN".

> In this paper, we explore the efficacy of popular caching policies in a large-scale, global, multi-tenant CDN. We examine the client behaviors observed in a network of over 125 high-capacity Points of Presence (PoPs). Using production data from the Edgecast CDN, we show that for such a large-scale and diverse use case, simpler caching policies dominate. We find that LRU offers the best compromise between hit-rate and disk I/O, providing 60% fewer writes than FIFO, while maintaining high hit-rates. We further observe that at disk sizes used in a large-scale CDN, LRU performs on par with complex policies like S4LRU. – https://link.springer.com/chapter/10.1007%2F978-3-030-15986-3_5

olizilla avatar Mar 02 '21 16:03 olizilla

@olizilla thanks for the paper, will check it out.

My idea is to balance access counts against the time the object has stayed in the cache. In short, a counter is increased on each access and decreased after a fixed duration.

This way we avoid cache thrashing when many objects get accessed for a short while but the access pattern then changes back.

It's also important to have a rough idea of how large a block is, so that we know, without accessing the datastore, how close we are to meeting our target for clearing space.

Additionally, it makes sense to measure the number of bytes being added to the datastore and send this information to a process which can delete old cache objects from the datastore at the same rate. This allows us to maintain a constant fill level on the datastore with minimal memory usage.

The data structure in my note would squeeze all of this into a 32-bit integer of additional metadata for each checksum.
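
One possible packing (this is my guess at a layout; the actual one in @RubenKelevra's note may differ): a decaying 24-bit access score plus an 8-bit size class, so eviction decisions need no datastore lookups:

```go
// A guess at one possible 32-bit layout; the actual packing in the
// note may differ. High 24 bits: decaying access score. Low 8 bits:
// size class (~log2 of block size), so eviction can estimate freed
// bytes without a datastore lookup.
package gcsketch

import "math/bits"

func pack(score uint32, sizeBytes uint64) uint32 {
	if score > 0xFFFFFF {
		score = 0xFFFFFF // saturate the 24-bit score
	}
	sizeClass := uint32(bits.Len64(sizeBytes)) & 0xFF // ≈ log2(size)
	return score<<8 | sizeClass
}

// touch bumps the score on each access, saturating at the maximum.
func touch(m uint32) uint32 {
	if m>>8 < 0xFFFFFF {
		m += 1 << 8
	}
	return m
}

// decay halves the score after each fixed interval, keeping size bits.
func decay(m uint32) uint32 {
	return ((m >> 8) / 2 << 8) | (m & 0xFF)
}
```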

While there's probably some overhead in the tree structures needed for finding the lowest number in the cache section and a specific hash in the reference counter section, the total memory consumption should be pretty minimal.

RubenKelevra avatar Mar 02 '21 17:03 RubenKelevra

Well, I don't know - I don't see why LRU should be better.

The most similar algorithms to my note would be ARC (which is used in ZFS, but is patented) and CAR.

Here's a paper comparing them to LRU:

https://www.usenix.org/conference/fast-04/car-clock-adaptive-replacement

The File: /ipfs/Qmaksm3FYJupAzkZK6EPpaxjyc6zocJvzufyo6AFSMkRBm

If the cache is too small for the workload, LRU is basically useless (cache thrashing occurs), while ARC/CAR outperform it with 3-4 times more hits.

RubenKelevra avatar Mar 02 '21 18:03 RubenKelevra

@olizilla @RubenKelevra I want to be clear that, yes, we do want a cache management approach to garbage collection eventually. It should work in harmony with reference counting. Reference counting is solving the specific problem of needing to know if blocks can be removed, without having to load and search other pinned/mfs content. It addresses a situation that is severely impacting pinning service providers.

Gateways, and many general users, have a definite need for cache management. In summary, that means removing blocks that have a lower calculated probability of reuse than others competing for storage space (regardless of the algorithm used: LRU, LFU, ARC, 2-queue, etc.). I am treating that (which blocks to delete first/incrementally) as a separate, independent problem from the one ref-counting addresses (efficiently determining whether a block can be removed).

A cache management strategy should be a separate proposal/project. If it happens to share some part of its implementation with reference counting, that is an implementation detail and does not mean these are the same problem. There are existing designs for a cache management approach to GC, and these may be useful/informative for creating that solution.

gammazero avatar Mar 02 '21 19:03 gammazero

@gammazero I agree that they are somewhat separate in purpose.

But there's some advantage in having the GC hold two lists: pinned (or saved in MFS) and unpinned (not saved in MFS) blocks. This allows for much quicker removal from the datastore, since it can just randomly drop blocks from the "unpinned" list until the list is empty or the storage requirements are met again. If you store a size approximation for each block alongside the CID, you could probably even drop blocks without having to request any additional information from the database, and just issue removal requests for the CIDs.

While the CIDs in the refcount data structure need to be accessible via the CID, so a B-tree is probably required, the CIDs in the unpinned list just need a memory-efficient data structure. This way you would just drop the oldest entries added, FIFO-style. A sketch of both structures follows.
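
A minimal sketch of the two structures under my own assumptions (a plain map stands in for the B-tree, and all names are hypothetical):

```go
// Hypothetical sketch of the two lists described above; a plain map
// stands in for the B-tree keyed by CID, and a slice for the FIFO list.
package gcsketch

type unpinnedEntry struct {
	cid        string
	approxSize uint64 // stored next to the CID: no datastore lookup needed
}

type gcState struct {
	refcounts map[string]int  // pinned/MFS blocks, addressed by CID
	unpinned  []unpinnedEntry // oldest entries first
}

// reclaim drops the oldest unpinned blocks, FIFO-style, until roughly
// `want` bytes are freed or the list is empty, issuing removal
// requests by CID alone.
func (s *gcState) reclaim(want uint64, remove func(cid string)) {
	var freed uint64
	for len(s.unpinned) > 0 && freed < want {
		e := s.unpinned[0]
		s.unpinned = s.unpinned[1:]
		remove(e.cid)
		freed += e.approxSize
	}
}
```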

RubenKelevra avatar Mar 09 '21 20:03 RubenKelevra

Would like to see a review approval from @Stebalien on this one, since he's talked a bit about this already.

mikeal avatar Mar 23 '21 21:03 mikeal

I don't see this obviously linked here - so bumping the updated design doc on GC implementation based on @gammazero's great research and prototyping: https://hackmd.io/bgU7ti6_THK5gAGcrAhoEw

momack2 avatar Apr 21 '21 02:04 momack2