oras-go icon indicating copy to clipboard operation
oras-go copied to clipboard

Feature Request: Consider supporting size management for OCI Store

Open akashsinghal opened this issue 1 year ago • 15 comments

Currently, there isn't built in functionality for deleting and managing OCI Store size. It would be nice if there was a garbage collection-like functionality to evict LRU blobs from the index and delete the corresponding blobs from the OCI store

akashsinghal avatar Mar 24 '23 17:03 akashsinghal

cc: @Wwwsylvia

akashsinghal avatar Mar 24 '23 17:03 akashsinghal

Duplicate of #454

shizhMSFT avatar Mar 27 '23 02:03 shizhMSFT

Is there a plan to implement higher level function beyond Delete? Such as a way to add a time last accessed annotation in the index.json and an accompanying lastused descriptor method? Basically an API to make it easier to implement an LRU eviction strategy.

akashsinghal avatar Mar 27 '23 19:03 akashsinghal

add a time last accessed annotation in the index.json and an accompanying last used descriptor method?

That seems an auto purge functionality and is currently out of the scope since index.json might be immutable or supposed to be read-only.

shizhMSFT avatar Mar 30 '23 15:03 shizhMSFT

While considering similar concerns in our code base, we considered using containerd's content store package, which offers garbage collection controlled by labels.

Having support for some similar functionality in oras would be great, but at the moment I'm more interested in getting #470 so we can delete at all. Once it supports delete and updating the predecessor/successor graph, then we can at least identify unreferenced/orphaned resources.

sparr avatar Jun 20 '23 15:06 sparr

Removing the duplicate label as this issue request Garbage Collection in addition to #454

shizhMSFT avatar Oct 26 '23 08:10 shizhMSFT

cc: @wangxiaoxuan273 @eiffel-fl

shizhMSFT avatar Oct 26 '23 09:10 shizhMSFT

Let's have some discussions on the following general questions, which choices work better with actual needs?

  1. Do we want to implement auto-garbage collection in Delete(), or provide a public GC function and let user call it manually?
  2. Do we need to clean garbage recursively? For example given the graph A -> B -> C -> D, when A is deleted, should we delete B, C, D or only B?

You are welcome to take a look at the draft PRs #652, #653 and give feedbacks on the design. These two PRs collectively implement automatic recursive garbage collection upon oci.Store.Delete().

wangxiaoxuan273 avatar Nov 29 '23 04:11 wangxiaoxuan273

Hi!

Let's have some discussions on the following general questions, which choices work better with actual needs?

1. Do we want to implement auto-garbage collection in `Delete()`, or provide a public GC function and let user call it manually?

Automatic would be welcomed, but having a dedicated function as a first step is OK.

2. Do we need to clean garbage recursively? For example given the graph `A -> B -> C -> D`, when `A` is deleted, should we delete `B, C, D` or only `B`?

The garbage collection should be recursive, otherwise we will end up with wasted node (C and D in the first case).

You are welcome to take a look at the draft PRs #652, #653 and give feedbacks on the design. These two PRs collectively implement automatic recursive garbage collection upon oci.Store.Delete().

I took a look at your PRs and they look pretty fine. I will test #653 in Inspektor Gadget.

Best regards.

eiffel-fl avatar Nov 29 '23 08:11 eiffel-fl

  1. Do we want to implement auto-garbage collection in Delete(), or provide a public GC function and let user call it manually?

I think we can start with providing a public GC function. And then we can consider having a store option for enabling auto-GC. It could be somewhat similar to SaveIndex() and AutoSaveIndex.

  1. Do we need to clean garbage recursively? For example given the graph A -> B -> C -> D, when A is deleted, should we delete B, C, D or only B?

Yes, the idea of GC is to clean up all garbage (dangling nodes). If we delete only the direct successors (B), there will still be garbage remained (C and D), and that's just another implementation of delete, not GC.

Wwwsylvia avatar Nov 30 '23 03:11 Wwwsylvia

  1. Do we want to implement auto-garbage collection in Delete(), or provide a public GC function and let user call it manually?

I think we can start with providing a public GC function. And then we can consider having a store option for enabling auto-GC. It could be somewhat similar to SaveIndex() and AutoSaveIndex.

Then should the default value of GCEnabled be true or false? Which choice is better? @Wwwsylvia

wangxiaoxuan273 avatar Nov 30 '23 04:11 wangxiaoxuan273

#656 will enable oci.Store to remove any untraceable / dangling blobs, including manifest blobs and layer blobs.

However, this issue requests a higher level of garbage collection. That is predicate-based garbage collection where the predicate is based on access metadata (e.g. LRU mentioned in the issue description). In summary, it requires the following sub features to fulfill the request:

  1. A metadata database to track the access metadata
  2. A GC scheduler to trigger the GC
  3. GC feature itself, which is being implemented by #656

For 1, the metadata database can be in memory if it is within a single process, using a file if it is accessed by a single process at a time, or using a real DB if multi-process access is required. Here is an example implementation:

type MetadataStore struct {
    *oci.Store
    LastAccessTime sync.Map // map[digest.Digest]time.Time
}

func (s *MetadataStore) Push(ctx context.Context, expected ocispec.Descriptor, reader io.Reader) error {
    if err := s.Store.Push(ctx, expected, reader); err != nil {
        return err
    }
    if isManifest(expected) {
        s.LastAccessTime.Store(expected.Digest, time.Now())
    }
    return nil
}

func (s *MetadataStore) Fetch(ctx context.Context, target ocispec.Descriptor) (io.ReadCloser, error)
    if rc, err := s.Store.Fetch(ctx, target); err != nil {
        return nil, err
    }
    if isManifest(target) {
        s.LastAccessTime.Store(target.Digest, time.Now())
    }
    return rc, nil
}

func (s *MetadataStore) Delete(ctx context.Context, target ocispec.Descriptor) error {
    if err := s.Store.Delete(ctx, target); err != nil {
        return err
    }
    s.LastAccessTime.Delete(target.Digest)
    return nil
}

For 2, a GC scheduler can be implemented using a go-routine with a predicate. For example,

ms := GetMetadataStore()
ctx := context.Todo()
ticker := time.NewTicker(time.Minute * 10) // 10 mins
for range ticker.C {
    func() {
        ctx, cancel := context.WithTimeout(ctx, time.Second * 10) // stop the world for 10s for GC
        defer cancel()
        // remove those images which are not accessed in 1 day
        ms.LastAccessTime.Range(func(key, value any) {
            dgst := key.(digest.Digest)
            accessTime := value.(time.Time)
            if accessTime.Add(time.Hour * 24).Before(time.Now()) {
                ms.Delete(ctx, ocispec.Descriptor{Digest: dgst})
            }
        })
        ms.GC(ctx) // optional as ms.AutoGC is enabled by default
    }()
}

@akashsinghal We need your inputs / comments on this feature request as we are not sure if this feature request is still sync with your demand so that we can continue working on it or close when #656 is merged.

/cc @wangxiaoxuan273 @FeynmanZhou @yizha1

shizhMSFT avatar Dec 29 '23 06:12 shizhMSFT

@shizhMSFT Thanks for laying out the functionality required here. We are on the same page. The LRU eviction with scheduled GC is exactly what the Ratify project is looking for. In our specific use case, it's multi process so the metadata store implementation would need to handle concurrent read/writes. We already have requirements of oci.Store for multi process support when reading/writing to the index file #286.

Adding GC support is a great start for us in #656. However we will need the metadata store and scheduling support for us to be able to effectively use GC. If oras-go can support this that would be great, however I'm not sure what level of functionality ORAS folks would like to expose versus require consumers (Ratify) to build and manage on top of ORAS on their own. Do you have any thoughts on this boundary?

If the metadata store was built in to oras-go and exposed, would it still be the consumer's responsibility to provide a metadata store implementation that fits their needs (single process vs multi concurrent processes DB)?

akashsinghal avatar Jan 03 '24 19:01 akashsinghal

Since it is non-trivial to define a generic metadata store, it would be better if Ratify can implement its metadata store for its specific scenarios.

Multi-process support is also another non-trivial work and it is not on the roadmap of oras-go as it requires an atomicity source such as databases, file systems with O_EXCL support, and etc.

shizhMSFT avatar Jan 04 '24 03:01 shizhMSFT

As #653 and #656 are merged, the basic delete features have been done. Thank @wangxiaoxuan273 @eiffel-fl @carabasdaniel for your contributions to this effort. Since this issue request more features, which are out of the scope of v2.4.0, I will keep this issue open and move it to future milestone.

shizhMSFT avatar Jan 11 '24 06:01 shizhMSFT