OCIStore is not concurrency safe at a process-level
When running multiple helm chart commands at the same time, there can be a race to write the updated OCIStore index reference resulting in missing references for later commands. Is it expected that this should work or should each helm chart command be given a new registry cache directory to avoid this?
Failing test case:
package content
import (
"io/ioutil"
"testing"
ocispec "github.com/opencontainers/image-spec/specs-go/v1"
)
func TestConcurrentSaveIndex(t *testing.T) {
tempdir, err := ioutil.TempDir("", "")
if err != nil {
t.Fatal(err)
}
store, err := NewOCIStore(tempdir)
if err != nil {
t.Fatal(err)
}
err = store.LoadIndex()
if err != nil {
t.Fatal(err)
}
parallelStore, err := NewOCIStore(tempdir)
if err != nil {
t.Fatal(err)
}
err = parallelStore.LoadIndex()
if err != nil {
t.Fatal(err)
}
store.AddReference("my-first-reference", ocispec.Descriptor{})
parallelStore.AddReference("my-second-reference", ocispec.Descriptor{})
err = store.SaveIndex()
if err != nil {
t.Fatal(err)
}
err = parallelStore.SaveIndex()
if err != nil {
t.Fatal(err)
}
err = store.LoadIndex()
if err != nil {
t.Fatal(err)
}
parallelStore.LoadIndex()
if err != nil {
t.Fatal(err)
}
if _, ok := store.ListReferences()["my-first-reference"]; !ok {
t.Error("Store does have have my-first-reference")
}
if _, ok := store.ListReferences()["my-second-reference"]; !ok {
t.Error("Store does have have my-second-reference")
}
if _, ok := parallelStore.ListReferences()["my-first-reference"]; !ok {
t.Error("Store does have have my-first-reference")
}
if _, ok := parallelStore.ListReferences()["my-second-reference"]; !ok {
t.Error("Store does have have my-second-reference")
}
}
@steved thanks for opening. Yes, it is true. The source of truth for all helm refs are stored in a single file. Do you have any thoughts on how to address this?
The best idea I have is some sort of file locking scheme that allows merging of image references at save time. Is this library the correct place for a change, though? It implements the upstream OCI image spec and it doesn't appear that the specification is particular concurrency-safe either. Is this something that Helm should potentially implement as a consumer of the OCIStore?
I have considered adding code to umoci to do some file locking on index.json to try to fix this problem, so if we go in that direction maybe we should make sure both umoci and oras handle this the same way -- in fact it might not hurt to migrate oras to using umoci's CAS libraries so that we don't have to develop them in parallel.
Unfortunately if any other tool touches index.json without acquiring a filesystem lock (which might not even be enough if you're using things like NFS), then you hit this problem again. And there isn't really a nice POSIX-only way of doing a cmpxchg.
From the umoci side of things, this also might be an API issue -- should the API for updating an index.json actually look like cmpxchg (right now it's just GetIndex and PutIndex).
(Slightly annoyingly, the pre-index.json system was based on a directory of references and we switched away from it pre-1.0 of the image-spec. There were valid reasons for the switch, it's just a little annoying that this problem wasn't accounted for at the time.)
@cyphar - did umoci ever end up implementing any lock mechanism on index.json?