pkg/storage: lake packages assume Put and PutIfNotExist atomicity not provided by FileSystem
This issue has been with us for as long as pkg/storage, but #4274 brings it into focus.
The lake, lake/commits, lake/data, and lake/journal packages all contain uses of pkg/storage.Engine.Put and PutIfNotExist that assume the returned io.WriteCloser will create an object at the given path atomically when its Close method is called. If it does not, those packages can observe empty or partial objects, resulting in transient or permanent errors for lake operations.
While pkg/storage.S3Engine provides this atomicity, pkg/storage.FileSystem does not. We can fix that by changing the FileSystem implementations of Put and PutIfNotExist to create a temporary file and return an io.WriteCloser whose Close method moves the temporary file into place. But that probably isn't the behavior we want for zio/emitter, which implements the -o and -split flags. There, I think we want to keep the current behavior of truncating existing files and allowing other processes to observe writes as they happen.
One way to resolve this tension is to make the change described in the previous paragraph while also introducing a new Engine method (call it PutNonAtomic for now, though that's an ugly name) whose semantics match the current (non-atomic) Put implementation provided by FileSystem.
PutIfNotExist should not need this since it uses the O_EXCL flag, meaning it will produce an error if the file already exists.
PutIfNotExist currently does not prevent readers from seeing empty or partial objects. But on further reflection, I'm not sure there's any way to get both that and O_EXCL behavior.
Good point.
A community zync user running Zed v14.0.0 recently bumped into a problem for which we believe this issue was the root cause. They reported seeing this stack trace from zed serve when running zed ls.
{"level":"error","ts":1715345700.1837773,"logger":"journal","caller":"journal/store.go:100","msg":"Loading snapshot","error":"malformed zng record","stacktrace":"github.com/brimdata/zed/lake/journal.(*Store).load\n\t/go/pkg/mod/github.com/brimdata/[email protected]/lake/journal/store.go:100\ngithub.com/brimdata/zed/lake/journal.(*Store).All\n\t/go/pkg/mod/github.com/brimdata/[email protected]/lake/journal/store.go:242\ngithub.com/brimdata/zed/lake/pools.(*Store).All\n\t/go/pkg/mod/github.com/brimdata/[email protected]/lake/pools/store.go:40\ngithub.com/brimdata/zed/lake.(*Root).ListPools\n\t/go/pkg/mod/github.com/brimdata/[email protected]/lake/root.go:235\ngithub.com/brimdata/zed/lake.(*Root).BatchifyPools\n\t/go/pkg/mod/github.com/brimdata/[email protected]/lake/root.go:185\ngithub.com/brimdata/zed/runtime/sam/op/meta.NewLakeMetaScanner\n\t/go/pkg/mod/github.com/brimdata/[email protected]/runtime/sam/op/meta/scanner.go:23\ngithub.com/brimdata/zed/compiler/kernel.(*Builder).compileLeaf\n\t/go/pkg/mod/github.com/brimdata/[email protected]/compiler/kernel/op.go:236\ngithub.com/brimdata/zed/compiler/kernel.(*Builder).compile\n\t/go/pkg/mod/github.com/brimdata/[email protected]/compiler/kernel/op.go:609\ngithub.com/brimdata/zed/compiler/kernel.(*Builder).compileSeq\n\t/go/pkg/mod/github.com/brimdata/[email protected]/compiler/kernel/op.go:405\ngithub.com/brimdata/zed/compiler/kernel.(*Builder).Build\n\t/go/pkg/mod/github.com/brimdata/[email protected]/compiler/kernel/op.go:82\ngithub.com/brimdata/zed/compiler.(*Job).Build\n\t/go/pkg/mod/github.com/brimdata/[email protected]/compiler/job.go:116\ngithub.com/brimdata/zed/compiler.(*lakeCompiler).NewLakeQuery\n\t/go/pkg/mod/github.com/brimdata/[email protected]/compiler/lake.go:53\ngithub.com/brimdata/zed/runtime.CompileLakeQuery\n\t/go/pkg/mod/github.com/brimdata/[email protected]/runtime/compiler.go:58\ngithub.com/brimdata/zed/lake/api.(*local).QueryWithControl\n\t/go/pkg/mod/github.com/brimdata/[email 
protected]/lake/api/local.go:120\ngithub.com/brimdata/zed/lake/api.(*local).Query\n\t/go/pkg/mod/github.com/brimdata/[email protected]/lake/api/local.go:108\ngithub.com/brimdata/zed/cmd/zed/ls.(*Command).Run\n\t/go/pkg/mod/github.com/brimdata/[email protected]/cmd/zed/ls/command.go:83\ngithub.com/brimdata/zed/pkg/charm.path.run\n\t/go/pkg/mod/github.com/brimdata/[email protected]/pkg/charm/path.go:11\ngithub.com/brimdata/zed/pkg/charm.(*Spec).ExecRoot\n\t/go/pkg/mod/github.com/brimdata/[email protected]/pkg/charm/charm.go:63\nmain.main\n\t/go/pkg/mod/github.com/brimdata/[email protected]/cmd/zed/main.go:64\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:267"}
They were able to share their lake data, and we narrowed the repro down to seeing the malformed zng record error when attempting to read pools/snap.zng with zq. @nwt examined the snap.zng file and observed that it appeared to be truncated. This known open issue is one way we're aware of that such truncation might occur.
Fortunately, the interim workaround in this case was simple. The pools/snap.zng file contains snapshot data that will be automatically recreated when necessary, so the user was able to manually delete it and continue using the lake without hitting the stack trace.