zed
zed copied to clipboard
"zed compact" relies on "use" being set to the compacted pool
Repro is with Zed commit 87a419e. I bumped into this while verifying #3969 and hence the same test data can be used in the repro.
The problem is that if I have a cached zed use
setting that points to a pool that's different than the one I happen to be compacting, it fails. Repro steps:
$ rm -rf "$HOME/.zed-3027-lake"
$ tar xzf zed-3027-lake.tgz
$ export ZED_LAKE="$HOME/.zed-3027-lake"
$ zed -version
Version: v1.1.0-49-g87a419e4
$ zed create OtherPool
pool created: OtherPool 2BdIgjuhlmbx4dnSd8JS1EJEM3d
$ zed use OtherPool
Switched to branch "main" on pool "OtherPool"
$ zed ls
logs 2BXT4oNSq7R5lE3fYdtpZetU2aX key ts order desc
OtherPool 2BdIgjuhlmbx4dnSd8JS1EJEM3d key ts order desc
$ zed compact $(zed query -f text 'from logs@main:objects | yield "0x${hex(id)}"')
2BXT8uiZk8IBVzeLhMGg3gvv2ih: commit object not found
Of course, if I "use" the same pool that holds the objects I'm compacting, now it works.
$ zed compact -use logs $(zed query -f text 'from logs@main:objects | yield "0x${hex(id)}"')
2BdIvsvbBGrPXSuT02E1TYwdtea compaction committed
Knowing what compaction is doing, I understand what's wrong. However, the error message was confusing and I think might be difficult for a user to connect to a cached ~/.zed_head
, for instance.
Since the zed compact
was apparently able to find the objects in the referenced logs@main
pool to start its work, it seems like it could instead do either of:
- Provide an error message pointing out the
use
mismatch, or, - Do the commit back to the pool from which the objects were pulled for compaction
FWIW, zed compact -h
says how it will "compact data objects on a pool branch" and "creates a commit on HEAD replacing the old objects with the new ones", i.e., implying that it would behave something like the second option.
The problem with implementing either 1 or 2 is we have no way of finding out the pool where a random ksuid belongs if it doesn't belong to the pool (or to be more specific, branch) you are currently operating on. You are going to run into the same issue with zed delete
or zed index apply
if you're pulling object ids with a query from another pool (even zed merge
or zed revert
if you're getting commit ids from a querying another pool). zed compact
use of -use
is consistent with every other zed command in that its going to operate on the selected branch, but perhaps I am missing something? Besides tweaks to verbiage I don't know what to do here. The use zed compact
is such a low level command I would almost be in favor of hiding it or not making it top level.
If changed the line "compact data objects on a pool branch" to "compact data objects in a pool branch" would that help?
I bounced this topic off @nwt and @mccanne and they agreed there may be something to improve here, but it's part of a likely wider set of "use"-related improvements we might want to make, perhaps at the same time we're pursuing #3956. I've opened up #3985 to act as a place to collect more "use" examples as they come up, and for now I'll put this specific issue on the back burner.
Now that compaction is on track to being delivered through the lake manager (#3923) I'm reminded of something @mattnibs said to me offline about how zed compact
is effectively a low-level command and so its ergonomics can be less of a concern. The most recent comment above also notes that the wider topic of "use" as relates to the CLI commands lives on in #3985, so I'm closing this one.