zed icon indicating copy to clipboard operation
zed copied to clipboard

"zed compact" relies on "use" being set to the compacted pool

Open philrz opened this issue 2 years ago • 2 comments

Repro is with Zed commit 87a419e. I bumped into this while verifying #3969 and hence the same test data can be used in the repro.

The problem is that if I have a cached zed use setting that points to a pool that's different than the one I happen to be compacting, it fails. Repro steps:

$ rm -rf "$HOME/.zed-3027-lake"

$ tar xzf zed-3027-lake.tgz

$ export ZED_LAKE="$HOME/.zed-3027-lake"

$ zed -version
Version: v1.1.0-49-g87a419e4

$ zed create OtherPool
pool created: OtherPool 2BdIgjuhlmbx4dnSd8JS1EJEM3d

$ zed use OtherPool
Switched to branch "main" on pool "OtherPool"

$ zed ls
logs 2BXT4oNSq7R5lE3fYdtpZetU2aX key ts order desc
OtherPool 2BdIgjuhlmbx4dnSd8JS1EJEM3d key ts order desc

$ zed compact $(zed query -f text 'from logs@main:objects | yield "0x${hex(id)}"')
2BXT8uiZk8IBVzeLhMGg3gvv2ih: commit object not found

Of course, if I "use" the same pool that holds the objects I'm compacting, now it works.

$ zed compact -use logs $(zed query -f text 'from logs@main:objects | yield "0x${hex(id)}"')
2BdIvsvbBGrPXSuT02E1TYwdtea compaction committed

Knowing what compaction is doing, I understand what's wrong. However, the error message was confusing and I think might be difficult for a user to connect to a cached ~/.zed_head, for instance.

Since the zed compact was apparently able to find the objects in the referenced logs@main pool to start its work, it seems like it could instead do either of:

  1. Provide an error message pointing out the use mismatch, or,
  2. Do the commit back to the pool from which the objects were pulled for compaction

FWIW, zed compact -h says how it will "compact data objects on a pool branch" and "creates a commit on HEAD replacing the old objects with the new ones", i.e., implying that it would behave something like the second option.

philrz avatar Jul 07 '22 21:07 philrz

The problem with implementing either 1 or 2 is we have no way of finding out the pool where a random ksuid belongs if it doesn't belong to the pool (or to be more specific, branch) you are currently operating on. You are going to run into the same issue with zed delete or zed index apply if you're pulling object ids with a query from another pool (even zed merge or zed revert if you're getting commit ids from a querying another pool). zed compact use of -use is consistent with every other zed command in that its going to operate on the selected branch, but perhaps I am missing something? Besides tweaks to verbiage I don't know what to do here. The use zed compact is such a low level command I would almost be in favor of hiding it or not making it top level.

If changed the line "compact data objects on a pool branch" to "compact data objects in a pool branch" would that help?

mattnibs avatar Jul 08 '22 16:07 mattnibs

I bounced this topic off @nwt and @mccanne and they agreed there may be something to improve here, but it's part of a likely wider set of "use"-related improvements we might want to make, perhaps at the same time we're pursuing #3956. I've opened up #3985 to act as a place to collect more "use" examples as they come up, and for now I'll put this specific issue on the back burner.

philrz avatar Jul 11 '22 20:07 philrz

Now that compaction is on track to being delivered through the lake manager (#3923) I'm reminded of something @mattnibs said to me offline about how zed compact is effectively a low-level command and so its ergonomics can be less of a concern. The most recent comment above also notes that the wider topic of "use" as relates to the CLI commands lives on in #3985, so I'm closing this one.

philrz avatar Jun 30 '23 17:06 philrz