jj icon indicating copy to clipboard operation
jj copied to clipboard

FR: Generalized hook support

Open matts1 opened this issue 1 year ago • 54 comments

Is your feature request related to a problem? Please describe. I was looking at #2845 and realized that it didn't meet my needs because it didn't support pre-upload hooks. I investigated further into hooks, and the closest I found was #405, which rather than generalized hooks, specifically is trying to solve pre-commit hooks.

Describe the solution you'd like I was thinking of treating hooks like a pub-sub system. jj would publish events, which would consist of some metadata. Your hooks would simply be subscriptions to some subset of the events, matching specific metadata. For example:

message CommitDescription {
   string commit = 1;
   string description = 2;
}

message HookEvent {
    string workspace = 1;
    oneof hook_type {
        CommitDescriptionHook commit_description = 2;
        PreUploadHook pre_upload = 3;
    }
    ...
}

We would then, similar to my #3575, run all hook binaries that have subscribed to that event in the global config.toml. Similar to #3575, these hook binaries would be passed a file descriptor with a connection to the jj grpc server. For example, my global config.toml might look like:

[hooks]
subscribe_to_every_event = { binary = "/path/to/foo" }
pre_upload = { binary = "/path/to/bar", type = "pre_upload" }
specific_workspace = { binary = "/path/to/baz", workspace = "/path/to/workspace" }

We can also implement per-repo hooks in the same way that git has them by allowing additional subscribers from a hooks/jj.toml or similar, though I'd hold off on this in the short term due to security concerns.:

For example, a pre-upload hook that forces your commit to have an associated bug might look like:

conn = grpc.connect(file_descriptor(os.environ["JJ_SERVER"]))
main = importlib.import("/path/to/foo_ext.py", "main")
event = conn.get_event()

if "BUG=" not in event.commit_description.description:
    bug = input()
    description = f"{event.commit_description.description}\nBUG={bug}"
    conn.set_description(commit_id=event.commit_description.commit_id, description=description)

Though similar to my suggestion in #3575, we could provide preludes for common languages, and simply make the hook:

def main(conn, event):
    if "BUG=" not in event.commit_description.description:
        bug = input()
        description = f"{event.commit_description.description}\nBUG={bug}"
        conn.set_description(commit_id=event.commit_description.commit_id, description=description)

We would thus provide a method in jj along the lines of:

fn publish_event(event: &HookEvent) -> anyhow::Result {
    Vec<&Hook> hooks = filter_hooks(event);
    // Just a regular grpc server, but get_event() will always return the hook object.
    GrpcServer server = hook_grpc_server(event);
    for (hook in hooks) {
        let return_code = hook.run(grpc_server);
        if (return_code != 0) {
          bail!("Hook failed");
        }
    }
    Ok(())
}

Describe alternatives you've considered Didn't really consider much else, this is the first idea that came to mind. I thought it was a nice starting point. Other ideas are welcome.

Additional context Add any other context or screenshots about the feature request here.

matts1 avatar Apr 26 '24 02:04 matts1

cc @arxanas for his opinion on hooks, as we discussed them quite heavily for jj run (#405/#1869).

Im very much on board on providing client integrations and having a pub-sub like event system in the daemon but I don't want to have it in the cli. Having external languages and binaries would fix some re-occurring ideas in relation to the template language #3262 and this recent Discord discussion.

PhilipMetzger avatar Apr 26 '24 16:04 PhilipMetzger

Could you elaborate on what you mean by "don't want to have it in the CLI"?

My line of thought, as I said above, was that:

  • The hooks would be external binaries
  • The config file would specify the hooks
  • The pub/sub aspect of them would be done within jj-lib

This would mean that when you run a CLI command such as jj describe, it would run a pre-describe hook. I can't see any way of doing this cleanly other than that when you run the jj command you get hooks, so I'm rather confused by your statement of "don't want to have it in the CLI", both in terms of what you mean and why you don't want it.

What was your vision of how hooks would work?

matts1 avatar Apr 28 '24 23:04 matts1

My line of thought, as I said above, was that:

  • The hooks would be external binaries
  • The config file would specify the hooks
  • The pub/sub aspect of them would be done within jj-lib

That's all fine, if you can take the performance penalty and latency from the external binary.

Could you elaborate on what you mean by "don't want to have it in the CLI"?

It has no place there, as it severely overlaps with run and forcing a pre-upload or presubmit hook in there makes the user experience rather unpleasant. If we can make the CLI emit just a proto message through IPC/vsock and the daemon handles the whole hook invocation, the user experience is kept smooth.

What was your vision of how hooks would work?

I'd rather just offload the whole hook system to the daemon and it is responsible for everything, the CLI only emits events which you can react upon and nothing more.

PhilipMetzger avatar Apr 29 '24 16:04 PhilipMetzger

That's all fine, if you can take the performance penalty and latency from the external binary.

In order to be sufficiently generic, any hook is necessarily going to have to be an external binary. I'm not sure I get what you're saying. Otherwise, rather than a generic hook, I'd have a fixed set of hooks that come packaged with jj that I could use.

I'd rather just offload the whole hook system to the daemon and it is responsible for everything, the CLI only emits events which you can react upon and nothing more.

Let's use a more concrete example. Suppose I wanted to make a post-describe hook that validated that the commit had a bug associated with it in the commit description. What would the workflow look like? My imagination was:

  1. jj describe
  2. Type a commit message without a bug
  3. Start up an API server
  4. Hook runs, connecting to that API server
  5. Commit message validated via the hook, fails validation
  6. jj describe fails with an exit status

I think necessarily any time a hook runs, the CLI is going to have to wait for that hook to finish so it can check whether the hook succeeded or failed. That being said, step 3 could be skipped if we just have a permanent API server running in the daemon.

It has no place there, as it severely overlaps with run and forcing a pre-upload or presubmit hook in there makes the user experience rather unpleasant. If we can make the CLI emit just a proto message through IPC/vsock and the daemon handles the whole hook invocation, the user experience is kept smooth.

I think the workflow I'd imagined is impossible with what you're proposing. What user workflow do you imagine? Could you take me through what happens both from a user perspective and a code perspective with your idea.

matts1 avatar Apr 29 '24 22:04 matts1

One reason I've avoided hooks so far is that we sometimes want solutions that can validate the commits after the operation is done anyway. One example of that is when you rewrite commits on the server (e.g. via some GitHub UI).

To better understand the use cases, what does your pre-upload Gerrit hook do?

martinvonz avatar Apr 30 '24 17:04 martinvonz

One reason I've avoided hooks so far is that we sometimes want solutions that can validate the commits after the operation is done anyway

I think that both pre-* and post-* hooks are reasonable, and complementary. I don't think, however, that we can say that the existence of some way to do post-validation means that we shouldn't do pre-validation.

To better understand the use cases, what does your pre-upload Gerrit hook do?

TLDR: It first finds the binary for the pre-upload hook, then it runs that binary, passing in a list of git commit shas.

ChromeOS is a multi-repo setup. You can think of it like git submodules except we don't use submodules - we use a tool called repo instead. We have one repo called repohooks which contains all of the hooks.

When we run repo upload, it first runs ${CHROMEOS_ROOT}/src/repohooks/pre-upload.py. For example, I usually work in the repository ~/chromiumos/src/bazel, so I have a script that finds the root directory based on the existence of a .repo directory (similar to how jj does it for .jj), and runs pre-upload.py <commit1> <commit2> before doing a git push.

Pre-upload.py does various things such as:

  • Validate that our formatter doesn't need to do anything
  • Check that there's correct license headers at the top of every file.
  • Check that your commit description has a footer containing TEST=<something>
  • Check that your BUG=<something> is in the correct format

Note that it does this on every commit in the stack, not just the top commit.

Unfortunately, just running repo upload isn't an option, because it works on the current branch, and doesn't work with detached head.

matts1 avatar May 01 '24 00:05 matts1

I think that both pre-* and post-* hooks are reasonable, and complementary. I don't think, however, that we can say that the existence of some way to do post-validation means that we shouldn't do pre-validation.

If we are going to have hooks, then I think upload is a good case where they would be useful, but it might be too late to do it after uploading (you might have already accidentally uploaded your password file then).

I think the hooks you've mentioned above only require readonly access to the data. That can mostly be done after committing the transaction instead. That's not true if we also want to support hooks that can rewrite content, descriptions, branches, etc. It might be good to compile a list of use cases for that too.

martinvonz avatar May 01 '24 00:05 martinvonz

Upload hooks

Pre-upload hooks

Not too much to say. Pre-upload hooks should be able to mutate the commits in the stack (eg. running formatters). I imagine that they would get access to the jj API, and could use that to run something roughly equivalent to jj run 'immutable_heads()..commit_to_be_pushed' clang-format "$(jj files @)", or do whatever else they want.

Post-upload hooks

I can't really think of a use case for this. Maybe you could have a hook to trigger some kind of github action, for example, but it seems like that should be set up on the server side? I'd love to hear other people's opinion on whether this was useful.

Describe hooks

I think these are relatively un-contentious.

  • A pre-describe hook would pre-fill the commit message
    • A hook failure would be a warning, and would leave the message unchanged
  • A post-describe message would both validate the commit message and perform potential fixups on it
    • Failure of this would cause the editor to rerun
fn describe(commit: Commit) ->  {
  let mut msg = match pre_describe_hook(commit) {
    Ok(m) => m,
    Err(e) => {
      // If the initial description was invalid, we still have to be able to edit that description, so it doesn't really make sense to have these fail. So any pre-describe hook failure is considered an internal error to the hook.
      warning(e);
      msg
    }
  };
  while (true) {
    msg = run_editor(prefilled);
    match post_describe_hook(Commit{msg=msg, ..commit}) {
      Ok(edited) => return edited,
      // Rerun the editor
      Err(e) =>  print(e),
    }
  }
}

Commit hooks

I've saved this for last because these are by far the hardest.

  • Pre-commit hooks
    • Can edit the tree (eg. running formatters)
    • Can validate the commit (eg. running linters)
    • Can block a commit

Pre-commit hooks are hard & confusing because every time you run any jj command, it technically does a commit. It seems to me that pre-commit hooks should run when the user has the intention to say "I'm happy with this piece of work, let's commit", but that's extremely difficult to detect, because:

  • jj commit and jj amend clearly show an intention to do so
    • But amend is just an alias for squash, which is usually not used for the commit intention (probably?)
  • jj new may be used to commit, or it may be used as a "checkout" command
    • We could special-case when you new on top of the current commit, but it seems hard trying to explain all the nuances of the pre-commits to a user
  • I personally mostly use jj split when I intend to "commit" something (since it works like hg commit --interactive), but I also use it to just run a regular split.
    • Similarly to new, we could special-case this for when we're splitting @, but I suspect other people use jj split for other purposes.
  • I use jj describe to "commit" when it's the final commit in the stack (since I use the jj edit workflow)

I don't think post-commit hooks make a lot of sense. A post-commit hook can only really do validation, and you can just as easily run that validation in pre-commit.

Because of all the difficulties described above in detecting when a "commit" intention occurs, I'm honestly not a big fan of pre-commit hooks. Instead, I'd personally prefer pre-upload hooks. I can see some small justifications for why pre-commit is better than pre-upload, but I think that being able to manually run my pre-upload hooks would be the way to go, since it's just so hard to detect when you actually "commit" in jj.

Other Git hooks

  • pre-rebase
    • The documentation suggests "you can use this hook to disallow rebasing any commits that have already been pushed". However, since we have immutable commits, I don't think we really need this, and I can't think of any other use case.
  • post-checkout
    • Documentation: "This may mean moving in large binary files that you don’t want source controlled, auto-generating documentation, or something along those lines"
    • I can see some appeal to this, but IMO something like this should just be run explicitly from the user. It's also, similar to commit, quite hard to detect a "checkout". Interested to hear other opinions though
  • post-merge
    • I don't think this makes much sense in jj. Merge isn't a special operation for us, so I don't think it makes sens to special-case it
  • pre-auto-gc
    • This just seems wierd. It juts exposes internals that shouldn't be exposed?

matts1 avatar May 01 '24 01:05 matts1

I don't think post-commit hooks make a lot of sense. A post-commit hook can only really do validation

In the context of a GUI, would it make sense that post-commit could be a trigger to update the display?

joyously avatar May 01 '24 01:05 joyously

It has no place there, as it severely overlaps with run and forcing a pre-upload or presubmit hook in there makes the user experience rather unpleasant. If we can make the CLI emit just a proto message through IPC/vsock and the daemon handles the whole hook invocation, the user experience is kept smooth.

I think the workflow I'd imagined is impossible with what you're proposing. What user workflow do you imagine? Could you take me through what happens both from a user perspective and a code perspective with your idea.

Since transactions are cheap (atleast for the Git backend, don't know if that counts for Piper), we can just rewrite the actual objects with a new transaction instead of providing hooks, which would fail in the CLI. This would match the behavior of a hook and since all of this happens in the daemon, it is something you can easily opt out.

Describe hooks

I think these are relatively un-contentious.

  • A pre-describe hook would pre-fill the commit message

    • A hook failure would be a warning, and would leave the message unchanged
  • A post-describe message would both validate the commit message and perform potential fixups on it

    • Failure of this would cause the editor to rerun
fn describe(commit: Commit) ->  {
  let mut msg = match pre_describe_hook(commit) {
    Ok(m) => m,
    Err(e) => {
      // If the initial description was invalid, we still have to be able to edit that description, so it doesn't really make sense to have these fail. So any pre-describe hook failure is considered an internal error to the hook.
      warning(e);
      msg
    }
  };
  while (true) {
    msg = run_editor(prefilled);
    match post_describe_hook(Commit{msg=msg, ..commit}) {
      Ok(edited) => return edited,
      // Rerun the editor
      Err(e) =>  print(e),
    }
  }
}

I still don't think that such a thing belongs into the CLI. I implore you to go through to the jj run Design Doc with all it's versions (the one in the repo and the Google Doc one) and the related Discord discussion, as it contains valuable conversations in relation to hooks.

I hope I made my idea clear, that keeping hooks out of band of the CLI is long term better.

And RE: Pre/Postsubmit hooks, we'll first need some definition of forge in library for it to be useful.

PhilipMetzger avatar May 01 '24 17:05 PhilipMetzger

Note for anyone reading this thread in the future: https://github.com/martinvonz/jj/issues/1869 is the tracking bug of jj run

we can just rewrite the actual objects with a new transaction instead of providing hooks

I'm not neccesarily opposed to this.

I still don't think that such a thing belongs into the CLI

To me, the most important thing is the user experience, and implementation details are secondary. If we can have a great user experience without putting it in the CLI, I'm certainly not opposed to that, but it's really hard to evaluate whether jj run is a good alternative to hooks in the CLI (even after having read the document you mentioned), because you haven't yet proposed a specific user journey.

Suppose a repo foo has a rule that the first line of a commit message must not exceed 80 characters, and the user wants to validate that when they write a description, it adheres to that specification.

The requirements I would like to impose upon a user journey are:

  • A user must be able to use standard jj commands
    • jj describe, jj commit, jj spllit, or any other command that sets the description for a change must run this hook-like thing
      • Adding an aliasjj describe-with-bug as a jj alias is insufficient
      • Similarly for alias describe="jj run pre-fill && jj describe" as a shell alias
    • This should allow for a consistent user experience
      • It would be a pain if for one repo you needed to run jj run hooks/validate_description and for another repo you had to run jj run hooks/post-describe

Based on reading the things you asked me to read, I'm not 100% certain, but it seems that you disagree with the user journey itself, and disagree with the fundamental need for a explicit implementation of a "hook". So my questions for you are:

  • Can jj run meet those requirements (I suspect not, given the current design of jj)?
    • If not:
      • What user journey would you propose?
        • What would the user need to do to configure it (eg. .~/.config/jj/Config.toml)?
        • What would the user need to do to ensure it runs when they run a command?
      • Why do you want it if it can't meet those requirements?
        • Do you think the requirements are bad?
          • If so, why? What doesn't make sense about them?
        • Do you think that the requirements are good, but you're willing to make that sacrifice? If so, why?

From what I can tell from piecing together bits and pieces from what you're saying about doing this out of band, with a pub/sub message, and reading between the lines a lot, what would happen under the hood would look like this:

  1. jj describe -> write commit message -> submit it
  2. jj-cli publishes an event saying that a commit description has been updated
  3. (this could happen anytime from now to the very end) jj-cli returns, having succeeded
  4. jj daemon subscribes to that event and reads hooks/jj-config.toml (or some config file), and now knows that we have a subscriber to the post-describe hook that will run binary jj run hooks/post-describe
  5. jj daemon runs
  6. jj daemon runs the subscriber via something roughly equivalent to jj run hooks/post-describe (or something similar)
  7. The post-describe binary runs, and the commit message fails validation

I hope you understand why I'm confused, because in the final step there, there is no way to report the failed validation to the end-user. I'm sure your idea works, but unfortunately I don't understand what you're trying to say here, so I'd appreciate it if you could elaborate on precisely what you want. Because you've communicated vague ideas rather than a precise set of steps, it's hard for me to tell what you're trying to say.

matts1 avatar May 02 '24 00:05 matts1

I've been considering this further, and also chatted to @mlcui-corp offline.

My conclusions are:

  • Pre-upload hooks are an absolute must
    • While the user could achieve them with jj run, the fact that they can just run something like jj git push allows them to entirely skip those hooks, and users will forget.
  • pre / post-commit hooks are both unneccesary (jj run mostly makes these redundant) and difficult (it's very difficult to determine a commit intent in jj).
    • I'm no longer 100% on board with post-describe hooks. I use jj describe to write temporary names for things, similar to a git stash, and having those have to follow the conventions required for a commit that's going to be submitted seems like overkill, when we can just as easily make it a pre-upload hook.
    • I think there's a definite use case for using pre-describe hooks to pre-fill the commit description. However, the value proposition for it seems limited. Maybe setting the JJ_CHANGE_ID environment variable when running $EDITOR would allow users to configure their editor to do something similar (eg. inherit bugs from parent commits, add "Change-Id" to footer). My main concern here is that different repositories may have different requirements (eg. one that pushes to gerrit may require change-id in the footer, different remotes may have different formats for specifying bugs).

I think what we should do is:

  • Only start by implementing pre-upload hooks
    • But do so by creating a generic mechanism for hooks, so we can easily add them later elsewhere if required
  • Table pre/post-describe hooks, and see what (if any) requests for things related to this come up in the issue tracker.
  • Plan to never implement commit hooks.

I also think it's important to distinguish hooks from some sort of pub-sub system for messages. @joyously raises a good point that it's useful to have a post-commit hook so that the IDE can know when to update. However, the differences between hooks and a pub-sub system (from my perspective, at least) are:

  • Hooks can fail (and a failure generally results in the failure of the underlying command)
  • Hooks are synchronous (since you need to know whether the hook succeeded or not)
  • Hooks are configured by the user (or the repository)

I think that an IDE would be better off subscribing to a pub-sub system, so that it simply gets notified when events occur. This would mean that the user doesn't have to configure the hook for the IDE themselves, and it also means we can provide events for things that really shouldn't need to be a hook (eg. the user runs jj branch create foo - clearly, we don't need any type of hook for that, but the IDE should ideally be able to get notified).

matts1 avatar May 13 '24 00:05 matts1

@mlcui-corp also mentioned to me that pre-describe hooks could be made more difficult by jj describe -m foo, or anything where you specify the description in the command-line.

I don't think this changes my position on the matter - I think pre-upload hooks only make a good start.

matts1 avatar May 13 '24 00:05 matts1

we can just as easily make it a pre-upload hook.

Does this refer to jj git push or is there another command that falls under this label? Does your view take into account solo projects, not pushed to a remote? Or contributors that are stuck offline for awhile, and can't push? (just curious)

I spent a few years in the internals of WordPress, which uses a hook system in PHP (synchronous). It is amazingly flexible. One of the parameters is a priority, so that the code is run in priority order. There are two types of hook: one is a "filter" which can modify the first parameter (and must return it, so that filters can be chained), and the other is an "action" which can do whatever but no return value (more like pub-sub). Some big plugins also invoke the hook functions for their own hooks.

joyously avatar May 13 '24 00:05 joyously

Currently, pre-upload hooks refer only to jj git push in the public build. However, jj gerrit send (#2845) is currently in progress, and would presumably also use this same mechanism.

My intention is that any jj command which supports uploading would end up supporting this. This wouldn't be limited to the git backend - for example, internally at google we would be able to use this for a piper backend.

Does your view take into account solo projects, not pushed to a remote? Or contributors that are stuck offline for awhile, and can't push? (just curious)

IIUC, what you're trying to ask here is "can we make sure that pre-upload hooks are able to be run without actually pushing anywhere?". My intention was something similar to jj git push --dry-run except it also runs hooks (we could allow dry-run to run pre-upload hooks, but having --dry-run potentially modify commits may go against what users expect).

Alternatively, we could create a specific command to run your pre-upload hooks.

Your idea about the two types of hooks seems interesting, but I'm very hesitant to implement something like that. A pub-sub event has the rather nice property that it is guaranteed not to mutate the repository, so I'm happy to sprinkle them all over the place wherever someone might want to know about something. However, a regular hook can make no such guarantee. Allowing a user, for example, to make a hook when a branch is created means that we can no longer guarantee that a branch creation won't modify commits. I'm no expert on this, but I imagine that loosening these constraints could be a potential problem.

matts1 avatar May 13 '24 01:05 matts1

what you're trying to ask here is "can we make sure that pre-upload hooks are able to be run without actually pushing anywhere?".

Actually, I was thinking more in terms of needing hooks in other places besides pre-upload since those scenarios I mentioned wouldn't be uploading. At the same time, if the hooks are on push, they would run even when I push from one folder to another on my computer?

I agree that --dry-run should not modify anything. (Would it make a snapshot if needed?)

joyously avatar May 13 '24 03:05 joyously

Ah, I get you now, thanks for clarifying.

Does your view take into account solo projects, not pushed to a remote? Or contributors that are stuck offline for awhile, and can't push? (just curious)

For contributors stuck offline for a while, I imagine jj git push --dry-run --run-hooks would work. For someone working on an offline project, the same command could work, but a command like jj run-hooks pre-upload might be more appropriate (and would also solve the offline use case).

I'm thinking of it less as a pre-upload hook, and more of a pre-publish hook (where you usually publish via upload, but users could choose to do whatever they want.

matts1 avatar May 13 '24 03:05 matts1

Also, @PhilipMetzger, I'd like to hear your thoughts on my updated plans, since you seemed to have issue with hooks before.

I understand that you don't want hooks in the CLI, and after rethinking, I'm ok with that for most use cases - I'm no longer advocating for a describe hook. However, I strongly believe that we need pre-upload hooks.

For example, consider my workflow when I contribute to jj.

  1. Write code until cargo test passes
  2. On each commit, run cargo fmt && cargo clippy && cargo test
  3. jj git push

With jj run, this can be reduced to:

  1. Write code until cargo test passes
  2. Run jj run -r 'mutable()::@' 'cargo fmt && cargo clippy && cargo test (or something along these lines)
  3. jj git push

This is clearly superior, but IMO is still insufficient. If I do it this way, there are so many ways to upload an invalid commit:

  • I sometimes forget to run jj run
  • I work on several different repositories
    • I need to both learn and remember precisely which jj run command to run for each repository
  • I either need to remember the precise jj run invocation (which I often get wrong) or I need to look through my bash history for it (which may have many other random jj run invocations in it)
  • Sometimes I run something and it passes the checks. I then make a change that I think won't affect things, so I don't re-run those commands, but then it turns out it has affected things (yes, I really am my own worst enemy)

I see the purpose of a pre-upload hook as being complementary to jj run, rather than redundant.

  • A pre-upload hook is designed to protect you from uploading bad commits
  • jj run allows you to easily run things on multiple commits at once

I'd like to combine these. For example, I might create a file hooks/pre-upload containing:

#!/bin/bash -eu

jj run -r 'mutable()::@' 'cargo fmt && cargo clippy && cargo test'

Then my workflow would be:

  1. Write code until cargo test passes
  2. jj git push
  3. jj runs the hook on the specific commit that needs to be uploaded, which then invokes jj run to validate and fix the whole stack of changes
  4. It then actually uploads it once the commit is validated.

Note that I should clarify that I don't really care about the specific implementation (the bash file above was an example). I only care that from the perspective of a user of jj, I can make sure that jj doesn't let me upload a bad commit. Right now I'm trying to get buy-in on the concept of a pre-upload hook, rather than any specific implementation of it.

matts1 avatar May 13 '24 04:05 matts1

Here's a simple proposal for an example implementation. I think it's quite nice, but as I said before, I'm open to other ideas.

Step 1: Multi-layered config files

We start off by making config.toml have more layers. Currently, jj basically does jj --config_toml=$HOME/.config/jj/config.toml. I propose changing it to:

jj \
  --config_toml=<repo>/jj.toml \
  --config_toml=$HOME/.config/jj/config.toml \
  --config_toml=<repo>/.jj/config.toml

This would allow repositories to create their own configurations, but allow you to override them if you wanted to (actual filename up for debate).

Step 2: jj run aliases

Add an entry in cargo.toml config files which allows you to create jj run configurations. For example:

<repo>/jj.toml would contain:


[run-aliases]
# People might not expect pre-upload hooks modifying commits.
pre-upload = {"aliases": ["lint", "test"]}
fmt = {write: "true", "cmd": ["cargo fmt"]}
lint = {"cmd": ["cargo clippy"]}
test = {"cmd": ["cargo test"]}

It could then be merged with a user-specific config for that specific repo (eg. <repo>/.jj/config.toml)

[run-aliases]
# I like my pre-upload hooks modifying commits
pre-upload = {"aliases": ["fmt", "lint", "test"]}

Step 3: The hook

A user could then run jj run --alias fmt to run their formatter, for example.

When you run jj git push, jj would automatically run jj run -r <all revisions being uploaded> --alias pre-upload, if an alias named pre-upload was present.

matts1 avatar May 13 '24 05:05 matts1

Just a quick note that reading config from the repo has security problems. One solution to that would be to allowlist certain keys. I think Git and Mercurial don't allow reading any config from repos. I don't know if they did it that way for simplicity or if there are problems even with allowlisted keys that I'm missing.

martinvonz avatar May 13 '24 05:05 martinvonz

That's a good point.

For security reasons, git can't run hooks unless you manually copy them to the .git/hooks directory (docs). Could we consider something similar where we require a user to "install" the repo-specific config by copying it to the .jj directory, or are there additional security concerns that may not be relevant for git?

matts1 avatar May 13 '24 05:05 matts1

Just a quick note that reading config from the repo has security problems.

I was thinking of having some way of storing "suggested config" in the repos that the user would have to explicitly import. It would probably be restricted to some allowlisted keys as well.

The application I had in mind is to suggest a value for immutable_heads(), but suggested hooks make sense too.

The problem here is that the exact UI and minor decisions here have security implications and I don't have a mental model of simple conditions that are "sufficient" for security.

ilyagr avatar May 13 '24 05:05 ilyagr

Actually, there was some "config-based hooks" effort in Git a while ago (a quick googling led to this). I don't know if that series landed (@nasamuffin surely knows), but it's a good idea to at least understand what that patch series was about regardless.

martinvonz avatar May 13 '24 05:05 martinvonz

I only care that from the perspective of a user of jj, I can make sure that jj doesn't let me upload a bad commit.

I know the jj model is different, but I just wanted to toss this in. The Bazaar/Breezy command structure is a little different from Git in that you work in a branch, merge the main branch in (it changes working copy only), do whatever you want including testing and fixing conflicts, then commit the working copy. This fixes the problem Git has with merge commits and not being able to test before it's merged(committed). I'm not sure how this would fit in with jj committing everything all the time. What if the result of a failed hook was visible in the log?

joyously avatar May 13 '24 13:05 joyously

config-based hooks

No, it did not land, but not for lack of interest or viability - just for lack of developer time. Google has been carrying the rest of that series downstream for years; we haven't had any problems with it. I'd absolutely recommend that approach for jj over the historical Git approach.

nasamuffin avatar May 13 '24 16:05 nasamuffin

Sorry for the late response here, I took some time off after that particular conversation. @matts1 and I had a private conversation off-thread.

⚠️, Warning! Wall of text ahed.

Finishing the user story:

  • jj describe -> write commit message -> submit it
  • jj-cli publishes an event saying that a commit description has been updated
  • (this could happen anytime from now to the very end) jj-cli returns, having succeeded
  • jj daemon subscribes to that event and reads hooks/jj-config.toml (or some config file), and now knows that we have a subscriber to the post-describe hook that will run binary jj run hooks/post-describe
  • jj daemon runs
  • jj daemon runs the subscriber via something roughly equivalent to jj run hooks/post-describe (or something similar)
  • The post-describe binary runs, and the commit message fails validation

And then the daemon re-opens $EDITOR with something like a jj describe @- --add-footer 'BUG=' etc. The UX should feel similar how Git hooks work currently, with a way nicer rollback mechanic if needed. And since subscribers are not bound to a local machine, we can add/create infrastructure which implement those checks on remote workers (like a mini-Tricorder) or build it directly into the forge.

However, the differences between hooks and a pub-sub system (from my perspective, at least) are:

  • Hooks can fail (and a failure generally results in the failure of the underlying command)
  • Hooks are synchronous (since you need to know whether the hook succeeded or not)
  • Hooks are configured by the user (or the repository)

While I agree that this is definitely a "Hook" in the Git sense, my opinion on that is still unchanged that we can achieve the same UX with a pub/sub part in a daemon. And since jj's transactions are cheap, I see no need to fail a command when we can immediately rewrite it via daemon.

pre / post-commit hooks are both unneccesary (jj run mostly makes these redundant) and difficult (it's very difficult to determine a commit intent in jj).

  • I'm no longer 100% on board with post-describe hooks. I use jj describe to write temporary names for things, similar to a git stash, and having those have to follow the conventions required for a commit that's going to be submitted seems like overkill, when we can just as easily make it a pre-upload hook.
  • I think there's a definite use case for using pre-describe hooks to pre-fill the commit description. However, the value proposition for it seems limited. Maybe setting the JJ_CHANGE_ID environment variable when running $EDITOR would allow users to configure their editor to do something similar (eg. inherit bugs from parent commits, add "Change-Id" to footer). My main concern here is that different repositories may have different requirements (eg. one that pushes to gerrit may require change-id in the footer, different remotes may have different formats for specifying bugs).

I agree on the conclusion but I think we can achieve the same feel with even with the daemon, as pointed out above by finishing the user story.

Addressing pre-upload hooks aka (g4 presubmit/arc lint)

I think what we should do is:

  • Only start by implementing pre-upload hooks

    • But do so by creating a generic mechanism for hooks, so we can easily add them later elsewhere if required

This is fine standalone, although I'd consider the pre-upload part another FR. 👍

Does your view take into account solo projects, not pushed to a remote? Or contributors that are stuck offline for awhile, and can't push? (just curious)

IIUC, what you're trying to ask here is "can we make sure that pre-upload hooks are able to be run without actually pushing anywhere?". My intention was something similar to jj git push --dry-run except it also runs hooks (we could allow dry-run to run pre-upload hooks, but having --dry-run potentially modify commits may go against what users expect).

Alternatively, we could create a specific command to run your pre-upload hooks.

At this point we've reached the design of g4 presubmit or arc lint which was always fine to me, and advocated for via jj fix/hg fix from @martinvonz.

With jj run, this can be reduced to:

  1. Write code until cargo test passes
  2. Run jj run -r 'mutable()::@' 'cargo fmt && cargo clippy && cargo test (or something along these lines)
  3. jj git push

This is clearly superior, but IMO is still insufficient.

Thanks for recognizing jj run's value.

If I do it this way, there are so many ways to upload an invalid commit:

  • I sometimes forget to run jj run

  • I work on several different repositories

    • I need to both learn and remember precisely which jj run command to run for each repository
  • I either need to remember the precise jj run invocation (which I often get wrong) or I need to look through my bash history for it (which may have many other random jj run invocations in it)

  • Sometimes I run something and it passes the checks. I then make a change that I think won't affect things, so I don't re-run those commands, but then it turns out it has affected things (yes, I really am my own worst enemy)

I see the purpose of a pre-upload hook as being complementary to jj run, rather than redundant.

  • A pre-upload hook is designed to protect you from uploading bad commits
  • jj run allows you to easily run things on multiple commits at once

I'd like to combine these. For example, I might create a file hooks/pre-upload containing:

And as you allude to, this workflow makes sense with failing in the pre-upload/pre-push step. I don't see any issue with it. And if we have jj fix this concludes to "pre-upload for pr/12345" failed, please run 'jj fix --apply'.

I think all the related things which are proposed are fine as is with minor adjustments.

WDYT of also supporting protobuf as configuration possibility?

TL;DR

You have my buy-in for the reduced scope and moving it out of the CLI.

PhilipMetzger avatar May 13 '24 19:05 PhilipMetzger

Actually, there was some "config-based hooks" effort in Git a while ago (a quick googling led to this).

Looking at the provided technical docs, this feature really looks like the pub/sub system in this FR with the concept of "Events" and all. So if we can provide a better implementation on that front I'm happy.

And that is a really nice design, which sadly hasn't landed in Git yet 😦 .

PhilipMetzger avatar May 13 '24 19:05 PhilipMetzger

And then the daemon re-opens $EDITOR with something like a jj describe @- --add-footer 'BUG='

My point here was that by the point the validation had failed, the CLI command has already exited. If that's the case, then the daemon re-opening editor means that it can't simply re-open it in the same terminal. It would have to spin up a new terminal window containing a new instance of $EDITOR, which seems like a subpar user experience. This would work just fine if $EDITOR was a gui application, but if you use an editor such as vim, it doesn't seem great.

WDYT of also supporting protobuf as configuration possibility?

I quite like the idea I proposed where we just use the existing jj configuration file. This could mean that we could just alias jj fix to jj run --alias fix, and have the same for jj lint and jj test. What do you think about my proposal above? You never responded to it, so I don't know how you felt.

matts1 avatar May 13 '24 21:05 matts1

My point here was that by the point the validation had failed, the CLI command has already exited. If that's the case, then the daemon re-opening editor means that it can't simply re-open it in the same terminal. It would have to spin up a new terminal window containing a new instance of $EDITOR, which seems like a subpar user experience.

Ack, that makes sense. I wouldn't consider it a major blocker though, as there surely is a way to make it friendlier.

This could mean that we could just alias jj fix to jj run --alias fix

jj fix will be a standalone command and that has been decided since run was designed.

What do you think about my proposal above? You never responded to it, so I don't know how you felt.

The proposal is fine, although I'd make some different choices.

A bunch of nitpicks:

  • I don't think run should be responsible for its aliases, as it will purely duplicate existing work.
  • The aliases table should just expand to normal run invocations[^1].
  • Also there is no need to specify that the command writes, as it is the default.
  • Looking at the TOML spec, there is no good support for sets, so something better is needed.

There's also the question of virtual aliases, so the pre-upload you defined, as I'm not sure if it already is supported.

And my idea with the protobuf configuration would be for the forge anyway, as having a stable representation there is helpful.

[^1]: these would be just strings which will be passed as <command> to run, like the aliases are now.

PhilipMetzger avatar May 14 '24 21:05 PhilipMetzger

jj fix will be a standalone command and that has been decided since run was designed.

I couldn't find any issues for fix when I looked. Could you link to where this was discussed? I'm a little surprised, since jj fix seems like it's trivial to implement as an alias in your config.toml (eg. fix = ["run", "-r", "immutable_heads()..@", "--no-rebase", "cargo fmt"]' - the no-rebase is an option I would presume exists on jj run to have it not modify the tree of the children, only that commit exactly).

  • I don't think run should be responsible for its aliases, as it will purely duplicate existing work.
  • The aliases table should just expand to normal run invocations1.

My assumption was that with my virtual pre-upload step I defined, jj run would be able to come up with a dependency graph. Consider, for example:

[run-aliases]
# People might not expect pre-upload hooks modifying commits.
pre-upload = {"aliases": ["fix", "rebasing", "lint", "test"]}
fix = {write: "true", "cmd": "cargo fmt"}
lint = {"cmd": "cargo clippy"}
test = {"cmd": "cargo test"}
  • It gets told to run fix, rebasing, lint, then test
  • This means that there's an operation for fix, lint, and test, on every change in the revset.
  • It knows that the lint and test command can run at the same time, since they're read-only, but that running it on a given commit requires the fix step on that particular commit to be finished"

This seems difficult to do with a regular alias. But maybe it's not on the table for run at all, in which case I guess a regular alias would work just fine.

  • Also there is no need to specify that the command writes, as it is the default.

You've probably already considered this, but having write as the default may leave performance on the table, since if I run `jj run -r 'immutable_heads()..@' 'cargo lint', then since I know it's readonly, I should be able to run it in parallel on every revision at once.

  • Looking at the TOML spec, there is no good support for sets, so something better is needed.

Could you clarify what specifically you needed sets for. IMO, sets can be mostly replaced with lists, and if we really wanted a set type jj can validate that there were no duplicates.

There's also the question of virtual aliases, so the pre-upload you defined, as I'm not sure if it already is supported.

To the best of my knowledge, it's currently unsupported to have multiple jj commands in a single alias, but I do think it'd be helpful. I think something like the following would work relatively well:

[aliases]
lint = ["run", "cargo clippy"]
test = ["run", "cargo test"]
pre-upload = {
    "opts": {
        name: "revision",
        short: "r",
        default: "immutable_heads()..@"
    },
    "commands": [
        # Could potentially add metadata here to create a dependency graph and allow commands to run in parallel.
        ["lint", "--revision=$revision"],
        ["test", "--revision=$revision"],
    ]
}

And my idea with the protobuf configuration would be for the forge anyway, as having a stable representation there is helpful.

This seems like it's introducing yet another configuration file for the forge. Plenty of these exist already. It seems easier to just have the repo define the configuration in a version-controlled file (#3684), and have the forge just invoke jj aliases (eg. jj test -r @ to run the alias test = ["run", "cargo test"]). This would allow the forge to just use the pre-existing configuration files, and also allow the forge to essentially "test" the jj aliases.

matts1 avatar May 14 '24 23:05 matts1