FR: Using `jj` as library in third party projects

Open poliorcetics opened this issue 1 year ago • 3 comments

Is your feature request related to a problem? Please describe.

I would like to implement the DiffProvider trait in helix for Jujutsu, like it is for Git

Describe the solution you'd like

Overall I would like to be able to use jj-lib to

Easily open a repo (readable OR mutable; these can be two different types or a config option, it's not important) when given a path anywhere inside that repo
Easily access convenience methods on repos like
- files at a particular revision (allows computing the diff on our side in helix using imara-diff)
- commit message at a given revision
- current commit id
- current commit branch (if any)
- navigate the tree with things like repo.view_at_commit(...).parents() or .children() or equivalents in some way

I'm sure there are dozens of other "common" things that will be asked for over time, so this could be done in a new library on top of jj-lib that is intended for use by third parties, without interfering with the core algorithms and data structures in jj-lib. Lots of helpers only available in jj-cli could be moved into that hypothetical lib, for example.

Describe alternatives you've considered

I've tried to do it and just opening a workspace (not even a repo) is very hard:

        // No method for auto discovery of the .jj, must be done by each consumer
        let ws_loader = WorkspaceLoader::init(workspace_dir).unwrap();

        // Defaults and empty things everywhere because how do I fill those things ? They're only instantiated in jj-cli, using private elements
        let ws = ws_loader.load(
            &UserSettings::from_config(Default::default()),
            &StoreFactories::empty(),
            &Default::default(),
        ).unwrap();

        // it just gets worse
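For illustration, the auto-discovery step mentioned in the first comment above can be done by each consumer with the standard library alone. This is a minimal sketch, not something jj-lib provides: `find_workspace_root` is a hypothetical helper name, and it assumes a workspace is identified by a `.jj` directory at its root.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper (not part of jj-lib): walks up from `start` and
/// returns the first ancestor directory that contains a `.jj` directory.
///
/// `start` may point at a file or a directory anywhere inside the workspace.
/// Pass an absolute path, since a relative path only has its own components
/// as ancestors.
pub fn find_workspace_root(start: &Path) -> Option<PathBuf> {
    start
        .ancestors()
        // Assumption: a workspace root is any directory holding `.jj`.
        .find(|dir| dir.join(".jj").is_dir())
        .map(Path::to_path_buf)
}
```

The returned path could then be handed to the workspace loader shown above instead of requiring the caller to already know the workspace root.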

Documentation is sparse, but honestly that wasn't much of a bother: rustdoc makes it easy to navigate the code and the types are mostly named logically. Better docs would make things easier, but I'm not gonna blame Jujutsu for not having very expansive docs in a 0.x version :)
I tried to look into using jj-cli to get at all the helpers available there, but it wasn't much better: the entry point, CliRunner::init(), only allows moving forward through .run(), which returns an ExitCode, so it is hard to do anything with it. I could use a channel, for example, but for a diff of a single code file (so often very, very small), that's overkill both in code and ergonomics.
I could invoke jj as a subprogram, overriding the config to define some custom diff handler and go from there, but again, that's way more layers of abstraction and possible failures than needed. It would also mean helix needs to invoke jj as a binary, trusting that it's not malicious and is actually jj, not some third-party command that happens to have the same name but doesn't do the correct thing at all.
It would also be sad because both jj and helix are developed in Rust, so it's not like I'm trying to interface two incompatible languages here 😅

Additional context

If you have any questions, or channels on which rapid discussion is possible, I can join them to help with that. If you have guidance to give, I can try and help implement/document all that 😄

Feb 12 '24 23:02 poliorcetics

Thanks for asking! This is an important problem that we'll need to figure out sooner or later. I'm afraid I don't have time to give a very detailed answer right now, but here are some points I'd like to make:

1. You are right that jj-lib is missing some higher-level functionality. We would like to move parts of jj-cli into jj-lib. In particular, I think we should have some UI-independent support for snapshotting the working copy and updating the repo with that, and for running a transaction and then updating the working copy afterwards.
2. The support for pluggable backends complicates the story. If you build your Helix extension by linking against the jj-lib crate (or against the jj-cli crate), then you will only have access to the backends we bundle by default. However, there can be custom backends bundled into custom binaries, which is what we do at Google. Your extension would not be aware of those backends, which means that users at Google will not be able to use your Helix extension with our internal repo format. That's probably not a concern for you right now unless you work for Google, but the same problem may arise with any future open-source backends.

We have discussed the second problem a bit on Discord (see link below) a few times. One solution is to make the jj binary able to start some server that you can perform operations on. The server could speak e.g. gRPC, or perhaps it would simply expect a request on stdin and produce an output on stdout (there are some advantages to having a longer-running server, though). That would give you access to any custom backends. Another option would be to enable dynamic loading of backends somehow.
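As a rough illustration of the simpler "request on stdin, output on stdout" variant, the loop below reads one request per line and writes one response per line. This is only a sketch under stated assumptions: jj has no such mode in this discussion, and the line-based protocol, the `ping` request, and the names `handle_request` and `serve` are all hypothetical.

```rust
use std::io::{self, BufRead, Write};

/// Hypothetical handler: maps one request line to one response line.
/// A real server would dispatch to repo operations here.
pub fn handle_request(request: &str) -> String {
    match request.trim() {
        "ping" => "ok pong".to_string(),
        "" => "error empty request".to_string(),
        other => format!("error unknown request: {}", other),
    }
}

/// Reads requests line by line from `input` and writes exactly one response
/// line per request to `output`, flushing after each so that a client
/// waiting on the pipe sees the answer immediately.
pub fn serve<R: BufRead, W: Write>(input: R, mut output: W) -> io::Result<()> {
    for line in input.lines() {
        let response = handle_request(&line?);
        writeln!(output, "{}", response)?;
        output.flush()?;
    }
    Ok(())
}
```

A binary would call `serve(io::stdin().lock(), io::stdout().lock())`; taking generic reader and writer parameters keeps the loop testable with in-memory buffers.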

Because of problem (1), I would recommend subprocessing to the binary for now, especially if you want to do any mutating operations. An alternative is that you start refactoring the jj-cli to move logic into the jj-lib crate, but that's probably quite a bit of work.
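A minimal sketch of the subprocess approach, using only the Rust standard library, might look like the following. The function names are hypothetical, and the flags (`--repository`, `--color never`, `log --no-graph -r @ -T commit_id`) reflect the jj CLI as understood at the time of writing; since the CLI is not yet stable, they should be treated as assumptions to verify against the installed version.

```rust
use std::ffi::OsString;
use std::io;
use std::path::Path;
use std::process::Command;

/// Builds the arguments for asking `jj` to print the working-copy commit ID
/// as plain text. Kept separate from process spawning so it can be tested.
pub fn current_commit_id_args(repo: &Path) -> Vec<OsString> {
    let mut args: Vec<OsString> = Vec::new();
    args.push("--repository".into());
    args.push(repo.as_os_str().to_owned());
    // Assumed flags: no graph, no colour codes, only the `@` revision,
    // and a template that prints just the commit ID.
    for arg in ["log", "--no-graph", "--color", "never", "-r", "@", "-T", "commit_id"] {
        args.push(arg.into());
    }
    args
}

/// Runs the given `jj` binary and returns the trimmed commit ID, or an error
/// carrying jj's stderr if the command fails.
pub fn current_commit_id(jj_binary: &Path, repo: &Path) -> io::Result<String> {
    let output = Command::new(jj_binary)
        .args(current_commit_id_args(repo))
        .output()?;
    if !output.status.success() {
        let stderr = String::from_utf8_lossy(&output.stderr).into_owned();
        return Err(io::Error::new(io::ErrorKind::Other, stderr));
    }
    Ok(String::from_utf8_lossy(&output.stdout).trim().to_string())
}
```

Taking the binary path as an explicit parameter, rather than resolving `jj` through `PATH` implicitly, leaves the choice of which executable to trust to the embedding editor or its configuration.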

> If you have any questions, or channels on which rapid discussion is possible, I can join them to help with that. If you have guidance to give, I can try and help implement/document all that 😄

We have a Discord server: https://discord.gg/dkmfj3aGQN. There are typically around 75 people there these days.

Feb 13 '24 06:02 martinvonz

> The support for pluggable backends complicates the story.

Yes, you're right. We'll probably need a well-defined way to communicate with the binary then, either as a server or as a subprocess like LSPs do.

I made https://github.com/helix-editor/helix/pull/9643, which uses the jj binary directly so that any future or private backend is handled correctly. It can serve as a demonstration that this approach works today.

Feb 16 '24 16:02 poliorcetics

I'd been wondering about the plans here. This seems related to the classic issue where it's hard to make changes to git's backend because not only do you have to update three implementations (git, libgit2, JGit), you also have to wait for Xcode and Visual Studio to update their libraries.

Mercurial has a command server for this reason (though over there, Python startup speed is pretty slow -- not an issue that happens with jj.) Dynamic loading of backends seems like the cleanest solution, assuming the interface between jj-lib and the backend is stabilized, but it's a lot of work. Maybe stabby makes it easier?

Feb 16 '24 19:02 sunshowers

Another option is similar to what sapling does. Sapling has an interactive smartlog which when run from the command line starts a web-based graphical interface, which is designed for use by humans.

But in addition, the interactive smartlog webserver also exposes some REST-style and websocket routes which are used by the editor plugins. For example, the vscode editor plugin starts isl and then communicates with it via HTTP (see e.g. DiffContextProvider).

Now, I don't know if jj plans any web-based interface for humans similar to the interactive smartlog, but there is a large overlap between functionality (at least on the backend side) with editor plugins and it might make sense to combine them.

Mar 15 '24 16:03 wuzzeb

I mentioned elsewhere that we could provide a --json flag for easily exporting data. I am in favour of jj-lib having as good a story as possible, but --json would be much more generally accessible, and likely a much lower bar.

Mar 15 '24 17:03 khionu

Oh, another thing to note: version differences between what is integrated and what is on the system could become a pain point. User machines are already a security hole, and I think it's beyond the scope of these kinds of tools to ensure the user is actually getting jj run and not something else in the same path. If they have a name conflict, they likely use enough tools that they can figure out how to resolve it.

Mar 15 '24 18:03 khionu

FWIW I've found that jj-lib operates at a useful level of abstraction to build a GUI client around; in particular, it's important to be able to recompose its primitives, rather than just executing a series of the same commands the CLI uses. Targeted higher-level additions would still be useful; there are definitely parts I've had to copy or reimplement.

In particular, gg's gui_util.rs duplicates a lot of code from jj-cli's cli_util, in order to remove TUI coupling; IDE integrations would have to do something similar.

Mar 16 '24 02:03 gulbanana

@gulbanana even if this is the case right now, if internal changes to how jj stores data are made and you're using an older version of gg with the latest jj, you would run into issues. I think having a gRPC API or something like that which is kept stable is a lot easier in the long run than not being able to change internal structures. With the server approach, gg could just start a jj server process while it's running that exposes all operations you could do using the CLI in a structured way (e.g. for the log you could get all information available to the log template, not just the formatted string), but I don't think there'd be a problem with also exposing some lower-level primitives if it makes sense to do so.

Mar 31 '24 09:03 noahmayr

I'm not suggesting internal structures shouldn't change. Clients like gg have to keep up with breaking changes (unless it moves into the jj repo someday). However, I'm skeptical that any reasonable client-server protocol would be sufficiently high fidelity for arbitrary non-internal clients. Some specific concerns:

- Server push is necessary; the client can't always tell what will change in order to encode that in RPC response types.
- The client needs to be able to tell the server when it should or shouldn't snapshot, as well as when to merge oplog heads. This is needed for performance and for user-friendly safe concurrency.
- The operations desirable in a GUI are neither a subset nor a superset of what's useful in a CLI. Most jj commands do "too much" from gg's point of view (which is important to allow CLI users to get a lot done in a little typing); some aren't capable enough because they don't have access to interactivity.
- In a few cases, I've deliberately used different defaults due to the nature of each interface (example: in GG, your selection and your context-menu are distinct from the working copy, so I've made Backout create its inverse changes in the working copy rather than disruptively creating a new revision atop it).

In short, while I'm sure a daemon with an RPC interface would be useful, I don't see it as a replacement for what jj-lib can do when used as a crate. It could provide a rich interface tailored to some specific client, but probably at the cost of being a semantic bottleneck for others.

Mar 31 '24 14:03 gulbanana

@wuzzeb The issue for jj web is #2765, so it is planned.

To summarize, a gRPC interface and a good library serve two very different goals, and having one doesn't invalidate the use case of the other. I assume that at some point jj will grow to have a daemon, so embedders can integrate it in their preferred way.

As we're currently permanently unstable, I'd not worry about the "best" way forward, as that is an issue for the eventual v1. We should accommodate both kinds of clients as best as possible.

Apr 01 '24 18:04 PhilipMetzger