jj icon indicating copy to clipboard operation
jj copied to clipboard

mailmap: add support for `.mailmap` files

Open emilazy opened this issue 1 year ago • 18 comments

Explanation, motivation, backstory, and design decisions are in the commit message, and I’ve included comprehensive tests and hopefully‐adequate documentation.

Pulling in the people who actively discussed this on Discord for review, hope that’s okay: @nasamuffin @arxanas @PhilipMetzger

Some things I’m unsure about and could use feedback on:

  • I’m happy to hear bikeshedding about the DSL interfaces here, especially the *_raw names (*_unmapped?).

  • There is a breaking change to the Rust .author() and .committer() methods on Commit; I assume this is okay due to the current level of general churn, but I thought I should point it out. I plan to look into opening a PR to make gg do the right thing here.

  • ~~get_wc_commit_mailmap isn’t exactly my favourite function name in the world, and it feels a little gross to have commit/tree‐related plumbing in the mailmap module (e.g., jj_lib::gitignore just has a pure model). Any suggestions for better places this could go, better names it could have, types it could hang off as a method of, etc. are welcome.~~

  • ~~I added pub(crate) to jj_lib::git_backend: signature_{from,to}_git, as they’re used by the mailmap module to interface with gix-mailmap. Since this isn’t a feature specific to the Git backend, it feels wrong to have that dependency, but I’m not sure where would be good to move them to. A new jj_lib::git_util module, perhaps?~~

  • ~~I don’t love the jj_cmd_ok_as thing in the CLI test, but it seemed like too much work to refactor that interface right now.~~

  • I don’t really know how conflicts in the .mailmap file should be handled. Ideally the entries not involved in a conflict should still be respected, but I’m not even sure how materialize_tree_value behaves here.

  • ~~Should I split this into multiple commits? A lot of the changes are fairly entangled, but I could probably tease one or two more commits out of this.~~

  • Should there be more permanent documentation of the .mailmap support, and if so, where?

Checklist

If applicable:

  • [x] I have updated CHANGELOG.md
  • [x] I have updated the documentation (README.md, docs/, demos/)
  • [x] I have updated the config schema (cli/src/config-schema.json)
  • [x] I have added tests to cover my changes

emilazy avatar Jun 24 '24 12:06 emilazy

Looks like these ended up being immediately relevant to https://github.com/martinvonz/jj/actions/runs/9644975580/job/26598226134?pr=3951:

That said, this is not exclusive to the Git backend. The .mailmap name and format is perfectly generic, already shared between Git and Mercurial, and applies to all systems that bake names and emails into commits, including the current local backend. The code uses Gitoxide, but only as a convenient implementation of the file format; in a hypothetical world where the Git backend was removed without Jujutsu changing its notion of commit signatures, gix-mailmap could be used standalone, or replaced with a bespoke implementation.

I added pub(crate) to jj_lib::git_backend: signature_{from,to}_git, as they’re used by the mailmap module to interface with gix-mailmap. Since this isn’t a feature specific to the Git backend, it feels wrong to have that dependency, but I’m not sure where would be good to move them to. A new jj_lib::git_util module, perhaps?

I can add a separate direct gix_mailmap dependency, rather than adding it to the main gix dependency, to signal that it’s not being depended on as part of the Git backend and make the check happy. Suggestions for where to move the signature conversion functions would be appreciated though!

emilazy avatar Jun 24 '24 12:06 emilazy

I factored out the conversion functions into a new private jj_lib::signature_util module (bikeshedding on the name etc. welcome), and hopefully the tests will run now. As I explained in the commit message this isn’t specific to the Git backend (gix-mailmap is just an implementation detail), so nothing there is conditional.

emilazy avatar Jun 24 '24 16:06 emilazy

For what it’s worth, the best alternative DSL API I came up with was commit.{author,committer}([canonical=true]) and {author,committer}{pattern, [canonical=true]), but that required adding support for keyword arguments to the template DSL and didn’t seem obviously better than what I ended up with currently.

emilazy avatar Jun 24 '24 16:06 emilazy

(Gah, messed up when splitting up commits; force‐pushed again.)

emilazy avatar Jun 24 '24 16:06 emilazy

Okay, I’ve dropped the canonicalizing functions from Commit and added corresponding convenience helpers on the Mailmap type instead, along with explanatory documentation on the non‐canonicalizing Commit methods, and pointers for obtaining a Mailmap. That seems better‐factored and should probably be sufficient to nudge tooling developers onto the right path. Thank you both for the feedback; hopefully this is better from a code organization POV.

I also separated out the signature conversion, and renamed get_wc_commit_mailmap to get_current_mailmap, which I dislike a lot less. It turns out that Git actually does support empty email addresses, and that real repositories (e.g. Nixpkgs) have such commits in them; it would be a practical use of .mailmap to correct these, e.g. <[email protected]> Author Name <>, so rather than bailing early on empty signatures I just convert them to Gitoxide’s types as‐is. It’s fewer Git‐specific details in backend‐agnostic code, less churn in the rest of the codebase, and overall better behaviour, so it turned out nicely.

I’ve currently left the {author,committer}{,_raw} revset and template functions rather than moving them to a global option. Git has e.g. %an/%aN pretty format pairs which are analogous to this; I assume they only have the option because it historically used to be opt‐in and they don’t have a real query language. It feels more flexible and declarative to encode the intent in the function names rather than having their semantics depend on more global state. But if people feel strongly about just having a global option to disable it I’m fine implementing that instead.

emilazy avatar Jun 25 '24 14:06 emilazy

Thank you for taking a look!

GitHub won't let me comment on the commit message :) but in my opinion, you could say a little more - that the mailmap will introduce a concept of pre- and post-mapped identities. Right now, it's implied here.

That’s not something I hear often :)

I think this is a good idea and I’ve added another few paragraphs (after my editor ate the first draft…); I’ve called them raw signatures and canonical signatures, as that’s the best terminology I can think of currently and it matches the code and comments. I’ve also tweaked a few doc comments to allude to these concept more explicitly, although there isn’t an in‐depth exposition in the code currently. Possibly this should be recorded more permanently in developer documentation somewhere but hopefully that can be left to a follow‐up PR – the design docs section, perhaps? Although I wouldn’t exactly say that this is a stellar piece of de novo design to flaunt, just a pragmatic stop‐gap for arguably‐regrettable decisions the majority of DVCSes have made up to this point…

Are committer and author the only types of identities that are here? What happens to, say, commit message footers like Signed-off-by: (which are mailmapped in Git, IIRC)? Is that handled differently?

As far as I can tell authors and committers are the only places Jujutsu gets and processes signatures from currently, and there is no special support for trailers. It seems like git shortlog --group=… canonicalizes identity information in trailers, but git log doesn’t on display; a strange state of affairs that speaks to the unfortunate whack‐a‐mole you talked about and that I’m trying to avoid repeating. I guess it makes some sense with the nature of trailers as an ad‐hoc interpretation layer on top of unstructured commit messages, which I also think is regrettable. Hopefully Jujutsu can at least do better there, even if it has to lower whatever additional metadata it ends up recording down to trailers for now for Git compatibility! I’ve discussed this in the commit message too as another example of where the raw–canonical signature distinction and guidelines for which to use would be relevant.

emilazy avatar Jun 25 '24 18:06 emilazy

I’ve pushed a pretty big rework of this; it feels cleaner now. Headline changes:

  • Added a Mailmap::empty constructor to use instead of Default::default for clarity.

  • Renamed mailmap::{get → read}_current_mailmap and added error reporting.

  • Added a WorkspaceCommandHelper::current_mailmap convenience helper, although it currently only has one call site. (Maybe it should be read_current_mailmap too? Maybe it should keep the async?) Caching is a TODO for a later date, but performance impact seems negligible anyway.

  • The details of handling .mailmap mappings are now dealt with entirely in the revset frontend; backends don’t have to care about it at all. The same correctness guarantees as the previous version are maintained, and the resulting expressions should be efficient to execute against database indexes. Everything is abstracted through simple functions like RevsetExpression::author; the mapping implementation is contained entirely within the private RevsetExpression::signature_field helper, which I have commented extensively to make the logic clear.

  • Added even more paragraphs of detail to the main commit message!

I believe I’ve addressed all the previous review comments; hopefully this is acceptable to everyone now and we can get this merged! This turned into a bigger project than I expected when I picked it as my first public contribution to Jujutsu, but I’m grateful for all the feedback and I think this has ended up in a good place.

emilazy avatar Jul 10 '24 07:07 emilazy

What is the status here? It'd be a shame to let it bit rot further.

mati865 avatar Sep 29 '24 23:09 mati865

Sorry for the inactivity; I was expecting this to be a relatively quick PR and burnt out a little on the review cycles. I still intend to rebase this and respond to review comments at some point, though I am not sure if it will be possible to fully satisfy everyone here.

emilazy avatar Sep 30 '24 22:09 emilazy

I would really love to see this get merged as long as it's in a "good enough" state even if it's not perfect. There are repositories I don't want to use jujutsu with until it has mailmap support because I'd rather not see my deadname show up in the UI.

lilyball avatar Sep 30 '24 23:09 lilyball

I'd rather not see my deadname show up in the UI.

FWIW, the display part (= templater + Commit object methods) is easy, and can be merged separately. IIRC, most of the discussion here is about author()/committer() revset.

yuja avatar Oct 01 '24 01:10 yuja

Is there a plan to (partially) merge this, in time for the v0.26.0 release?

zx8 avatar Jan 29 '25 18:01 zx8

Is there a plan to (partially) merge this, in time for the v0.26.0 release?

@zx8 I don't think anyone is working on this, so I doubt it. This seems to have been last updated in ~September 2024, and it currently has 10+ conflicted files. You might be able to duplicate the PR and resume working on it, if you're interested in implementing it?

arxanas avatar Jan 29 '25 20:01 arxanas

@arxanas Sorry, I missed how stale this PR had become prior to commenting. Unfortunately, I don’t know Rust well enough to pick up a change of this magnitude.

zx8 avatar Jan 29 '25 20:01 zx8

(Accidentally hit the close button, whoops!)

I have local work on this that makes it a lot structurally cleaner and should address the design comments raised long, long ago and hopefully satisfy all parties. I still intend to pick this back up and get it merged at some point, it’s just a fair amount of work to get it completely ready and I want to make sure it’s polished enough to not have to go through another bunch of review cycles :)

emilazy avatar Jan 29 '25 22:01 emilazy

Just FYI, today I finished rebasing this work onto the 0.28.2 release commit over in my fork. I don't think I did it perfectly (certainly not up to the kind of quality standard where I'd submit a PR for it), but I think it's correct. For anyone who's been wanting to update to a newer version of jj, hopefully this will tide you over 😄

https://github.com/maddiemort/jj/tree/updated-mailmap

(for me, this was a jump all the way from 0.20 to 0.28!)

maddiemort avatar Apr 22 '25 20:04 maddiemort

Thanks! I’m sorry I’ve neglected this PR for so long – I hope I’ll have time over the next couple months to rebase and push out my rewritten version that should have a much better chance of landing in trunk.

emilazy avatar Apr 22 '25 20:04 emilazy

Don't worry at all! When you do come back to this, you may find the work on my branch helpful - the four commits that retain you as author don't contain changes other than your work plus conflict resolution, so you might be able to pick them up wholesale for your own rebase.

Edit: oh, you said you rewrote it - so maybe not 😄

maddiemort avatar Apr 22 '25 20:04 maddiemort