jj
jj copied to clipboard
FR: Add an equivalent to `--first-parent`
Is your feature request related to a problem? Please describe.
A project I work on uses (mostly) merge commits to merge branches into main, where the shape of the commits in the individual branches is largely up to the developer working on that branch. When I create a new release, I want to get a list of merge commits and other commits that live directly on main.
Currently, in a collocated repo, I can do this with git log <prev tag>..main --first-parent. This has the semantics: "When finding commits to include, follow only the first parent commit upon seeing a merge commit." (from the git docs).
As I understand it, there is currently no equivalent in Jujutsu.
Describe the solution you'd like
I suggest a function like first_parents(srcs, domain), which returns all commits in domain that can be reached by a commit in srcs, following first-parent connections only.
For example, given this history:
o F
|
o E
|\
| o D
| |
| o C
| |
o | B
|/
o A
|
o root()
then:
first_parents(F, A::)would returnF | E | B | Afirst_parents(F, A..)would returnF | E | Bfirst_parents(D, A::)would returnD | C | A(because even though C is A's second child, A is C's first parent)first_parents(F | D, A::)would returnF | E | D | C | B | A(all commits are either on the first-commit branch or the branch that ends in D)first_parents(E, A::)would returnE | B | A(even thoughA::includes F, F is not a parent of E)
Describe alternatives you've considered
- A function like
first_parent(X)that returns only the first parent ofX. This might also be useful, but I don't think it's sufficient to get the same results as--first-parent, which explicitly follows those first-parent relationships recursively. - This could be an extension of
reachable(srcs, domain), i.e.reachable(srcs, domain, 1)would be roughly the equivalent of the above. However,reachablecurrently traverses all parent and child edges, which may be a bit unintuitive when trying to figure out which commits are reachable via first-parent relationships only. Andreachable(..., 1)looks weird: what does the1refer to here? - To make this slightly more generic, we could instead have a function like
nth_parents(srcs, domain[, N]), which would follow theNth parent rather than the first parent. The behaviour is difficult to specify for commits with a single parent though, and I suspect the only two interesting cases are "has a first-parent relationship" and "does not have a first-parent relationship", which are covered byfirst_parents(...)and the usual revset operations. - This could be a CLI flag like in Git, but that feels like the wrong approach given the revset language exists.
we already discussed having a ancestors(revset=<revset>, first_parent=) at quite length in the Discord. @lilyball also proposed adding a jj log --parent option recently and I think @jennings has a incomplete version of it on his fork.
I think making it a function instead of an optional argument is a good idea. But, I don't understand what effect the domain has because all the examples have the entire graph (minus root()) as the domain.
Would first_ancestors(X, Y)
be equivalent to first_ancestors(X, all()) & Y?
I like the flag because ancestors() also supports a depth argument, which can still be useful (e.g. ancestors(X, 2, first_parents=true) to access a single first parent). So ancestors(x[, depth[, first_parents=]]) seems like the ideal to me, but a first_ancestors() alias would be nice too.
I agree that the predicate form is only good if it actually lets us express something more.
I agree, I believe first_parents(X, Y) (from the issue) is strictly equivalent to ancestors(X, first_parents=true) & Y, in which case adding the flag to ancestors is the better option.
Well darn, because I think you've changed my mind the other way :)
However, instead of first_ancestors(head, domain) that produces revisions the same way ancestors(head) does, I think maybe it should filter down an already-selected revset. This way, you can use it with A..B too.
The expression first_parents(R) would:
- Always include
heads(R) - Starting from those heads, walk the tree by following only first parents
- Stop when reaching a revision outside of
R
Given this graph:
o F
|
o E
|\
| o D
| |
| o C
| |
o | B
|/
o A
|
o root()
Then:
| revset | result |
|---|---|
first_parents(all()) |
A B E F root() |
first_parents(A::F) |
A B E F |
first_parents(A::D) |
A C D |
first_parents(A::(D|E)) |
A B C D |
I like that this can be used to wrap any revset, which will make it easier to approach to people coming from Git's --first-parents flag. I also like that it's a separate function rather than a boolean flag passed to a different function.
first_parents(ancestors(X)) would be equivalent to the other proposed option ancestors(X, first_parent=true), right? So there shouldn't be any loss of generality or power here.
I think this is dup of https://github.com/jj-vcs/jj/issues/4579. The proposed solution is a bit different, but they'll be desugared to the same/similar internal expression. I personally think first_ancestors(heads) or ancestors(heads[, first..]) is easier to understand than first_parents(domain). The latter sounds like a function to return the first parents of the given revisions.
This is definitely related to #4579, but I think there are two specific (but distinct) functions that are useful in different circumstances:
- For a commit or collection of commits, return the first parent of that commit/each commit. This is what was originally suggested in #4579.
- For a collection of commits, filter out commits that cannot be reached via a first-parent relationship from some known commit. This was also discussed in the comments of #4579, and is what I'm suggesting specifically in this issue.
As far as I can tell, neither of these can be built on top of the other: there is no function magic_first_parents(...) that can do both tasks (outside of having a special flag that switches the implementation around). So if we want to have both pieces of functionality, I think we need two functions. (And I agree with you that first_parent should probably refer to a function that only returns the first parent, not a function that might return arbitrary ancestors — I think @emilazy already pointed out in Discord that ancestors is the better terminology here.)
I think it's useful for both issues to stay open to cover these two different cases, but given they are somewhat similar, I'd also be happy if they got merged.
@jennings: how would you handle a case where I want to find all commits in X that are have a first-parent relationship to commit Y, where Y is not in X?
For example, using your diagram, I want to find all commits in root()..(B|D) that are first-parents of F: how would I do this?
Follow up question: is this ever a useful thing to want to do in the first place, i.e. can we just ignore this and stick with the simpler syntax?
Yeah, we'll definitely need separate functions at UI level, first_parents()/first_ancestors() or something. They'll be translated to the same ancestors() functions internally. parents(x) is a sugar for (internal) ancestors(x, depth=1..=1).
@jennings: how would you handle a case where I want to find all commits in
Xthat are have a first-parent relationship to commitY, whereYis not inX?For example, using your diagram, I want to find all commits in
root()..(B|D)that are first-parents ofF: how would I do this?
first_parents(..F) & ..(B|D)
Follow up question: is this ever a useful thing to want to do in the first place, i.e. can we just ignore this and stick with the simpler syntax?
Not as far as I know. I think "first parents" is useful in git because you're trying to follow a branch's history without seeing all the minutia that happened in the branches that were merged in. That is, you want to see all the "Merged my-feature into main" commits but not all the commits that were in my-feature.
So, I think you usually know the lineage you want to see, and then you want to filter it a bit. That's essentially what git log --first-parents does.
I think @yuja's first_ancestors(heads) is the best option to implement, because I believe all of the other discussed options can be defined as aliases using first_ancestors(heads) as well:
first_parents(domain)=first_ancestors(heads(domain)) & domainfirst_parents(heads, domain)=first_ancestors(heads) & domain
It also seems like the most efficient operation to implement, since it closely matches how revsets are implemented. For instance, I believe implementing first_parents(domain) would require adding an implicit heads() operation before evaluating the revset, which seems less efficient.
However, I think the naming is a bit confusing. In my mind, "first ancestors" sounds like it should mean "parents" (i.e. ancestors at depth 1). I'm not sure if any of these options are clearer though:
first_ancestors(heads)(original suggestion)first_parents(heads)first_parent_ancestors(heads)ancestors(heads, first=true)ancestors(heads, parent=1)ancestors(heads, nth_parent=1)ancestors(heads, nth=1)
In my draft PR (#6877) I used first_parents(heads) because I thought it reads the best (e.g. jj log -r 'first_parents(main)' is equivalent to git log --first-parent main). I'd be happy to change it though if other people prefer a different naming.
jj log -r 'ancestors(main)' is like git log main, so I expect jj log -r 'first_ancestors(main)' to be like git log --first-parent main. I am no longer sure if that is @yuja’s first_ancestors() or not.
(The difference is that --first-parent is describing the choice of what commit(s) to follow at each step, i.e. parents(…) vs. first_parents(…), whereas ancestors(…) does all the following for you.)
Just to be clear, I think first_parents(heads) is bad name. I expect it would return the first parent of the heads commits.
Some other naming suggestion:
ancestors_follow_first(heads)ancestors(heads, follow=first)
I avoided =<bool> and =<int> because we don't have any other functions taking =<bool>, and we have 0-based integer parameters.
This is a difficult naming problem. I'll add to Yuya's suggestions, though I don't feel like mine are that much better. OTOH, perhaps they will spark more ideas.
ancestors_follow_first_parentmight be clearer thanancestors_follow_first- We could have
nth_parent(x, 1)andnth_parent_repeated(x, 1), possibly with aliasesfirst_parentandfirst_parent_repeated.
At the moment, first_parent_repeated seems at least as good as ancestors_follow_first to me, as long as people don't imagine it as the same first parent commit repeated many times (which one would never want).
It might also make sense to introduce =<bool> just for this, ancestors(heads, follow_first=true) seems pretty clear. follow=first might be harder to remember, and there might be other use-cases for =<bool>
But I kind of wish we'd have a shorter syntax for =<bool>. Perhaps ancestors(heads, #follow_first) could work? $ is another symbol. Perhaps we could use :, since it's already used for glob:.
Overall, whether we want something like =<bool> feels like a larger issue that we might want to leave for later.
(Both this and the previous message are mostly a stream of thoughts)
Looking back at what I wrote, if we want a "minimal viable PR" for just first parents, the names that all seem OK if not amazing to me are:
- ancestors_follow_first
- ancestors_follow_first_parent
- first_parent_repeated
If we went with something like ancestors(heads, follow=first), I wonder whether introducing a Lisp-like symbol type and writing it as ancestors(heads, follow=:first) would be better (possibly with #$% or some other symbol). And if that's what we do, we might as well do ancestors(heads, :follow_first).
Update: I looked through the docs, and it seems the closest precedent we currently have to ancestors(heads, follow=first) is remote_bookmarks(remote=origin) where origin is a "string pattern".
Another weird syntax for "symbols" that comes to mind is follow_first:, which would make follow_first more analogous to glob from a glob pattern. But perhaps %follow_first would be better.
Instead of it being a special "symbol", %follow_first could also be a shorthand for follow_first=true, which has the advantage of leaving an easier-to-remember option, and the disadvantage of having two ways to do the same thing.
Since symbols and strings are usually the same thing in revset, I don't think we'll need another syntax for keyword-like symbol. (It's the same for true/false, btw.)
Is this revset useful for anything outside of log context? (I don't actually understand what it returns or why you would want it.)
If it is only log, maybe it should be a parameter, like Git has.
Else maybe the other scenarios could suggest a better name (that conveys what it is for).
Are there descriptive words in the graph context, or the ancestor context? Maternal? Lineage? Ordinal? Leftmost? (obviously, I lack the concept being proposed, help me see it)
Is this revset useful for anything outside of
logcontext? (I don't actually understand what it returns or why you would want it.) If it is onlylog, maybe it should be a parameter, like Git has. Else maybe the other scenarios could suggest a better name (that conveys what it is for).
I think it could still be useful in other contexts. For instance, with a future bisect command, it could be useful to only check PR merge commits.
Are there descriptive words in the graph context, or the ancestor context? Maternal? Lineage? Ordinal? Leftmost? (obviously, I lack the concept being proposed, help me see it)
Maybe mainline_ancestors(heads) could work? I think Git uses "mainline" to describe this in other contexts sometimes.
It is also useful for bisection (git bisect --first-parent). If you use PR merge commits, then git log --first-parent is precisely the log of PRs that were merged, omitting the commits inside them. This is very useful for repositories that merge large numbers of PRs with varying base commits, like Nixpkgs.
(FWIW I haven’t heard anything I like more than first_ancestors().)
If you use PR merge commits, then
git log --first-parentis precisely the log of PRs that were merged, omitting the commits inside them. This is very useful for repositories that merge large numbers of PRs with varying base commits, like Nixpkgs.
This pretty much is the main use-case in both FRs (this and #4579).
(FWIW I haven’t heard anything I like more than
first_ancestors().)
👍