Group (topological) branches in log output

Open martinvonz opened this issue 3 years ago • 1 comments

Description

Parallel branches are interleaved in the (ASCII-)graphical log output, making it hard to read.

Actual Behavior

o 2e4dc019d943 e8c23f0ee5e2 [email protected] 2021-10-20 14:21:01.000 -07:00
: git: don't update public heads for now
: o 77b94b593d68 bb3b69138188 [email protected] 2021-10-13 12:29:47.000 -07:00
: | git: move logic for adding remote to git module
: | o cfd9255b94d3 c48b95a17bdf [email protected] 2021-10-13 16:17:40.000 -07:00
: |/  index: start indexing file-level conflicts
:/|
o | ca796b49f06f b4323a1a35b6 [email protected] 2021-10-13 20:38:39.000 -07:00
:/  docs: describe how to set up commmand-line completion
o c0a26f76426e 543967befc73 [email protected] 2021-10-13 11:08:01.000 -07:00

Expected Behavior

It's much easier to read like this:

o 2e4dc019d943 e8c23f0ee5e2 [email protected] 2021-10-20 14:21:01.000 -07:00
: git: don't update public heads for now
: o cfd9255b94d3 c48b95a17bdf [email protected] 2021-10-13 16:17:40.000 -07:00
:/  index: start indexing file-level conflicts
o ca796b49f06f b4323a1a35b6 [email protected] 2021-10-13 20:38:39.000 -07:00
: docs: describe how to set up commmand-line completion
: o 77b94b593d68 bb3b69138188 [email protected] 2021-10-13 12:29:47.000 -07:00
:/ git: move logic for adding remote to git module
o c0a26f76426e 543967befc73 [email protected] 2021-10-13 11:08:01.000 -07:00

I think the idea is to print all commits on one side of a fork point before the commits on the other side(s). The fork points in this graph are ca796b49f06f and c0a26f76426e, although only the latter matters in this case (because there's only one commit on each side before ca796b49f06f).
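As a minimal sketch of that idea (not what jj does, and not Git's algorithm): build child lists and emit commits in DFS post-order starting from the root, so everything on one side of a fork point comes out before the next side starts. The name `grouped_order` and the input shape are made up for illustration, the elided (`:`) edges are treated as direct parent edges, and the whole graph is walked up front.

```python
from collections import defaultdict


def grouped_order(log):
    """Reorder `log` so each side of a fork point is printed contiguously.

    `log` is a list of (commit_id, parent_ids) pairs in the current display
    order, which must list every child before its parents.
    """
    # rank[c] = earliest display position of c or any of its descendants.
    rank = {commit: i for i, (commit, _) in enumerate(log)}
    children = defaultdict(list)
    for commit, parent_ids in log:
        for parent in parent_ids:
            rank[parent] = min(rank[parent], rank[commit])
            children[parent].append(commit)

    def ordered_children(commit):
        # Visit the side that holds the earliest-displayed commit first.
        return iter(sorted(children[commit], key=rank.__getitem__))

    roots = sorted((c for c, ps in log if not ps), key=rank.__getitem__)
    order, seen = [], set()
    for root in roots:
        seen.add(root)
        stack = [(root, ordered_children(root))]
        while stack:
            node, pending = stack[-1]
            child = next((c for c in pending if c not in seen), None)
            if child is None:
                # All descendants of `node` are out, so it is safe to print.
                stack.pop()
                order.append(node)
            else:
                seen.add(child)
                stack.append((child, ordered_children(child)))
    return order


# The five commits from the log above, in the "Actual Behavior" order.
log = [
    ("2e4dc019d943", ["ca796b49f06f"]),
    ("77b94b593d68", ["c0a26f76426e"]),
    ("cfd9255b94d3", ["ca796b49f06f"]),
    ("ca796b49f06f", ["c0a26f76426e"]),
    ("c0a26f76426e", []),
]
print(grouped_order(log))
# ['2e4dc019d943', 'cfd9255b94d3', 'ca796b49f06f', '77b94b593d68', 'c0a26f76426e']
```

Sorting children by the earliest-displayed commit in their subtree keeps the first head of the original log first, which is why the result matches the order shown under "Expected Behavior".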

Specifications

Platform: All
Version: 0.4.0

Apr 25 '22 16:04 martinvonz

Here's a description of how Git does it: https://github.blog/2022-08-30-gits-database-internals-ii-commit-history-queries/#topological-sorting

Sep 09 '22 03:09 martinvonz

Sorry, I haven't read through that GitHub blog post in detail, so maybe I have this wrong, but it seems to me that a simple DFS-like approach would work well (enough) here. I personally find this much easier to read:

o c0a26f76426e 543967befc73 [email protected] 2021-10-13 11:08:01.000 -07:00
|
|...o ca796b49f06f b4323a1a35b6 [email protected] 2021-10-13 20:38:39.000 -07:00
|   | docs: describe how to set up commmand-line completion
|   |
|   |---o cfd9255b94d3 c48b95a17bdf [email protected] 2021-10-13 16:17:40.000 -07:00
|   |     index: start indexing file-level conflicts
|   |
|   |...o 2e4dc019d943 e8c23f0ee5e2 [email protected] 2021-10-20 14:21:01.000 -07:00
|         git: don't update public heads for now
|
>---@ 77b94b593d68 bb3b69138188 [email protected] 2021-10-13 12:29:47.000 -07:00
      git: move logic for adding remote to git module

I originally did not differentiate between children and descendants, but later shoehorned that in using --- for children and ... for descendants. I personally don't have much use for that distinction, but I'm not sure whether other people do.

Also, I'm not really sure how jj handles those relations in code, but in the DFS we could simply render the most recently touched subtrees towards the end (or the beginning, if you don't reverse the output). The working copy would get special preference: it always counts as most recently touched, so it is rendered last for better visibility. Thoughts?
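To make the proposal a bit more concrete, here is a rough sketch under stated assumptions: the history is a tree (no merges), "touched" is just the timestamp from the log above, and 77b94b593d68 is the working copy (the commit marked @). The names `render_order`, `children`, and `touched` are invented for this example and have nothing to do with jj's code.

```python
def render_order(children, touched, root, working_copy):
    """Return (depth, commit) pairs in root-first DFS order.

    Among siblings, the subtree touched most recently comes last, and the
    subtree containing the working copy always comes last of all.
    """
    key = {}

    def subtree_key(node):
        has_wc = node == working_copy
        latest = touched[node]
        for child in children.get(node, []):
            child_wc, child_latest = subtree_key(child)
            has_wc = has_wc or child_wc
            latest = max(latest, child_latest)
        # Tuples sort False before True, then by timestamp.
        key[node] = (has_wc, latest)
        return key[node]

    subtree_key(root)
    order = []

    def visit(node, depth):
        order.append((depth, node))
        for child in sorted(children.get(node, []), key=key.__getitem__):
            visit(child, depth + 1)

    visit(root, 0)
    return order


children = {
    "c0a26f76426e": ["77b94b593d68", "ca796b49f06f"],
    "ca796b49f06f": ["2e4dc019d943", "cfd9255b94d3"],
}
# All timestamps share the -07:00 offset, so plain string comparison works.
touched = {
    "c0a26f76426e": "2021-10-13T11:08:01",
    "77b94b593d68": "2021-10-13T12:29:47",
    "cfd9255b94d3": "2021-10-13T16:17:40",
    "ca796b49f06f": "2021-10-13T20:38:39",
    "2e4dc019d943": "2021-10-20T14:21:01",
}
for depth, commit in render_order(children, touched, "c0a26f76426e", "77b94b593d68"):
    print("    " * depth + commit)
```

This prints the commits in the same top-to-bottom order as the mock-up above. Like any plain DFS, it needs the subtree keys for the whole graph before it can print the first line.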

Jan 14 '23 12:01 avamsi

Just to make the above log more compact (like jj log now):

o c0a26f76426e 543967befc73 [email protected] 2021-10-13 11:08:01.000 -07:00
|...o ca796b49f06f b4323a1a35b6 [email protected] 2021-10-13 20:38:39.000 -07:00
|   | docs: describe how to set up commmand-line completion
|   |---o cfd9255b94d3 c48b95a17bdf [email protected] 2021-10-13 16:17:40.000 -07:00
|   |     index: start indexing file-level conflicts
|   |...o 2e4dc019d943 e8c23f0ee5e2 [email protected] 2021-10-20 14:21:01.000 -07:00
|         git: don't update public heads for now
>---@ 77b94b593d68 bb3b69138188 [email protected] 2021-10-13 12:29:47.000 -07:00
      git: move logic for adding remote to git module

Jan 14 '23 12:01 avamsi

The code is in revset_graph_iterator.rs. What makes it complicated is the laziness - we shouldn't walk the whole graph before we start displaying output. That would be fine to do in small repos, but it would make jj log unusably slow on very large repos. That's also the reason Git uses that algorithm they describe there.
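To illustrate the laziness constraint, here is a hypothetical sketch (it is not the code in revset_graph_iterator.rs). It assumes every commit has an integer position in some index and that parents always have smaller positions than their children. The names `lazy_log`, `position`, and `parents_of` are invented for the example.

```python
import heapq
from itertools import islice


def lazy_log(heads, position, parents_of):
    """Yield commits children-first, looking up parents only when needed."""
    heap = [(-position[h], h) for h in heads]
    heapq.heapify(heap)
    seen = set(heads)
    while heap:
        # The highest remaining position cannot have unvisited children.
        _, commit = heapq.heappop(heap)
        yield commit
        for parent in parents_of(commit):
            if parent not in seen:
                seen.add(parent)
                heapq.heappush(heap, (-position[parent], parent))


parents = {
    "2e4dc019d943": ["ca796b49f06f"],
    "77b94b593d68": ["c0a26f76426e"],
    "cfd9255b94d3": ["ca796b49f06f"],
    "ca796b49f06f": ["c0a26f76426e"],
    "c0a26f76426e": [],
}
# Made-up positions, chosen only so that parents sort below their children.
position = {
    "c0a26f76426e": 0,
    "ca796b49f06f": 1,
    "cfd9255b94d3": 2,
    "77b94b593d68": 3,
    "2e4dc019d943": 4,
}
heads = ["2e4dc019d943", "77b94b593d68", "cfd9255b94d3"]

# Only the first two entries are computed; the rest of the graph is untouched.
print(list(islice(lazy_log(heads, position, parents.__getitem__), 2)))
```

A generator like this can start printing immediately, but it only knows positions, so with these made-up numbers it produces the interleaved order from the top of this issue. Grouping branches without giving up that laziness is the part that needs more thought.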

Jan 14 '23 14:01 martinvonz

Segmented changelog (or similar) might help the scalability issue? https://raw.githubusercontent.com/facebook/sapling/main/eden/scm/slides/201904-segmented-changelog/segmented-changelog.pdf

Just for reference. I don't know if it applies here, but it contains some readable graph examples saying "use depth-first search" to assign numbers.

Jan 14 '23 15:01 yuja

> Segmented changelog (or similar) might help the scalability issue?

Hmm, I think that's a good point whether or not we use segmented changelog. Unlike Git [^1], we could quite efficiently find common ancestors using the commit index. Maybe that would perform well enough and be simpler, but I haven't thought much about it.

[^1]: Before they added their commit graph anyway, which came long after their algorithm for ordering commits nicely in the graph, I think.

Jan 15 '23 17:01 martinvonz

It sounds like this function might be useful. It's used by smartlog rendering (via the sort(..., topo) revset).

It's a bit more aggressive than the old sort(topo) implementation: it sorts branches forking off the same commit by length, and it sorts branches that merge too. It might also be a bit more flexible: if you want to sort branches by date, that is probably doable via priorities.

Jan 17 '23 09:01 quark-zju