jj icon indicating copy to clipboard operation
jj copied to clipboard

`jj file annotate` significantly slower than `git blame`

Open xdBronch opened this issue 1 year ago • 4 comments

Steps to Reproduce the Problem

take a reasonably large git repo and run git blame/jj file annotate, for example i used lib/src/revset.rs in the jj repo since its a fairly big file. jj is 5-10x slower

$ time git blame lib/src/revset.rs > /dev/null
real    0m 0.30s
user    0m 0.21s
sys     0m 0.09s
$ time jj file annotate lib/src/revset.rs > /dev/null
real    0m 1.60s
user    0m 1.38s
sys     0m 0.23s

Expected Behavior

jj should perform better here

Specifications

  • Platform: x86-64 alpine linux 6.11.5
  • Version: e4fdc91b690379716c73190647170f702684ee6e

xdBronch avatar Nov 15 '24 18:11 xdBronch

Mabe it's the same root cause as #4674

martinvonz avatar Nov 15 '24 18:11 martinvonz

I can confirm. The slowness in jj log FILE seems to come from the following call stack:

jj_lib::default_index::revset_engine::build_predicate_fn::{{closure}} [jj]
<jj_lib::default_index::rev_walk::FilterMapRevWalk<W,F> as jj_lib::default_index::rev_walk::RevWalk<I>>::next [jj]
jj_lib::default_index::revset_graph_iterator::RevsetGraphWalk::new_edges_from_internal_commit [jj]
<jj_lib::default_index::revset_graph_iterator::RevsetGraphWalk as jj_lib::default_index::rev_walk::RevWalk<jj_lib::default_index::composite::CompositeIndex>>::next [jj]
<core::iter::sources::from_fn::FromFn<F> as core::iter::traits::iterator::Iterator>::next [jj]
jj_lib::graph::TopoGroupedGraphIterator<N,I>::populate_one [jj]
<jj_lib::graph::TopoGroupedGraphIterator<N,I> as core::iter::traits::iterator::Iterator>::next [jj]
jj_cli::commands::log::cmd_log [jj]
jj_cli::commands::run_command [jj]

and the slowness in jj file annotate FILE seems to be similar:

jj_lib::default_index::revset_engine::build_predicate_fn::{{closure}} [jj]
<jj_lib::default_index::revset_engine::FilterRevset<S,P> as jj_lib::default_index::revset_engine::ToPredicateFn>::to_predicate_fn::{{closure}} [jj]
<jj_lib::default_index::revset_engine::UnionRevset<S1,S2> as jj_lib::default_index::revset_engine::ToPredicateFn>::to_predicate_fn::{{closure}} [jj]
<jj_lib::default_index::rev_walk::FilterMapRevWalk<W,F> as jj_lib::default_index::rev_walk::RevWalk<I>>::next [jj]
jj_lib::default_index::revset_graph_iterator::RevsetGraphWalk::new_edges_from_internal_commit [jj]
<jj_lib::default_index::revset_graph_iterator::RevsetGraphWalk as jj_lib::default_index::rev_walk::RevWalk<jj_lib::default_index::composite::CompositeIndex>>::next [jj]
<core::iter::sources::from_fn::FromFn<F> as core::iter::traits::iterator::Iterator>::next [jj]
jj_lib::annotate::compute_file_annotation [jj]
jj_lib::annotate::get_annotation_for_file [jj]
jj_cli::commands::run_command [jj]

Unlike jj log FILE, jj file annotate FILE doesn't stream its output to stdout. Perhaps it could print the first line of the file once it determines the source change for the first line of the file (and for the second line, and so on)?

mlcui-corp avatar Dec 11 '24 00:12 mlcui-corp

Unlike jj log FILE, jj file annotate FILE doesn't stream its output to stdout. Perhaps it could print the first line of the file once it determines the source change for the first line of the file (and for the second line, and so on)?

Technically it can, but I assume it wouldn't be worth the complexity. The first line is often old (e.g. shebang, copyright comment, etc.) Maybe we can instead add a flag to limit the search range or TUI that updates annotation incrementally.

yuja avatar Dec 11 '24 01:12 yuja

While having an index would certainly be helpful, I suspect there are other problems lurking. For example, this should really be effectively instant in any repository, no matter how large:

$ jj new && echo 'Line 1' > ANNOTATE-TEST && jj new && echo 'Line 2' >> ANNOTATE-TEST && jj st
$ time jj --ignore-working-copy file annotate -T 'commit.commit_id() ++ "\n"' ANNOTATE-TEST

Only the two most recent commits should be traversed, but every commit in ..@ gets visited, so it scales with repo size. I tried to investigate, but this appears to be far too complex for someone who's new to the codebase and not familiar with how the graph traversal works; the furthest I could go is that the final call to RevsetGraphWalk::try_next(), after deciding on the entry to be returned, calls remove_transitive_edges(), which eventually ends up calling consume_to() which tests every commit with the revset filter (a fairly costly operation for RevsetFilterPredicate::File).

This appears to affect any file that contains at least one line that traces back to its first version (not unusual for a closing bracket or import/use/etc. or a copyright notice). If the file is reasonably recent (i.e. shouldn't need a deep traversal), the time added can be significant.

mernen avatar Mar 25 '25 22:03 mernen