`jj file annotate` significantly slower than `git blame`
Steps to Reproduce the Problem
take a reasonably large git repo and run git blame/jj file annotate, for example i used lib/src/revset.rs in the jj repo since its a fairly big file. jj is 5-10x slower
$ time git blame lib/src/revset.rs > /dev/null
real 0m 0.30s
user 0m 0.21s
sys 0m 0.09s
$ time jj file annotate lib/src/revset.rs > /dev/null
real 0m 1.60s
user 0m 1.38s
sys 0m 0.23s
Expected Behavior
jj should perform better here
Specifications
- Platform: x86-64 alpine linux 6.11.5
- Version: e4fdc91b690379716c73190647170f702684ee6e
Mabe it's the same root cause as #4674
I can confirm. The slowness in jj log FILE seems to come from the following call stack:
jj_lib::default_index::revset_engine::build_predicate_fn::{{closure}} [jj]
<jj_lib::default_index::rev_walk::FilterMapRevWalk<W,F> as jj_lib::default_index::rev_walk::RevWalk<I>>::next [jj]
jj_lib::default_index::revset_graph_iterator::RevsetGraphWalk::new_edges_from_internal_commit [jj]
<jj_lib::default_index::revset_graph_iterator::RevsetGraphWalk as jj_lib::default_index::rev_walk::RevWalk<jj_lib::default_index::composite::CompositeIndex>>::next [jj]
<core::iter::sources::from_fn::FromFn<F> as core::iter::traits::iterator::Iterator>::next [jj]
jj_lib::graph::TopoGroupedGraphIterator<N,I>::populate_one [jj]
<jj_lib::graph::TopoGroupedGraphIterator<N,I> as core::iter::traits::iterator::Iterator>::next [jj]
jj_cli::commands::log::cmd_log [jj]
jj_cli::commands::run_command [jj]
and the slowness in jj file annotate FILE seems to be similar:
jj_lib::default_index::revset_engine::build_predicate_fn::{{closure}} [jj]
<jj_lib::default_index::revset_engine::FilterRevset<S,P> as jj_lib::default_index::revset_engine::ToPredicateFn>::to_predicate_fn::{{closure}} [jj]
<jj_lib::default_index::revset_engine::UnionRevset<S1,S2> as jj_lib::default_index::revset_engine::ToPredicateFn>::to_predicate_fn::{{closure}} [jj]
<jj_lib::default_index::rev_walk::FilterMapRevWalk<W,F> as jj_lib::default_index::rev_walk::RevWalk<I>>::next [jj]
jj_lib::default_index::revset_graph_iterator::RevsetGraphWalk::new_edges_from_internal_commit [jj]
<jj_lib::default_index::revset_graph_iterator::RevsetGraphWalk as jj_lib::default_index::rev_walk::RevWalk<jj_lib::default_index::composite::CompositeIndex>>::next [jj]
<core::iter::sources::from_fn::FromFn<F> as core::iter::traits::iterator::Iterator>::next [jj]
jj_lib::annotate::compute_file_annotation [jj]
jj_lib::annotate::get_annotation_for_file [jj]
jj_cli::commands::run_command [jj]
Unlike jj log FILE, jj file annotate FILE doesn't stream its output to stdout. Perhaps it could print the first line of the file once it determines the source change for the first line of the file (and for the second line, and so on)?
Unlike
jj log FILE,jj file annotate FILEdoesn't stream its output to stdout. Perhaps it could print the first line of the file once it determines the source change for the first line of the file (and for the second line, and so on)?
Technically it can, but I assume it wouldn't be worth the complexity. The first line is often old (e.g. shebang, copyright comment, etc.) Maybe we can instead add a flag to limit the search range or TUI that updates annotation incrementally.
While having an index would certainly be helpful, I suspect there are other problems lurking. For example, this should really be effectively instant in any repository, no matter how large:
$ jj new && echo 'Line 1' > ANNOTATE-TEST && jj new && echo 'Line 2' >> ANNOTATE-TEST && jj st
$ time jj --ignore-working-copy file annotate -T 'commit.commit_id() ++ "\n"' ANNOTATE-TEST
Only the two most recent commits should be traversed, but every commit in ..@ gets visited, so it scales with repo size. I tried to investigate, but this appears to be far too complex for someone who's new to the codebase and not familiar with how the graph traversal works; the furthest I could go is that the final call to RevsetGraphWalk::try_next(), after deciding on the entry to be returned, calls remove_transitive_edges(), which eventually ends up calling consume_to() which tests every commit with the revset filter (a fairly costly operation for RevsetFilterPredicate::File).
This appears to affect any file that contains at least one line that traces back to its first version (not unusual for a closing bracket or import/use/etc. or a copyright notice). If the file is reasonably recent (i.e. shouldn't need a deep traversal), the time added can be significant.