unison
unison copied to clipboard
delete.term is slow for large codebases
I haven't provided a sample codebase or stats to back up this claim, so feel free to push back on that. But here's my experience:
In my codebase when I call delete.term
it takes 10+ seconds. It then presents output that looks something like this:
Name changes:
Original Changes
1. myproject.trunk.Foo.Config.node ┐ 2. pr.myproject.conffile3.Foo.Config.node (removed)
3. pr.myproject.conffile.Foo.Config.node │
4. pr.myproject.conffile2.Foo.Config.node │
5. pr.myproject.conffile3.Foo.Config.node │
6. pr.myproject.debugthing.Foo.Config.node │
7. pr.myproject.fixscratch.Foo.Config.node │
8. pr.myproject.integrationtests.Foo.Config.node │
9. pr.myproject.updatehttp5.Foo.Config.node │
10. tmp.debugthing.Foo.Config.node │
11. tmp.testing.Foo.Config.node │
12. tmp.testing2.Foo.Config.node ┘
Tip: You can use `undo` or `reflog` to undo this change.
I can see that these name changes aren't scoped to my current namespace. It seems like with all of the recent changes to keep things scoped to the current namespace, maybe they should be? Which maybe means that delete.term
should only allow you to delete terms from the current namespace? You could potentially add a delete.term.global
, but I'm not sure how commonly that would be used.
I'm also curious what makes this so slow. Is it mostly checking for dependencies? Is it checking for dependencies in the global scope instead of just the current namespace? Is most of the time spent using a global pretty-printer to print the name changes?
I'd imagine that the same applies to delete.type
and maybe delete.namespace
.
It's really the diff.namespace
computation that's slow, I believe. You can confirm this by doing diff.namespace
on the two reflog entries.
It's really weird that we show all the matching global names even if none of them changed; e.g. I did a delete.term reverse
from within .public.ansi.trunk
and got this kinda useless output:
Name changes:
Original Changes
1. public.ansi.trunk.reverse ┐ 2. public.ansi.trunk.reverse (removed)
3. public.demo.trunk.lib.nimbus.lib.ansi.reverse │
4. public.nimbus.pr.tsocket.lib.ansi.reverse │
5. public.nimbus.trunk.lib.ansi.reverse │
6. public.projects.ansi.latest.reverse │
7. public.projects.ansi.main.reverse │
8. public.projects.ansi.releases.v1.reverse │
9. public.share.stew.public.projects.ansi.latest.reverse │
10. public.share.stew.public.projects.ansi.main.reverse │
11. public.share.stew.public.projects.ansi.releases.v1.reverse │
12. public.stew.codebase.ansi.trunk.reverse │
13. public.stew.public.projects.ansi.latest.reverse │
14. public.stew.public.projects.ansi.main.reverse │
15. public.stew.public.projects.ansi.releases.v1.reverse │
16. public.stew.suckycodebase.ansi.trunk.reverse │
17. public.stew.suckycodebase.json.pr.ansi.trunk.reverse │
18. public.tmp.ansi.reverse ┘
Tip: You can use `undo` or `reflog` to undo this change.
Might be as simple as doing the diff with respect to the current namespace rather than the root.
As a first step, maybe try to confirm the source of slowness and see if there's anything obviously and easily more efficient that can be done? Having a more efficient diff will help in other places like https://github.com/unisonweb/unison/pull/3524
Something noteworthy: doing something like move.term foo thevoid.foo
followed by delete.namespace thevoid
is much faster. I'm not sure whether ucm is doing all of the checks for unnamed references that it should when using move/delete.namespace, but if so it's worth checking why this approach is so much faster.
We suspect most of the slowness is with the diff. @ceedubs will investigate
It does not appear to be the namespace diff that is causing the slowness. At least diff.namespace
is fast in comparison.
I created a new empty namespace and here are the timings:
- add 2 terms: 37.0 s (cpu), 37.4 s (system)
- delete 1 term: 36.3 s (cpu), 104 s (system)
- diff.namespace: 65.6 ms (cpu), 110 ms (system)
Click here for ucm session
``` > cd .tmp12497☝️ The namespace .tmp12497 is empty. InputPattern: 11.6 ms (cpu), 22.0 ms (system) .tmp12497> typecheck: 3.81 ms (cpu), 6.32 ms (system)
I found and typechecked these definitions in ~/code/unison/advent-of-code-2022.u. If you do an
add
or update
, here's how your codebase would change:
⍟ These new definitions are ok to `add`:
thing1 : Text
thing2 : Text
UnisonFileChanged: 170 ms (cpu), 242 ms (system) .tmp12497> add
⍟ I've added these definitions:
thing1 : Text
thing2 : Text
add-default-metadata: 88.0 µs (cpu), 504 µs (system) updateRoot: 6.36 ms (cpu), 26.2 ms (system) InputPattern: 37.0 s (cpu), 37.4 s (system) .tmp12497> delete.term thing1 updateRoot: 9.22 ms (cpu), 25.2 ms (system) diffHelper: 36.2 s (cpu), 103 s (system)
Removed definitions:
1. tmp12497.thing1 : Text
Tip: You can use undo
or reflog
to undo this change.
InputPattern: 36.3 s (cpu), 104 s (system)
.tmp12497> history
Note: The most recent namespace hash is immediately below this message.
⊙ 1. #j026b2obdg
- Deletes:
thing1
□ 2. #6ajddl1kso (start of history) InputPattern: 2.19 ms (cpu), 4.46 ms (system) .tmp12497> diff.namespace 2 resolveShortCausalHash: 10.3 ms (cpu), 29.6 ms (system) diffHelper: 539 µs (cpu), 686 µs (system)
Removed definitions:
1. thing1 : Text
InputPattern: 65.6 ms (cpu), 110 ms (system)
</details>
@ceedubs diff might have a short circuit if one of the arguments is an empty namespace. Can you humor me and add another point in history and then diff those?
I don't believe it was empty. I added two terms, deleted one, and then compared the benefits with two terms to the subsequent one with one term.
When I tried with an empty namespace it failed, saying the namespace was empty (even though it had history)
@ceedubs the expensive part of a diff is forcing the deep* fields in the cached root branch. In the transcript you posted the delete forces those thunks when it produces a diff. So, the subsequent diff has significantly less work to do.
Similarly, if you add a term then do a namespace diff twice, you will see that the second diff is much faster.