unison icon indicating copy to clipboard operation
unison copied to clipboard

delete.term is slow for large codebases

Open ceedubs opened this issue 2 years ago • 1 comments

I haven't provided a sample codebase or stats to back up this claim, so feel free to push back on that. But here's my experience:

In my codebase when I call delete.term it takes 10+ seconds. It then presents output that looks something like this:

  Name changes:
  
    Original                                                  Changes
    1.  myproject.trunk.Foo.Config.node               ┐  2.  pr.myproject.conffile3.Foo.Config.node (removed)
    3.  pr.myproject.conffile.Foo.Config.node         │  
    4.  pr.myproject.conffile2.Foo.Config.node        │  
    5.  pr.myproject.conffile3.Foo.Config.node        │  
    6.  pr.myproject.debugthing.Foo.Config.node       │  
    7.  pr.myproject.fixscratch.Foo.Config.node       │  
    8.  pr.myproject.integrationtests.Foo.Config.node │  
    9.  pr.myproject.updatehttp5.Foo.Config.node      │  
    10. tmp.debugthing.Foo.Config.node             │  
    11. tmp.testing.Foo.Config.node                │  
    12. tmp.testing2.Foo.Config.node               ┘  
  
  Tip: You can use `undo` or `reflog` to undo this change.

I can see that these name changes aren't scoped to my current namespace. It seems like with all of the recent changes to keep things scoped to the current namespace, maybe they should be? Which maybe means that delete.term should only allow you to delete terms from the current namespace? You could potentially add a delete.term.global, but I'm not sure how commonly that would be used.

I'm also curious what makes this so slow. Is it mostly checking for dependencies? Is it checking for dependencies in the global scope instead of just the current namespace? Is most of the time spent using a global pretty-printer to print the name changes?

I'd imagine that the same applies to delete.type and maybe delete.namespace.

ceedubs avatar Sep 06 '22 16:09 ceedubs

It's really the diff.namespace computation that's slow, I believe. You can confirm this by doing diff.namespace on the two reflog entries.

pchiusano avatar Sep 06 '22 17:09 pchiusano

It's really weird that we show all the matching global names even if none of them changed; e.g. I did a delete.term reverse from within .public.ansi.trunk and got this kinda useless output:

  Name changes:

    Original                                                          Changes
    1.  public.ansi.trunk.reverse                                  ┐  2.  public.ansi.trunk.reverse (removed)
    3.  public.demo.trunk.lib.nimbus.lib.ansi.reverse              │
    4.  public.nimbus.pr.tsocket.lib.ansi.reverse                  │
    5.  public.nimbus.trunk.lib.ansi.reverse                       │
    6.  public.projects.ansi.latest.reverse                        │
    7.  public.projects.ansi.main.reverse                          │
    8.  public.projects.ansi.releases.v1.reverse                   │
    9.  public.share.stew.public.projects.ansi.latest.reverse      │
    10. public.share.stew.public.projects.ansi.main.reverse        │
    11. public.share.stew.public.projects.ansi.releases.v1.reverse │
    12. public.stew.codebase.ansi.trunk.reverse                    │
    13. public.stew.public.projects.ansi.latest.reverse            │
    14. public.stew.public.projects.ansi.main.reverse              │
    15. public.stew.public.projects.ansi.releases.v1.reverse       │
    16. public.stew.suckycodebase.ansi.trunk.reverse               │
    17. public.stew.suckycodebase.json.pr.ansi.trunk.reverse       │
    18. public.tmp.ansi.reverse                                    ┘

  Tip: You can use `undo` or `reflog` to undo this change.

ChrisPenner avatar Oct 13 '22 15:10 ChrisPenner

Might be as simple as doing the diff with respect to the current namespace rather than the root.

pchiusano avatar Oct 24 '22 15:10 pchiusano

As a first step, maybe try to confirm the source of slowness and see if there's anything obviously and easily more efficient that can be done? Having a more efficient diff will help in other places like https://github.com/unisonweb/unison/pull/3524

aryairani avatar Oct 24 '22 18:10 aryairani

Something noteworthy: doing something like move.term foo thevoid.foo followed by delete.namespace thevoid is much faster. I'm not sure whether ucm is doing all of the checks for unnamed references that it should when using move/delete.namespace, but if so it's worth checking why this approach is so much faster.

ceedubs avatar Oct 25 '22 11:10 ceedubs

We suspect most of the slowness is with the diff. @ceedubs will investigate

hojberg avatar Dec 01 '22 19:12 hojberg

It does not appear to be the namespace diff that is causing the slowness. At least diff.namespace is fast in comparison.

I created a new empty namespace and here are the timings:

  • add 2 terms: 37.0 s (cpu), 37.4 s (system)
  • delete 1 term: 36.3 s (cpu), 104 s (system)
  • diff.namespace: 65.6 ms (cpu), 110 ms (system)
Click here for ucm session ``` > cd .tmp12497

☝️ The namespace .tmp12497 is empty. InputPattern: 11.6 ms (cpu), 22.0 ms (system) .tmp12497> typecheck: 3.81 ms (cpu), 6.32 ms (system)

I found and typechecked these definitions in ~/code/unison/advent-of-code-2022.u. If you do an add or update, here's how your codebase would change:

⍟ These new definitions are ok to `add`:
 
  thing1 : Text
  thing2 : Text

UnisonFileChanged: 170 ms (cpu), 242 ms (system) .tmp12497> add

⍟ I've added these definitions:

thing1 : Text
thing2 : Text

add-default-metadata: 88.0 µs (cpu), 504 µs (system) updateRoot: 6.36 ms (cpu), 26.2 ms (system) InputPattern: 37.0 s (cpu), 37.4 s (system) .tmp12497> delete.term thing1 updateRoot: 9.22 ms (cpu), 25.2 ms (system) diffHelper: 36.2 s (cpu), 103 s (system)

Removed definitions:

1. tmp12497.thing1 : Text

Tip: You can use undo or reflog to undo this change. InputPattern: 36.3 s (cpu), 104 s (system) .tmp12497> history

Note: The most recent namespace hash is immediately below this message.

⊙ 1. #j026b2obdg

- Deletes:
 
  thing1

□ 2. #6ajddl1kso (start of history) InputPattern: 2.19 ms (cpu), 4.46 ms (system) .tmp12497> diff.namespace 2 resolveShortCausalHash: 10.3 ms (cpu), 29.6 ms (system) diffHelper: 539 µs (cpu), 686 µs (system)

Removed definitions:

1. thing1 : Text

InputPattern: 65.6 ms (cpu), 110 ms (system)

</details>

ceedubs avatar Dec 01 '22 20:12 ceedubs

@ceedubs diff might have a short circuit if one of the arguments is an empty namespace. Can you humor me and add another point in history and then diff those?

pchiusano avatar Dec 02 '22 03:12 pchiusano

I don't believe it was empty. I added two terms, deleted one, and then compared the benefits with two terms to the subsequent one with one term.

ceedubs avatar Dec 02 '22 10:12 ceedubs

When I tried with an empty namespace it failed, saying the namespace was empty (even though it had history)

ceedubs avatar Dec 02 '22 10:12 ceedubs

@ceedubs the expensive part of a diff is forcing the deep* fields in the cached root branch. In the transcript you posted the delete forces those thunks when it produces a diff. So, the subsequent diff has significantly less work to do.

Similarly, if you add a term then do a namespace diff twice, you will see that the second diff is much faster.

tstat avatar Dec 06 '22 17:12 tstat