
Help with Graph Generation Time on MacBook

Open mathomp4 opened this issue 2 years ago • 2 comments

Thanks to @ZedThree, I can now build the docs for the library I work on. However, in my testing I was always setting graph: false because the graphs took a while. But, now that v6.1.12 is out, I can look to the fancier bits of FORD.

So, my first test was hardware-based. I mainly develop on my MacBook Pro (8-core, Coffee Lake, 32 GB RAM), but I also can build on a node of the cluster I work on (48-core, Cascade Lake, 192 GB RAM). I also decided to try out generating graphs on GitHub Actions (2-core, 7 GB RAM per this page). (ETA: I added an M1 MacBook I have access to. Still quite slow, but 2x faster than my Intel MacBook. So odd.)

Between these four (for the "Graph" column, it was set to maxdepth: 4, maxnodes: 32):

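| Machine       | No Graph (hh:mm:ss) | Graph (hh:mm:ss) |
|---------------|---------------------|------------------|
| M1 MacBook    | 00:01:06            | 00:43:20         |
| Intel MacBook | 00:01:44            | 01:55:58         |
| Cluster       | 00:03:48            | 00:16:42         |
| GitHub        | 00:01:43            | 00:06:34         |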

Yes. Almost 2 hours on my laptop with graphs on. That is... weird. I mean, yes, my laptop isn't as good as a compute node of a cluster, but that much worse? And if it were down to the number of cores or memory, you'd think GitHub Actions would lose. (I'm also a bit surprised by the slow no-graph build on the cluster, but it is a shared disk, so file generation can sometimes be slower.)
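To put a number on "that much worse", here is a small Python sketch that converts the timings in the table above to seconds and computes how many times longer each build takes with graphs on. The helper names (`to_seconds`, `slowdown`) are made up for illustration; only the timings come from the table.

```python
def to_seconds(hms: str) -> int:
    """Convert an hh:mm:ss string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def slowdown(no_graph: str, graph: str) -> float:
    """How many times longer the build takes with graphs enabled."""
    return to_seconds(graph) / to_seconds(no_graph)


# (no graph, graph) timings copied from the table above
timings = {
    "M1 MacBook": ("00:01:06", "00:43:20"),
    "Intel MacBook": ("00:01:44", "01:55:58"),
    "Cluster": ("00:03:48", "00:16:42"),
    "GitHub": ("00:01:43", "00:06:34"),
}

ratios = {name: slowdown(*pair) for name, pair in timings.items()}

# Print machines from largest to smallest graph overhead
for name, ratio in sorted(ratios.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:14s} graphs cost {ratio:5.1f}x the no-graph build")
```

On these numbers the two Macs come out at roughly 67x (Intel) and 39x (M1), while the cluster and GitHub Actions sit at about 4x, which is why this looks more like an OS or toolchain effect than a question of core count or memory.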

Second, looking at the docs, it seems like graph_maxnodes and graph_maxdepth might be tuning knobs to try and make graph generation cheaper. So, since the cluster is doing "okay", I did some tests and:

| maxdepth | maxnodes | time (mm:ss) |
|----------|----------|--------------|
| 2        | 16       | 15:57        |
| 4        | 32       | 16:20        |
| 8        | 64       | 16:23        |
| 16       | 128      | 17:15        |
| 1024     | 8192     | 28:18        |

So, for now at least, if I don't go crazy, these two knobs don't do too much.
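As a rough check on how little the two knobs matter, this Python sketch expresses each cluster run relative to the smallest setting. The timings are the ones in the table above; the helper name `mmss_to_seconds` is just for illustration.

```python
def mmss_to_seconds(mmss: str) -> int:
    """Convert an mm:ss string to a number of seconds."""
    minutes, seconds = (int(part) for part in mmss.split(":"))
    return minutes * 60 + seconds


# (maxdepth, maxnodes, time) rows copied from the cluster table above
runs = [
    (2, 16, "15:57"),
    (4, 32, "16:20"),
    (8, 64, "16:23"),
    (16, 128, "17:15"),
    (1024, 8192, "28:18"),
]

baseline = mmss_to_seconds(runs[0][2])

# Build time of each run relative to the smallest (2, 16) setting
relative_times = [mmss_to_seconds(t) / baseline for _, _, t in runs]

for (depth, nodes, t), rel in zip(runs, relative_times):
    print(f"maxdepth={depth:5d} maxnodes={nodes:5d} time={t} ({rel:.2f}x baseline)")
```

Going from 16 to 128 nodes adds about 8% to the build time, and even the extreme 1024/8192 setting (512 times more nodes than the baseline) is only about 1.77x slower.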

So yeah... any idea why my MacBook is so much slower than the cluster or a GitHub Actions VM when turning on graphs?

mathomp4 avatar Jun 27 '22 20:06 mathomp4

My first guess was going to be something to do with it being an M1 CPU, but this is actually on an x86, so probably not that...

The other thing we've had issues with is the parallelisation. Could you try setting parallel: 1 and see if that helps things?

ZedThree avatar Jun 30 '22 08:06 ZedThree

> My first guess was going to be something to do with it being an M1 CPU, but this is actually on an x86, so probably not that...

I updated my table with an M1 number and it's much faster than the Intel Mac. 43 minutes vs 2 hours!

> The other thing we've had issues with is the parallelisation. Could you try setting parallel: 1 and see if that helps things?

Sadly, this did nothing. Still about 2 hours. I wonder if this is some weird macOS internal setting? Since even the M1 is much slower, it's like it's OS-dependent!

mathomp4 avatar Jul 01 '22 12:07 mathomp4

@mathomp4 Could you try

time (echo 'digraph { a -> b }' | dot -Tsvg > output.svg)

If that takes substantially longer than 0.1s then it's something to do with graphviz on Macs. I found a couple of issues to do with slow performance on M1, so that's something you might want to investigate.
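If you would rather take the same measurement from Python, a minimal sketch could look like the following. It assumes `dot` is on your PATH, and `time_command` is a hypothetical helper, not part of FORD or graphviz. Unlike the shell one-liner it skips the `echo` pipe and throws the SVG away instead of writing output.svg, so the numbers will not match `time` exactly.

```python
import shutil
import statistics
import subprocess
import time


def time_command(argv, stdin_text="", runs=5):
    """Return the mean wall-clock seconds over `runs` invocations of argv."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        # Feed stdin_text to the process and discard whatever it prints
        subprocess.run(
            argv,
            input=stdin_text,
            text=True,
            stdout=subprocess.DEVNULL,
            check=True,
        )
        samples.append(time.perf_counter() - start)
    return statistics.mean(samples)


dot = shutil.which("dot")
if dot is None:
    print("dot not found on PATH, nothing to time")
else:
    mean = time_command([dot, "-Tsvg"], "digraph { a -> b }")
    print(f"dot -Tsvg: {mean * 1000:.1f} ms per call")
```

Each sample includes process start-up, so this is really measuring how expensive it is to launch `dot` at all, which is the part that matters if it gets launched many times.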

ZedThree avatar Nov 09 '23 17:11 ZedThree

@ZedThree Interesting. Running under hyperfine on my M1 Mac shows:

❯ hyperfine --warmup 3 "(echo 'digraph { a -> b }' | dot -Tsvg > output.svg)"
Benchmark 1: (echo 'digraph { a -> b }' | dot -Tsvg > output.svg)
  Time (mean ± σ):     161.2 ms ±   4.4 ms    [User: 77.6 ms, System: 17.0 ms]
  Range (min … max):   152.8 ms … 168.0 ms    17 runs

and on my M2 Mac:

❯ hyperfine --warmup 3 "(echo 'digraph { a -> b }' | dot -Tsvg > output.svg)"
Benchmark 1: (echo 'digraph { a -> b }' | dot -Tsvg > output.svg)
  Time (mean ± σ):     150.9 ms ±   7.9 ms    [User: 69.1 ms, System: 16.9 ms]
  Range (min … max):   130.2 ms … 163.4 ms    18 runs

Now if I run on the Linux cluster:

$ hyperfine --warmup 3 "(echo 'digraph { a -> b }' | dot -Tsvg > output.svg)"
Benchmark 1: (echo 'digraph { a -> b }' | dot -Tsvg > output.svg)
  Time (mean ± σ):      24.1 ms ±   1.4 ms    [User: 17.5 ms, System: 5.8 ms]
  Range (min … max):    22.2 ms …  28.5 ms    108 runs

150 ms vs 24 ms. So odd, since you'd think Arm Macs would scream at this sort of thing!

mathomp4 avatar Nov 09 '23 17:11 mathomp4

Might be an unoptimised binary/library. You might see if there's another way to install it.
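Before trying a different install, it may help to see which `dot` is actually being picked up. This is only a sketch: `describe_executable` is a made-up helper, and it just reports the path, which hints at where the binary came from (a package manager prefix, a conda environment, and so on). It does not inspect the binary itself.

```python
import os
import platform
import shutil


def describe_executable(name):
    """Report where `name` resolves on PATH, following symlinks."""
    path = shutil.which(name)
    return {
        "name": name,
        "path": path,
        # Symlinks often point into the package manager's own directory
        "resolved": os.path.realpath(path) if path else None,
        # Architecture as seen by this Python interpreter, not by dot
        "python_machine": platform.machine(),
    }


info = describe_executable("dot")
for key, value in info.items():
    print(f"{key}: {value}")
```

If `path` is None, `dot` is not on the PATH of the environment you ran this from, which may differ from the one FORD runs in.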

I'll close this issue if that's ok, seeing as it's not Ford! :smile:

(Also thanks for letting me know about hyperfine, looks very useful! I already use fd and bat from the same person!)

ZedThree avatar Nov 09 '23 17:11 ZedThree