Performance of calling socket worse than lens
When doing similar operations (such as getting the ship's code or the output of vats), either by talking to lens over HTTP or by running a thread via the socket, the performance of the socket is consistently worse, around 2x slower than lens. I checked, and when using the socket method, most of the time is spent waiting for the output of recv, so this seems to be something internal to Urbit.
Is this performance difference expected?
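For context, a minimal sketch of how one might attribute time to the blocking recv (this is illustrative, not the actual instrumentation used above; the socket path is a hypothetical placeholder):

```python
import time

def timed(fn, *args, **kwargs):
    """Run a callable and return (result, elapsed_seconds).

    Wrapping a blocking call like sock.recv in this helper separates
    the time spent waiting on the ship from the rest of the round trip.
    """
    start = time.monotonic()
    result = fn(*args, **kwargs)
    return result, time.monotonic() - start

if __name__ == "__main__":
    import socket
    # Hypothetical usage: point this at the conn socket inside the pier.
    # sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    # sock.connect("path/to/pier/.urb/conn.sock")
    # reply, waited = timed(sock.recv, 4096)
    # print(f"blocked in recv for {waited:.3f}s")
    pass
```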
How did you go about benchmarking this? It would be great to reproduce this on my own machine as well.
Click boots up an instance of vere to do the jamming and cueing, so you're booting 2 ivory pills each time. Not sure if this is the cause.
I did a benchmark in this repo (https://github.com/guaraqe/urbit-benchmark), these are the results:
[nix-shell:~/code/urbit/test-urbit/benchmark]$ hyperfine ./code-lens './code-click ../salsyp-samzod'
Benchmark 1: ./code-lens
Time (mean ± σ): 32.2 ms ± 1.9 ms [User: 2.2 ms, System: 2.6 ms]
Range (min … max): 30.2 ms … 43.4 ms 66 runs
Benchmark 2: ./code-click ../salsyp-samzod
Time (mean ± σ): 320.3 ms ± 5.6 ms [User: 147.6 ms, System: 48.6 ms]
Range (min … max): 315.5 ms … 333.4 ms 10 runs
Summary
'./code-lens' ran
9.95 ± 0.62 times faster than './code-click ../salsyp-samzod'
The repo contains click and two scripts: one running with click, the other with lens. The first argument to code-click is the pier of the ship. The lens port is hardcoded in the corresponding script.
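For reference, the lens side of such a script is just an HTTP POST. A minimal Python sketch follows; the source/sink JSON shape follows the usual lens convention, and both it and the port are assumptions to check against your ship:

```python
import json
import urllib.request

def lens_payload(command="+code"):
    """Build the JSON body in the lens source/sink convention
    (assumed schema): a dojo command as the source, stdout as the sink."""
    return {"source": {"dojo": command}, "sink": {"stdout": None}}

def call_lens(port, command="+code"):
    """POST the payload to the ship's loopback HTTP port.

    The port number is whatever the real script hardcodes; there is
    no default assumed here.
    """
    req = urllib.request.Request(
        f"http://localhost:{port}",
        data=json.dumps(lens_payload(command)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode()
```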
@mopfel-winrux I measured locally, and the calls to urbit
take almost no time compared to the time interacting with the socket.
@guaraqe Do you have the numbers on hand for the time just to boot up the transient instances of Vere to jam/cue the noun?
If not, it should be easily testable by writing a script to jam an atom using Vere, and directly pipe the result to another transient instance of Vere to cue it. We might also want to test with a non-trivial noun (e.g. an entire inline thread).
The above isn't directed at anyone in particular; just wanted to note down the idea for benchmarking time with transient Vere instances independent of the socket.
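One way to set that experiment up (a sketch only; the jam/cue flags on urbit eval are hypothetical placeholders to verify with urbit eval --help on your build):

```python
import shlex

def transient_vere_benchmark(hoon="(add 2 2)", jam_flag="-j", cue_flag="-c"):
    """Build a hyperfine invocation for a pipeline that jams a noun in
    one transient vere instance and pipes it into a second one to cue it.

    jam_flag and cue_flag are hypothetical placeholders for whatever
    urbit eval actually accepts; the point is that hyperfine times the
    whole pipeline, i.e. both ivory-pill boots, independent of the socket.
    """
    pipeline = (
        f"echo {shlex.quote(hoon)} | urbit eval {jam_flag} "
        f"| urbit eval {cue_flag}"
    )
    return ["hyperfine", pipeline]
```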
I added a case that just calls jam, called code-nothing:
[nix-shell:~/code/urbit/test-urbit/benchmark]$ hyperfine ./code-lens './code-click ../salsyp-samzod' './code-nothing ../salsyp-samzod'
Benchmark 1: ./code-lens
Time (mean ± σ): 33.1 ms ± 1.6 ms [User: 2.5 ms, System: 2.5 ms]
Range (min … max): 31.3 ms … 40.5 ms 71 runs
Benchmark 2: ./code-click ../salsyp-samzod
Time (mean ± σ): 330.6 ms ± 4.9 ms [User: 156.3 ms, System: 49.8 ms]
Range (min … max): 325.8 ms … 341.5 ms 10 runs
Benchmark 3: ./code-nothing ../salsyp-samzod
Time (mean ± σ): 101.8 ms ± 2.2 ms [User: 78.4 ms, System: 25.8 ms]
Range (min … max): 99.3 ms … 109.8 ms 28 runs
Summary
'./code-lens' ran
3.07 ± 0.16 times faster than './code-nothing ../salsyp-samzod'
9.97 ± 0.50 times faster than './code-click ../salsyp-samzod'
If we subtract, that leaves around 130ms for the socket, which is still 4x more than lens. The difference is even more visible for +vats, where the cost of calling the executable is proportionally smaller.
@ashelkovnykov I marked you in a private issue with more details.
There's also a cue that gets called, which would take just as long as jam.
Yeah, I removed it twice from the total to get to the 130ms.
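Spelling out that subtraction with the means from the second benchmark (a back-of-the-envelope decomposition, treating code-nothing's mean as the cost of one transient vere boot):

```python
# Mean times from the hyperfine run, in milliseconds.
click_total = 330.6  # code-click: socket round trip + jam + cue
transient   = 101.8  # code-nothing: one transient vere boot (jam only)
lens_total  = 33.1   # code-lens: HTTP round trip

# click boots two transient instances (one to jam, one to cue),
# so the transient cost is subtracted twice.
socket_cost = click_total - 2 * transient
print(round(socket_cost))                  # ~127 ms, i.e. "around 130ms"
print(round(socket_cost / lens_total, 1))  # ~3.8x, i.e. "still 4x more"
```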
Isn't this just the extra time required to compile the threads?
That sounds right. There's already a -code thread in %base; can you just invoke that instead of passing/eval'ing the source?