Discussion: how should we test the performance?
Here on GitHub we could set up a simple and free workflow: whenever someone pushes changes to the master branch,
- A GitHub Actions workflow would start specifically for the languages whose files changed in that push
- It would run a series of performance tests, say, for the Rust lexer (if Rust was updated)
- Upon completion, the workflow would upload the results as artifacts
Those results could then be downloaded at any point for up to 90 days after the run, which means ThePrimeagen can pick an arbitrary point in time, download all the test results at once, and parse them.
Now, the workflow can be different, but the question is: how do we test the speed of interpreters doing lexing, for example?
- By a simple `time` command?
- Using `perf stat`?
- Using Apache Bench plus some proxy which would feed tests into them? (to get some GC going on)
- ???
@ThePrimeagen
You might enjoy this video on Dave's Garage https://youtu.be/pSvSXBorw4A, where something similar was done. The issues I can see are the following:
- This promotes "bad practice", in seeking optimization instead of readability. I'm not saying that the two are mutually exclusive, but I think at the core we're mostly trying to create this Interpreter in line with the book.
- As you mentioned, measuring this is hard: do you rank in terms of Memory? In terms of Raw Speed? JIT languages are going to be cold, so do you "warm up the JIT" for the Measurements?
However, this is from the viewpoint of "sticking to the book" and making this more intro-friendly.
I've personally been thinking it would be cool to see different approaches/features after we clear the initial book. Maybe that involves hyper optimizing for performance, maybe it involves outputting the AST as a PNG, or making a REPL, etc etc. Then going back and seeing what has been done. Would make for good content on Stream, and let some people flex their creative muscles.
Then we can rank them based on how creative they are. That's language agnostic, which performance isn't.
so i have some requirements for this to become a performance test, which if you watch any of the videos i have done thus far, that is going to be something i will do.
so for this to become a performance test i want to create each of the steps into a server and serve out the tokens, what i may be able to do with the parser, and then finally run the program itself and return out the output of the program. To me this is a proper performance test. it tests the language in a more real way. sys calls, memory usage, and all of that fun instead of some toy example
second as far as code goes. i want to have everything written in such a way that it is very simple. then it would be fun to talk about performance of each to see how to improve it.
> which if you watch any of the videos i have done thus far
Hitting me with the Tuesday Callout D:
Makes sense, I'm assuming you'll dig more into that on Stream, so that we can make sure PRs can properly maintain that?
For the server approach, do you mean that each language will run whichever server framework the author chooses, or would we keep each language as a CLI app and just wrap them all with the same server?
If we aren't trying to test what the best server framework is, I would suggest that we make it into a long-running CLI app, all called by the same server. So the server would launch the app and then basically send requests to it (over stdin/stdout) like you would a repl. That would still lead to GC and whatnot. It just wouldn't require a dependency on server frameworks.
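To make that concrete, here's a rough sketch of such a wrapper in Node/TypeScript (purely illustrative: the `./bin/monkey-repl` path, the port, and the one-line-in/one-line-out protocol are all assumptions, not anything agreed on here):

```ts
import { createServer } from "node:http";
import { spawn } from "node:child_process";
import { createInterface } from "node:readline";

// Launch the language's long-running CLI once (binary path is hypothetical).
// Assumes a line-based protocol: one Monkey snippet in on stdin,
// one line of output back on stdout, like a repl.
const repl = spawn("./bin/monkey-repl", { stdio: ["pipe", "pipe", "inherit"] });
const replies = createInterface({ input: repl.stdout! });

// Requests are answered in the order they were sent.
const waiting: Array<(line: string) => void> = [];
replies.on("line", (line) => waiting.shift()?.(line));

createServer((req, res) => {
  let body = "";
  req.on("data", (chunk) => (body += chunk));
  req.on("end", () => {
    waiting.push((line) => res.end(line + "\n"));
    // Collapse the request to a single line and feed it to the interpreter.
    repl.stdin!.write(body.replace(/\r?\n/g, " ") + "\n");
  });
}).listen(8080);
```

Since the interpreter process stays alive across requests, the GC and memory behavior of each implementation still shows up, without pulling a server framework into every language.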
I think he means something similar to the Go vs. Rust vs. TypeScript repo and video. But yeah, the details should really be discussed later, when parsers have been implemented for all the languages in the repo.
Using the `rdtsc` x86 instruction to count CPU cycles at the start of the program's execution (not including compile time, where applicable) and again at the end. `end - start` gives you CPU cycles you can compare; repeat the previous steps a few more times per program, take the average, and boom, job done.
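For implementations that can't issue `rdtsc` directly, the same sample-before/sample-after/average pattern could be sketched with a portable high-resolution timer instead of raw cycle counts, e.g. in Node/TypeScript (the binary name and test program below are made up):

```ts
import { spawnSync } from "node:child_process";

// Time one full run of an interpreter fed over stdin, repeat, and average.
// Uses process.hrtime.bigint() (nanoseconds) in place of raw rdtsc cycles.
function averageRunMs(cmd: string, args: string[], input: string, runs = 5): number {
  let totalNs = 0n;
  for (let i = 0; i < runs; i++) {
    const start = process.hrtime.bigint();
    spawnSync(cmd, args, { input }); // startup + lex + parse + eval
    totalNs += process.hrtime.bigint() - start;
  }
  return Number(totalNs / BigInt(runs)) / 1e6; // average in milliseconds
}

// Hypothetical binary and test program.
console.log(averageRunMs("./bin/monkey", [], "let x = 5; x + 5;"));
```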
I hate to jump too far ahead, but I am interested in if we could have a better outline of how this will work.
There are plenty of solutions that are going to be... interesting to benchmark to say the least. Bash/Google Sheets/ChatGPT are ones that come to mind instantly. I know JJDSL would need some changes to accept stuff during runtime. Not looking for the final method, but at least how we could expect the input to happen. Just going to list off the ones that come to mind.
- stdin
- stdin + File Reference (Might be useful if we want programs to parse large files)
- HTTP
- Socket Protocol of some kind
Likewise there is the output, but I'm of the opinion that an output is easier to handle than an input, so I'm less worried there.
Could probably do something like this, to include all the crumminess of the compilers/interpreters/jit/etc. Probably average over a bunch of runs or something?
$ time sh -c 'cat testcode.monkey | foo-lang-impl > foo-lang-test-results'
It's probably important to run/build the tests in docker to avoid the nightmare of configuring the correct runtime environment for all the languages simultaneously.
To that end, I added the following to my Makefile:
docker-time: docker-build
	docker run -i -v $(shell pwd):/deez deez_$(notdir $(shell pwd)) time ./bin/TsRustZigDeez
so then I can just run it like this:
$ cat test.monkey
let ackerman = fn(m,n) if (m == 0) n + 1 else if (n == 0) ackerman(m-1, 1) else ackerman(m-1, ackerman(m, n-1))
ackerman(3,8)
$ cat test.monkey | make docker-time
docker build . -t deez_cpp-spongman
[+] Building 7.3s (13/13) FINISHED
...
docker run -i -v /home/piersh/ts-rust-zig-deez/cpp-spongman:/deez deez_cpp-spongman time ./bin/TsRustZigDeez
repl
> nil
> 2045
>
real 0m 2.65s
user 0m 2.24s
sys 0m 0.39s
so my personal thought on this is that for a language to be "a part of the test" we are going to make MLAAS
POST /lex /parse /exec
- /lex returns the JSONified tokens
- /parse (unsure yet)
- /exec returns the program's output
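A rough sketch of what one of those endpoints could look like in Node/TypeScript (the `Lexer` import path, token shape, and port are assumptions; the real harness may differ):

```ts
import { createServer } from "node:http";
import { Lexer } from "./lexer"; // each implementation's own lexer; path is an assumption

// POST /lex: body is Monkey source, response is the JSONified token stream.
createServer((req, res) => {
  if (req.method !== "POST" || req.url !== "/lex") {
    res.statusCode = 404;
    res.end();
    return;
  }
  let source = "";
  req.on("data", (chunk) => (source += chunk));
  req.on("end", () => {
    const lexer = new Lexer(source);
    const tokens: Array<{ type: string; literal: string }> = [];
    // Assumes the book-style nextToken() interface with an EOF token type.
    for (let tok = lexer.nextToken(); tok.type !== "EOF"; tok = lexer.nextToken()) {
      tokens.push(tok);
    }
    res.writeHead(200, { "Content-Type": "application/json" });
    res.end(JSON.stringify(tokens));
  });
}).listen(3000);
```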
that way we can test these perfs as we go. i am going to be building a client using Turso (ad) and Drizzle (not an ad, seems like a neat orm to try out). That work will probably start today. Should be fun!
Interesting, I assume you mean HTTP POST? Does that mean each language needs to implement an http server? What about the Assembler guy? Does he have to write an http server in asm? How to isolate the performance of the language implementation from the performance of the http server?
I'm assuming here, but we'd probably disregard the HTTP timing, so you wouldn't need to implement the HTTP Server in your language, just would have to wrap your solution. For example (in Java/Spring)
@PostMapping("/rest/lexer")
public Long timeLexer(@RequestBody String monkeyScript) {
    long startTime = System.currentTimeMillis();
    Lexer lexer = new Lexer(monkeyScript);
    // Assume we parse all tokens.
    return System.currentTimeMillis() - startTime;
}
This would allow each instance to "disregard" most of the HTTP overhead, and only return the rough actual time cost.
Main exception I can think of would be the ASM, who might have to deal with additional overhead in calling it, but they could probably just wrap it in some C/C++ and do it like this.
I see an obvious optimization there ;-)
I'll entertain you.
//@PostMapping("/rest/lexer")
//public Long timeLexer(@RequestBody String monkeyScript) {
// long startTime = System.currentTimeMillis();
Lexer lexer = new Lexer(monkeyScript);
// Assume we parse all tokens.
// return System.currentTimeMillis() - startTime;
//}
Since we're removing overhead, we only care about `new Lexer(string)`, which doesn't need to be optimized since, assuming we use ZGC, Object Churn isn't an issue. If your obvious Optimization is to not use Java, bad joke.
IMO the interaction with the language implementation should just be via stdin/stdout (as this is what the book implements). this is the simplest thing that removes all other variables. if you want to wrap that in a standard web server that services requests and runs docker & pipes the request/results in/out of it, that's fine, but i'm not entirely sure what you're testing at that point. there's no need to implement the timer inside the interpreter code, `time ./bin/xyz` is sufficient to test startup & execution.
stdin/stdout would work, but we have implementations in stuff like Google Sheets and Scratch. You could make an argument that we don't have a need to test those, or we could wrap them in something that can take stdin/stdout. But now you are also comparing the time the wrapper takes. Or in the case of a runtime language, I don't want to time the startup cost of a language.
If we're going to compare implementations, I only really care to see the time taken by the code itself; everything else feels like noise.
yeah, google sheets & scratch are going to require some kind of wrapper whatever everyone else uses. stdin/stdout just seems like the baseline because that's what everyone (else) is already implementing.
IMO startup time is a big factor. if the C runtime took 20 seconds to start, nobody would use it regardless of how efficient the compiled code was.
Depends on the context. A long running server's startup time doesn't matter, since it's going to be on for a long time. If you have a client app/burst application, then it's going to matter more. Maybe we measure both?
yeah, take your pick, single-request or long-running:
- cgi
- fastcgi
- microservice
- lambda
3Days has some HTTP stuff so I think a web server in HolyC can be done. I do not know if it can be done without breaking compatibility with what it should actually be tested on, which is TempleOS itself.