Discussion: how should we test the performance?
Here on GitHub we could set up a simple and free workflow: whenever someone pushes changes to the master branch,
- A GitHub Actions workflow would start specifically for the languages whose files changed in that push
- It would run a series of performance tests, say, for the Rust lexer (if Rust was updated)
- Upon completion, the workflow would upload the results as artifacts
Those results could then be downloaded at any point for up to 90 days after the run, which means ThePrimeagen can pick an arbitrary point in time, download all the test results at once, and parse them.
Now, the workflow can be different, but the question is: how do we test the speed of interpreters doing lexing, for example?
- By a simple `time` command?
- Using `perf stat`?
- Using Apache Bench plus some proxy which would feed tests into them? (to get some GC going on)
- ???
@ThePrimeagen
You might enjoy this video on Dave's Garage https://youtu.be/pSvSXBorw4A, where something similar was done. The issues I can see are the following:
- This promotes "bad practice", in seeking optimization instead of readability. I'm not saying that the two are mutually exclusive, but I think at the core we're mostly trying to create this Interpreter in line with the book.
- As you mentioned, measuring this is hard: do you rank in terms of Memory? In terms of Raw Speed? JIT languages are going to be cold, so do you "warm up the JIT" for the Measurements?
However, this is from the viewpoint of "sticking to the book" and making this more intro-friendly.
I've personally been thinking it would be cool to see different approaches/features after we clear the initial book. Maybe that involves hyper optimizing for performance, maybe it involves outputting the AST as a PNG, or making a REPL, etc etc. Then going back and seeing what has been done. Would make for good content on Stream, and let some people flex their creative muscles.
Then we can rank them based on how creative they are. That's language agnostic, which performance isn't.
so i have some requirements for this to become a performance test, which if you watch any of the videos i have done thus far, that is going to be something i will do.
so for this to become a performance test i want to create each of the steps into a server and serve out the tokens, what i may be able to do with the parser, and then finally run the program itself and return out the output of the program. To me this is a proper performance test. it tests the language in a more real way. sys calls, memory usage, and all of that fun instead of some toy example
second as far as code goes. i want to have everything written in such a way that it is very simple. then it would be fun to talk about performance of each to see how to improve it.
> which if you watch any of the videos i have done thus far
Hitting me with the Tuesday Callout D:
Makes sense, I'm assuming you'll dig more into that on Stream, so that we can make sure PRs can properly maintain that?
For the server approach, do you mean that each language will run whichever server framework the author chooses, or would we keep each language as a CLI app and just wrap them all with the same server?
If we aren't trying to test what the best server framework is, I would suggest that we make it into a long-running CLI app, all called by the same server. So the server would launch the app and then basically send requests to it (over stdin/stdout) like you would a repl. That would still lead to GC and whatnot. It just wouldn't require a dependency on server frameworks.
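To make that concrete, here's a rough sketch of such a wrapper in Node/TypeScript (purely illustrative: the `./bin/monkey-repl` path, the port, and the one-line-in/one-line-out protocol are all assumptions, not anything agreed on here):

```ts
import { createServer } from "node:http";
import { spawn } from "node:child_process";
import { createInterface } from "node:readline";

// Launch the language's long-running CLI once (binary path is hypothetical).
// Assumes a line-based protocol: one Monkey snippet in on stdin,
// one line of output back on stdout, like a repl.
const repl = spawn("./bin/monkey-repl", { stdio: ["pipe", "pipe", "inherit"] });
const replies = createInterface({ input: repl.stdout! });

// Requests are answered in the order they were sent.
const waiting: Array<(line: string) => void> = [];
replies.on("line", (line) => waiting.shift()?.(line));

createServer((req, res) => {
  let body = "";
  req.on("data", (chunk) => (body += chunk));
  req.on("end", () => {
    waiting.push((line) => res.end(line + "\n"));
    // Collapse the request to a single line and feed it to the interpreter.
    repl.stdin!.write(body.replace(/\r?\n/g, " ") + "\n");
  });
}).listen(8080);
```

Since the interpreter process stays alive across requests, the GC and memory behavior of each implementation still shows up, without pulling a server framework into every language.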
I think he means something similar to the Go vs. Rust vs. TypeScript repo and video. But yeah, the details should really be discussed later, when parsers have been implemented for all the languages in the repo.
Using the `rdtsc` x86 instruction to count CPU cycles at the start of the program's execution (not including compile time, where applicable) and again at the end. `end - start` gives you CPU cycles you can compare; repeat the previous steps a few more times per program, take the average, and boom, job done.
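For implementations that can't issue `rdtsc` directly, the same sample-before/sample-after/average pattern could be sketched with a portable high-resolution timer instead of raw cycle counts, e.g. in Node/TypeScript (the binary name and test program below are made up):

```ts
import { spawnSync } from "node:child_process";

// Time one full run of an interpreter fed over stdin, repeat, and average.
// Uses process.hrtime.bigint() (nanoseconds) in place of raw rdtsc cycles.
function averageRunMs(cmd: string, args: string[], input: string, runs = 5): number {
  let totalNs = 0n;
  for (let i = 0; i < runs; i++) {
    const start = process.hrtime.bigint();
    spawnSync(cmd, args, { input }); // startup + lex + parse + eval
    totalNs += process.hrtime.bigint() - start;
  }
  return Number(totalNs / BigInt(runs)) / 1e6; // average in milliseconds
}

// Hypothetical binary and test program.
console.log(averageRunMs("./bin/monkey", [], "let x = 5; x + 5;"));
```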
I hate to jump too far ahead, but I am interested in if we could have a better outline of how this will work.
There are plenty of solutions that are going to be... interesting to benchmark to say the least. Bash/Google Sheets/ChatGPT are ones that come to mind instantly. I know JJDSL would need some changes to accept stuff during runtime. Not looking for the final method, but at least how we could expect the input to happen. Just going to list off the ones that come to mind.
- stdin
- stdin + File Reference (Might be useful if we want programs to parse large files)
- HTTP
- Socket Protocol of some kind
Likewise there is the output, but I'm of the opinion that an output is easier to handle than an input, so I'm less worried there.
Could probably do something like this, to include all the crumminess of the compilers/interpreters/jit/etc. Probably average over a bunch of runs or something?
$ time sh -c 'cat testcode.monkey | foo-lang-impl > foo-lang-test-results'
It's probably important to run/build the tests in docker to avoid the nightmare of configuring the correct runtime environment for all the languages simultaneously.
To that end, I added the following to my Makefile:
docker-time: docker-build
	docker run -i -v $(shell pwd):/deez deez_$(notdir $(shell pwd)) time ./bin/TsRustZigDeez
so then I can just run it like this:
$ cat test.monkey
let ackerman = fn(m,n) if (m == 0) n + 1 else if (n == 0) ackerman(m-1, 1) else ackerman(m-1, ackerman(m, n-1))
ackerman(3,8)
$ cat test.monkey | make docker-time
docker build . -t deez_cpp-spongman
[+] Building 7.3s (13/13) FINISHED
...
docker run -i -v /home/piersh/ts-rust-zig-deez/cpp-spongman:/deez deez_cpp-spongman time ./bin/TsRustZigDeez
repl
> nil
> 2045
>
real 0m 2.65s
user 0m 2.24s
sys 0m 0.39s
so my personal thought on this is that for a language to be "a part of the test" we are going to make MLAAS
POST /lex /parse /exec
- /lex returns the JSONified tokens
- /parse (unsure yet)
- /exec returns the program's output
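A rough sketch of what one of those endpoints could look like in Node/TypeScript (the `Lexer` import path, token shape, and port are assumptions; the real harness may differ):

```ts
import { createServer } from "node:http";
import { Lexer } from "./lexer"; // each implementation's own lexer; path is an assumption

// POST /lex: body is Monkey source, response is the JSONified token stream.
createServer((req, res) => {
  if (req.method !== "POST" || req.url !== "/lex") {
    res.statusCode = 404;
    res.end();
    return;
  }
  let source = "";
  req.on("data", (chunk) => (source += chunk));
  req.on("end", () => {
    const lexer = new Lexer(source);
    const tokens: Array<{ type: string; literal: string }> = [];
    // Assumes the book-style nextToken() interface with an EOF token type.
    for (let tok = lexer.nextToken(); tok.type !== "EOF"; tok = lexer.nextToken()) {
      tokens.push(tok);
    }
    res.writeHead(200, { "Content-Type": "application/json" });
    res.end(JSON.stringify(tokens));
  });
}).listen(3000);
```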
that way we can test these perfs as we go. i am going to be building a client using Turso (ad) and Drizzle (not an ad, seems like a neat orm to try out). That work will probably start today. Should be fun!
Interesting, I assume you mean HTTP POST? Does that mean each language needs to implement an http server? What about the Assembler guy? Does he have to write an http server in asm? How to isolate the performance of the language implementation from the performance of the http server?
I'm assuming here, but we'd probably disregard the HTTP timing, so you wouldn't need to implement the HTTP Server in your language, just would have to wrap your solution. For example (in Java/Spring)
@PostMapping("/rest/lexer")
public Long timeLexer(@RequestBody String monkeyScript) {
    long startTime = System.currentTimeMillis();
    Lexer lexer = new Lexer(monkeyScript);
    // Assume we parse all tokens.
    return System.currentTimeMillis() - startTime;
}
This would allow each instance to "disregard" most of the HTTP overhead, and only return the rough actual time cost.
Main exception I can think of would be the ASM, who might have to deal with additional overhead in calling it, but they could probably just wrap it in some C/C++ and do it like this.
I see an obvious optimization there ;-)
I'll entertain you.
//@PostMapping("/rest/lexer")
//public Long timeLexer(@RequestBody String monkeyScript) {
// long startTime = System.currentTimeMillis();
Lexer lexer = new Lexer(monkeyScript);
// Assume we parse all tokens.
// return System.currentTimeMillis() - startTime;
//}
Since we're removing overhead, we only care about `new Lexer(string)`, which doesn't need to be optimized since, assuming we use ZGC, Object Churn isn't an issue. If your obvious Optimization is to not use Java, bad joke.
IMO the interaction with the language implementation should just be via stdin/stdout (as this is what the book implements). this is the simplest thing that removes all other variables. if you want to wrap that in a standard web server that services requests and runs docker & pipes the request/results in/out of it, that's fine, but i'm not entirely sure what you're testing at that point. there's no need to implement the timer inside the interpreter code, `time ./bin/xyz` is sufficient to test startup & execution.
stdin/stdout would work, but we have implementations in stuff like Google Sheets and Scratch. You could make an argument that we don't have a need to test those, or we could wrap them in something that can take stdin/stdout. But now you are also comparing the time the wrapper takes. Or in the case of a runtime language, I don't want to time the startup cost of a language.
If we're going to compare implementations, I only really care to see the time taken by the code itself; everything else feels like noise.
yeah, google sheets & scratch are going to require some kind of wrapper whatever everyone else uses. stdin/stdout just seems like the baseline because that's what everyone (else) is already implementing.
IMO startup time is a big factor. if the C runtime took 20 seconds to start, nobody would use it regardless of how efficient the compiled code was.
Depends on the context. A long running server's startup time doesn't matter, since it's going to be on for a long time. If you have a client app/burst application, then it's going to matter more. Maybe we measure both?
yeah, take your pick, single-request or long-running:
- cgi
- fastcgi
- microservice
- lambda
3Days has some HTTP stuff so I think a web server in HolyC can be done. I do not know if it can be done without breaking compatibility with what it should actually be tested on, which is TempleOS itself.