
RFC: Higher sample count for benching?

Open fspoettel opened this issue 1 year ago • 9 comments


I'm currently experimenting with a higher sample count for benching on my solution repo and might add it to the template after this year concludes. Unfortunately, using criterion with binary crates is not possible yet; that would make this a bit simpler.

Advantages:

  • more accurate timings

Disadvantages:

  • Solutions take (at least) 2 seconds to execute (min. 1 sec or 5 iterations per part)
  • Input to benched functions is required to be Copy (not an issue so far); see the sketch below.
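
For concreteness, a minimal sketch of such a bench loop (a hypothetical helper, not the template's actual API), assuming the "min. 1 sec or 5 iterations" rule above; the I: Copy bound is exactly what forces inputs to be Copy, since every iteration needs a fresh copy of the parsed input:

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

// Hypothetical bench helper: run `part` until at least one second has
// elapsed, with a floor of five iterations, then report the average.
fn bench<I: Copy, T>(input: I, part: impl Fn(I) -> T) -> (Duration, u32) {
    let mut samples: u32 = 0;
    let mut total = Duration::ZERO;
    while samples < 5 || total < Duration::from_secs(1) {
        let start = Instant::now();
        // black_box keeps the optimizer from eliding the unused result.
        black_box(part(black_box(input)));
        total += start.elapsed();
        samples += 1;
    }
    (total / samples, samples)
}
```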

Example output:

🎄 Parser 🎄
✓ (avg. time: 34.30µs / 34507 samples)
🎄 Part 1 🎄
71506 (avg. time: 5.00ns / 3430531 samples)
🎄 Part 2 🎄
209603 (avg. time: 5.22µs / 125659 samples)

fspoettel · Dec 03 '22 11:12

This would be great! It'd also be nice if there was a way of adding the benchmarks to the README stars table in some way 🤔.

Another thing along these lines that I added to my repo created from this template (thank you for this, by the way!) is a way to test the solution against the real input, not just the example, so that if I go refactoring and optimizing I don't accidentally break my implementation. I had to mark the test ignored so it wouldn't break CI, which doesn't have the real inputs downloaded, but it's something. Anyway, I thought I'd send it your way to get your thoughts on it, since I saw you had some RFCs up :)
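
Roughly, that ignored test looks like the sketch below (read_file mirrors the template's input-reading helper, and the expected value is whatever AoC accepted for your input; both are assumptions here):

```rust
#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    #[ignore] // real inputs aren't committed, so CI can't run this;
              // run locally with `cargo test -- --ignored`
    fn test_part_one_real_input() {
        // Assumed helper name, mirroring the template's example reader.
        let input = advent_of_code::read_file("inputs", 1);
        assert_eq!(part_one(&input), Some(71506));
    }
}
```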

aranasaurus · Dec 04 '22 06:12

It'd also be nice if there was a way of adding the benchmarks to the README stars table in some way 🤔.

Cool idea! 👍 Not sure if we should try to extend the table or set up a separate block. Maybe (a tweaked version of) the output of cargo all --release in a code block with some sysinfo? E.g.:

Benchmarks

{some sys_info}

Day 01
-------
Parser: ✓ avg. time: 35.68µs / 16298 samples
Part 1: avg. time: 5.00ns / 4000000 samples
Part 2: avg. time: 5.10µs / 176475 samples

Total: 40.79µs

Day 02
-------
Not solved.

...
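
As for the mechanics, a sketch of how cargo all --release could splice that block into the README, assuming HTML-comment markers in the same style the star-table workflow uses (the marker strings here are made up):

```rust
use std::fs;

// Hypothetical marker strings delimiting the generated section.
const MARKER_START: &str = "<!--- benchmarks start --->";
const MARKER_END: &str = "<!--- benchmarks end --->";

// Replace everything between the markers with the freshly generated block.
fn update_readme(benchmarks: &str) -> std::io::Result<()> {
    let readme = fs::read_to_string("README.md")?;
    let start = readme.find(MARKER_START).expect("missing start marker") + MARKER_START.len();
    let end = readme.find(MARKER_END).expect("missing end marker");
    let updated = format!("{}\n{}\n{}", &readme[..start], benchmarks, &readme[end..]);
    fs::write("README.md", updated)
}
```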

CI which doesn't have the real inputs downloaded

I'm a bit on the fence about that gitignore - I understand where it is coming from, but in reality most people check in the inputs anyway.

fspoettel · Dec 04 '22 08:12

It'd also be nice if there was a way of adding the benchmarks to the README stars table in some way 🤔.

Cool idea! 👍 Not sure if we should try to extend the table or set up a separate block. Maybe (a tweaked version of) the output of cargo all --release in a code block with some sysinfo? E.g.:

The problem with adding this to the stars table is that the current template does not verify that we generate the correct result, and the CI hook that adds the stars is probably not the best place to benchmark the solution.

It should be possible to add a function to the solve program or even add a submit program that automatically submits the result to the adventofcode webpage. I think I have seen a Python template that does that. That submit program could then run the benchmark, submit the result, and update the readme.

CI which doesn't have the real inputs downloaded

I'm a bit on the fence about that gitignore - I understand where it is coming from, but in reality most people check in the inputs anyway.

I won't argue with anyone who wants to commit the inputs themselves, but if the developer of AoC specifically asked that we don't publish the inputs, maybe a template isn't the best place to ignore that.

Wasabi375 · Dec 04 '22 14:12

Cool idea! 👍 Not sure if we should try to extend the table or set up a separate block. Maybe (a tweaked version of) the output of cargo all --release in a code block with some sysinfo? E.g.:

Benchmarks

{some sys_info}

Day 01
-------
Parser: ✓ avg. time: 35.68µs / 16298 samples
Part 1: avg. time: 5.00ns / 4000000 samples
Part 2: avg. time: 5.10µs / 176475 samples

Total: 40.79µs

Day 02
-------
Not solved.

...

Yeah this would be great!

CI which doesn't have the real inputs downloaded

I'm a bit on the fence about that gitignore - I understand where it is coming from, but in reality most people check in the inputs anyway.

If the dev of AoC asked for them not to be checked in, I don't want my stuff checking them in. Another option would be giving the aoc-cli a way to pull the session token from an env var instead of a file, so that it could be stored in Secrets, at which point CI could just run cargo download itself before running tests. Might be nice to add a --all flag to download as well.
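
That lookup could be as simple as this sketch (the fallback path is aoc-cli's default token file in the home directory; treating AOC_SESSION as the env var name is an assumption borrowed from the table workflow):

```rust
use std::env;
use std::fs;

// Prefer an AOC_SESSION env var (e.g. injected from CI Secrets),
// then fall back to aoc-cli's default token file.
fn session_token() -> Option<String> {
    if let Ok(token) = env::var("AOC_SESSION") {
        return Some(token);
    }
    let home = env::var("HOME").ok()?;
    fs::read_to_string(format!("{home}/.adventofcode.session"))
        .ok()
        .map(|token| token.trim().to_string())
}
```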

aranasaurus · Dec 04 '22 15:12

The problem with adding this to the stars table is that the current template does not verify that we generate the correct result, and the CI hook that adds the stars is probably not the best place to benchmark the solution.

It should be possible to add a function to the solve program or even add a submit program that automatically submits the result to the adventofcode webpage. I think I have seen a Python template that does that. That submit program could then run the benchmark, submit the result, and update the readme.

Oh yeah, I'm perfectly happy with it not being part of CI but instead done locally, as long as these things are opt-in via a parameter. Something like cargo solve 04 --benchmark --submit, with both defaulting to false?
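
The opt-in parsing could be a few lines in the solve binary, sketched here with hand-rolled argument handling (flag names from this thread, everything else hypothetical):

```rust
use std::env;

// Both flags default to false; the day number is the first positional
// argument, e.g. `cargo solve 04 --benchmark --submit`.
fn main() {
    let args: Vec<String> = env::args().skip(1).collect();
    let benchmark = args.iter().any(|a| a == "--benchmark");
    let submit = args.iter().any(|a| a == "--submit");
    let day: u8 = args
        .iter()
        .find(|a| !a.starts_with("--"))
        .and_then(|a| a.parse().ok())
        .expect("usage: cargo solve <day> [--benchmark] [--submit]");
    println!("solving day {day} (benchmark: {benchmark}, submit: {submit})");
}
```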

aranasaurus · Dec 04 '22 16:12

add a submit program that automatically submits the result to the adventofcode webpage.

The crate we use for downloading inputs also supports submitting answers, so we could leverage it here. I would accept a PR for that if someone is interested, and I've created #19 for this.

cargo solve 04 --benchmark --submit

This looks like a good way forward. 👍 I have a somewhat working implementation of benchmarks on my repo for this year - I'll try to put it behind a --bench flag and open a PR with the changes if any of you want to test / review.

give the aoc-cli a way to pull the session token from an env var

I think we could implement this on our side with the --session-file argument to aoc-cli and reuse the AOC_SESSION secret we already use for the table. The only annoyance here is that it might not be present, which makes the CI a bit more complex.
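
In CI that bridge could look roughly like this sketch: write the AOC_SESSION secret to a temporary file and pass it via --session-file (only that flag is taken from the comment above; the rest of the aoc-cli invocation is assumed):

```rust
use std::env;
use std::fs;
use std::process::Command;

// Bridge the AOC_SESSION env var to aoc-cli's --session-file argument.
fn download_input(day: u8) -> std::io::Result<()> {
    let token = env::var("AOC_SESSION").expect("AOC_SESSION is not set");
    let session_path = env::temp_dir().join("adventofcode.session");
    fs::write(&session_path, token)?;
    let status = Command::new("aoc")
        .arg("download")
        .arg("--day")
        .arg(day.to_string())
        .arg("--session-file")
        .arg(&session_path)
        .status()?;
    assert!(status.success(), "aoc download failed");
    Ok(())
}
```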

edit: forgot to mention that my preference for benchmarking is that it should be done client-side. :)

fspoettel · Dec 04 '22 17:12

I am currently learning Rust; that's how I found this repo. I already started working on a "simple" solution for this using criterion. Right now I have a PoC for criterion working, running my solution for day 1 (https://github.com/fspoettel/advent-of-code-rust/pull/20). The next step would be reading the benchmark results and updating the readme. After that I might also look into automatic submission.
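
For readers following along, a minimal criterion bench in that spirit might look like the sketch below; it assumes the day's solution is exposed from a library crate, since criterion benches live under benches/ and cannot target a plain binary crate (the limitation mentioned at the top of this issue). part_one here is a stand-in:

```rust
use criterion::{criterion_group, criterion_main, Criterion};

// Stand-in for a real solution function exported from a library crate.
fn part_one(input: &str) -> u32 {
    input.lines().filter_map(|line| line.parse::<u32>().ok()).sum()
}

fn bench_day01(c: &mut Criterion) {
    let input = "1000\n2000\n3000";
    // criterion handles warm-up, sample counts, and outlier detection itself.
    c.bench_function("day01 part1", |b| b.iter(|| part_one(input)));
}

criterion_group!(benches, bench_day01);
criterion_main!(benches);
```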

Wasabi375 · Dec 04 '22 17:12

I think we could implement this on our side with the --session-file argument to aoc-cli and reuse the AOC_SESSION secret we already use for the table. The only annoyance here is that it might not be present, which makes the CI a bit more complex.

I was just looking at the aoc-cli repo to see about taking a crack at implementing #19, and saw that this PR adding the ability for aoc-cli to use an env var for the session was just opened. So that may become easier soon :)

aranasaurus · Dec 04 '22 19:12

I implemented a proof of concept for readme benchmarks on my 2022 repo to get a better idea of how this would work with the current code layout.

The code is still unfinished and a bit terrible, but benchmarks are written to the readme automatically when cargo all --release is executed. The (non-scientific) benchmarks work by executing the code between 10 and 100,000 times, depending on how long the initial execution took, in order to stay below roughly a second per solution part. I'm not too sure about the output format; I may round this to milliseconds eventually, which should be an easy enough change.
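
The sample-count heuristic amounts to something like this sketch (the 10 / 100,000 / one-second thresholds are from the paragraph above; the helper itself is hypothetical):

```rust
use std::time::{Duration, Instant};

// Time a single execution, then choose a sample count between 10 and
// 100,000 that keeps the total run under roughly one second.
fn adaptive_sample_count<T>(part: impl Fn() -> T) -> u64 {
    let start = Instant::now();
    let _ = part();
    let once = start.elapsed().max(Duration::from_nanos(1));
    let budget = Duration::from_secs(1);
    ((budget.as_nanos() / once.as_nanos()) as u64).clamp(10, 100_000)
}
```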

I also updated the solve command-line output to use this benchmark method - see the following video for how it looks:

https://user-images.githubusercontent.com/1682504/205697329-fec0da10-71e0-4045-888c-b08a0e509c7b.mov

fspoettel · Dec 05 '22 17:12