Generate benchmark result pages in the CI
Since its inception, the benchmark results have been collected, saved into a cache on my local file system, and uploaded to https://github.com/enso-org/engine-benchmark-results. This is scheduled as a daily job. The https://github.com/enso-org/enso/blob/49835500895be8b1f779cc2c40df9c03ee3ebcb8/tools/performance/engine-benchmarks/bench_download.py script requires a (local) cache because GH artifacts are dropped after 3 months (this period cannot be extended, according to GH policies), so without it we would lose all benchmark results older than 3 months. The cache is located on my local file system, without any backups.
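The 3-month artifact retention is the whole reason the cache exists: any run older than that window can only be served from the cache. A minimal sketch of that check (hypothetical helper, not part of `bench_download.py`):

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# GitHub keeps workflow artifacts for at most 90 days; results from runs
# older than that can only come from the cache.
ARTIFACT_RETENTION = timedelta(days=90)

def artifact_available(run_created_at: datetime, now: Optional[datetime] = None) -> bool:
    """Return True if the run's artifacts should still be downloadable from GH."""
    now = now or datetime.now(timezone.utc)
    return now - run_created_at <= ARTIFACT_RETENTION
```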
Let's add another daily GH action that runs `bench_download.py`.
### Tasks
- [x] Move the cache from the local filesystem to https://github.com/enso-org/engine-benchmark-results
- [ ] Add another daily GH action that runs `bench_download.py`.
- [ ] Generate a GH token (permissions described in https://github.com/enso-org/enso/issues/8857#issuecomment-1914558703)
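A minimal workflow descriptor for that daily job could look like the sketch below. The file name, schedule, and step details are assumptions; only the script path and the `ENSO_BENCHMARK_RESULTS_TOKEN` secret name come from this thread.

```yaml
# Hypothetical sketch of .github/workflows/bench-results.yml -- names and
# options are illustrative, not the actual workflow.
name: Benchmark Results Website
on:
  schedule:
    - cron: '0 4 * * *'   # once a day
  workflow_dispatch: {}
jobs:
  publish:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: '3.11'
      - name: Regenerate and upload the benchmark website
        run: python tools/performance/engine-benchmarks/bench_download.py
        env:
          GITHUB_TOKEN: ${{ secrets.ENSO_BENCHMARK_RESULTS_TOKEN }}
```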
A more difficult, but more permanent, solution is to implement just a frontend (hosted as a website in the https://github.com/enso-org/engine-benchmark-results repo) that fetches all the bench results dynamically from the GH artifacts. Such a solution would have the following properties:
- No need to collect results older than 3 months into any kind of cache - no need for any kind of backend.
- Benchmark comparison (comparing a custom branch to `develop`) can be handled by the website.
- No more need for a custom `bench_download.py` script - everything could be done on the website.
- The advantage is that we would see all the results immediately.
We need to look into this task eventually, as the current solution is not sustainable - if something happens to my laptop, we lose the daily website update. Note that the results could still be recovered from the website itself (the results are plain JavaScript data structures in the HTML), although that would require a lot of manual work.
@jdunkerley @JaroslavTulach What do you say about the proposed solution? Should we keep the status quo but just move its execution into the CI (a bit easier solution - just rewrite some bits of the `bench_download.py` script so that it can run as a daily GH action, possibly with access to some remote file system), or should we aim for a more permanent solution? Any other propositions?
CCing @mwu-tow
> should we aim for a more permanent solution

Start with "Add another daily GH action that runs `bench_download.py`", which generates and uploads the website to the repository.
Personal wish: don't write that GH action in Rust, please.
As for the issue — I'd like to chat sometime to better understand it.
> Personal wish: don't write that GH action in Rust, please.

This is not about personal favors; while I want to keep you happy, there are various factors and tradeoffs involved. I think your remark goes beyond this particular issue.
> Personal wish: don't write that GH action in Rust, please.
>
> This is not about personal favors, ... there are various factors and tradeoffs involved. I think your remark goes beyond this particular issue.

Yes, this remark certainly reflects the wider current CI situation. The reason it is mentioned in this issue is that this new CI action pipeline has nothing to do with the already existing CI actions, and as such it is a chance to start from scratch and avoid the drawbacks of the current CI setup.
We spoke with @mwu-tow; this is the summary of our discussion:
- Let's create another GH action yaml descriptor for this job
- There is no need to generate this yaml file from Rust. There would be no benefit in doing so, since this job is a very simple one.
- Let's use the https://github.com/enso-org/engine-benchmark-results repository as the cache itself - just upload the JSON files there directly.
- Locally, I have about 8 MB in 700 JSON files of cache data for the last year. No need to do this via some sophisticated technology like AWS.
- There is a single simple actionable item that blocks me from implementing this - generate a GH token with the following permissions:
  - Download artifacts from engine jobs in the `enso-org/enso` repo.
  - Checkout and push to the `enso-org/engine-benchmark-results` repo.
@mwu-tow Please, generate the token for me and let me know how to use that from the yaml file. I will also edit the description of this issue to reflect this conclusion.
@Akirathan Thanks for the writeup!
I have created a secret named ENSO_BENCHMARK_RESULTS_TOKEN that contains a GitHub token (PAT) which should include the necessary privileges. I haven't used the fine-grained PATs before, so please let me know if there are any issues - I might have missed something.
Pavel Marek reports a new STANDUP for today (2024-02-15):
Progress: - Started to work on the bench generation on the CI
- Created the remote cache in the engine-benchmark-results repo - just push all the local JSON files there.
- Modularize the `bench_download.py` script and add some sanity tests there. It should be finished by 2024-02-29.
Pavel Marek reports a new STANDUP for today (2024-02-16):
Progress: - Implemented the communication with remote cache
- Uploaded whole local cache to the remote cache
- Next week, I will try to experiment with the new GitHub action. It should be finished by 2024-02-29.
Pavel Marek reports a new STANDUP for today (2024-02-19):
Progress: - Implementing a script that regenerates the website
- Refactoring the Python package
- Adding more tests
- Tomorrow, or the day after that, I should be able to start experimenting with the GH action. It should be finished by 2024-02-29.
Pavel Marek reports a new STANDUP for today (2024-02-22):
Progress: - Another QoL PR - upload native-image arg files as artifacts - #9094
- Local tests pass, website generation seems to work, created the GH action definition...
- Starting and debugging the GH action. It should be finished by 2024-02-29.
Pavel Marek reports a new STANDUP for today (2024-02-23):
Progress: - Blocked by wrong permissions for the GH token - https://github.com/enso-org/enso/pull/9075#issuecomment-1961131789
- Need to wait for Michal next week. It should be finished by 2024-02-29.
Pavel Marek reports a new STANDUP for today (2024-02-26):
Progress: - Struggling with how to push to the repo.
- Seems like I have to use the `PUT /repos/{owner}/{repo}/contents/{path}` endpoint. It should be finished by 2024-02-29.
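The contents endpoint expects the file body base64-encoded inside a JSON payload (with the existing blob's `sha` when updating a file). A sketch of building that payload, assuming nothing about the actual script beyond the endpoint itself (the helper name is hypothetical):

```python
import base64
import json
from typing import Optional

def contents_put_payload(message: str, data: bytes, sha: Optional[str] = None) -> str:
    """Build the JSON body for PUT /repos/{owner}/{repo}/contents/{path}.

    `sha` must be the blob SHA of the existing file when updating it;
    omit it when creating a new file.
    """
    payload = {
        "message": message,
        "content": base64.b64encode(data).decode("ascii"),
    }
    if sha is not None:
        payload["sha"] = sha
    return json.dumps(payload)
```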
Pavel Marek reports a new STANDUP for today (2024-02-28):
Progress: - Discussing potential book club ideas from benchmark data.
- Fixing some last bugs in the "Upload bench to CI" PR, ready to merge it to develop.
- Let's see tomorrow how it will work after the benchmark jobs are finished. It should be finished by 2024-02-29.