Generate benchmark result pages in the CI
Since its inception, the benchmark results have been collected, saved into a cache on my local file system, and uploaded to https://github.com/enso-org/engine-benchmark-results. This is scheduled as a daily job. The https://github.com/enso-org/enso/blob/49835500895be8b1f779cc2c40df9c03ee3ebcb8/tools/performance/engine-benchmarks/bench_download.py script requires a (local) cache because GH artifacts are dropped after 3 months (this period cannot be extended, according to GH policies), so without it we would lose all benchmark results older than 3 months. The cache is located on my local file system, without any backups.
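The 3-month artifact retention is the whole reason the cache exists: any run older than that window can only be served from the cache. A minimal sketch of that check (hypothetical helper, not part of `bench_download.py`):

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# GitHub keeps workflow artifacts for at most 90 days; results from runs
# older than that can only come from the cache.
ARTIFACT_RETENTION = timedelta(days=90)

def artifact_available(run_created_at: datetime, now: Optional[datetime] = None) -> bool:
    """Return True if the run's artifacts should still be downloadable from GH."""
    now = now or datetime.now(timezone.utc)
    return now - run_created_at <= ARTIFACT_RETENTION
```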
Let's add another daily GH action that runs `bench_download.py`.
### Tasks
- [x] Move the cache from the local filesystem to https://github.com/enso-org/engine-benchmark-results
- [ ] Add another daily GH action that runs `bench_download.py`.
- [ ] Generate a GH token (permissions described in https://github.com/enso-org/enso/issues/8857#issuecomment-1914558703)
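A minimal workflow descriptor for that daily job could look like the sketch below. The file name, schedule, and step details are assumptions; only the script path and the `ENSO_BENCHMARK_RESULTS_TOKEN` secret name come from this thread.

```yaml
# Hypothetical sketch of .github/workflows/bench-results.yml -- names and
# options are illustrative, not the actual workflow.
name: Benchmark Results Website
on:
  schedule:
    - cron: '0 4 * * *'   # once a day
  workflow_dispatch: {}
jobs:
  publish:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: '3.11'
      - name: Regenerate and upload the benchmark website
        run: python tools/performance/engine-benchmarks/bench_download.py
        env:
          GITHUB_TOKEN: ${{ secrets.ENSO_BENCHMARK_RESULTS_TOKEN }}
```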
A more difficult, but more permanent, solution is to implement just a frontend (hosted as a website in the https://github.com/enso-org/engine-benchmark-results repo) that fetches all the bench results dynamically from the GH artifacts. Such a solution would have the following properties:
- No need to collect results older than 3 months into any kind of cache - no need for any kind of backend.
- Benchmark comparison (comparing a custom branch to `develop`) can be handled by the website.
- No more need for a custom `bench_download.py` script - everything could be done on the website.
- The advantage is that we would see all the results immediately.
We need to look into this task eventually, as the current solution is not sustainable - if something happens to my laptop, we lose the daily website update. Note that the results could still be recovered from the website itself (the results are plain JavaScript data structures in the HTML), although that would require a lot of manual work.
@jdunkerley @JaroslavTulach What do you say about the proposed solution? Should we keep the status quo but just move its execution into the CI (a bit easier solution - just rewrite some bits of the `bench_download.py` script so that it can run as a daily GH action, possibly with access to some remote file system), or should we aim for a more permanent solution? Any other propositions?
CCing @mwu-tow
> should we aim for a more permanent solution

Start with "Add another daily GH action that runs `bench_download.py`", which generates and uploads the website to the repository.
Personal wish: don't write that GH action in Rust, please.
As for the issue — I'd like to chat sometime to better understand it.
> Personal wish: don't write that GH action in Rust, please.

This is not about personal favors; while I want to keep you happy, there are various factors and tradeoffs involved. I think your remark goes beyond this particular issue.
> Personal wish: don't write that GH action in Rust, please.
>
> This is not about personal favors, ... there are various factors and tradeoffs involved. I think your remark goes beyond this particular issue.

Yes, this remark certainly reflects the wider current CI situation. The reason it is mentioned in this issue is that this new CI action pipeline has nothing to do with the already existing CI actions, and as such it is a chance to start from scratch and avoid the drawbacks of the current CI setup.
We spoke with @mwu-tow; this is the summary of our discussion:
- Let's create another GH action yaml descriptor for this job
- There is no need to generate this yaml file from Rust. There would be no benefit in doing so, since this job is a very simple one.
- Let's use the https://github.com/enso-org/engine-benchmark-results repository as the cache itself - just upload the JSON files there directly.
- Locally, I have about 8 MB in 700 JSON files of cache data for the last year. No need to do this via some sophisticated technology like AWS.
- There is a single simple actionable item that blocks me from implementing this - generate a GH token with the following permissions:
  - Download artifacts from engine jobs in the `enso-org/enso` repo.
  - Checkout and push to the `enso-org/engine-benchmark-results` repo.
@mwu-tow Please, generate the token for me and let me know how to use that from the yaml file. I will also edit the description of this issue to reflect this conclusion.
@Akirathan Thanks for the writeup!
I have created a secret named ENSO_BENCHMARK_RESULTS_TOKEN that contains a GitHub token (PAT) which should include the necessary privileges. I haven't used the fine-grained PATs before, so please let me know if there are any issues - I might have missed something.
Pavel Marek reports a new STANDUP for today (2024-02-15):
Progress: - Started to work on the bench generation on the CI
- Created the remote cache in the engine-benchmark-results repo - just push all the local JSON files there.
- Modularize the `bench_download.py` script and add some sanity tests there. It should be finished by 2024-02-29.
Pavel Marek reports a new STANDUP for today (2024-02-16):
Progress: - Implemented the communication with remote cache
- Uploaded whole local cache to the remote cache
- Next week, I will try to experiment with the new GitHub action. It should be finished by 2024-02-29.
Pavel Marek reports a new STANDUP for today (2024-02-19):
Progress: - Implementing a script that regenerates the website
- Refactoring the Python package
- Adding more tests
- Tomorrow, or the day after that, I should be able to start experimenting with the GH action. It should be finished by 2024-02-29.
Pavel Marek reports a new STANDUP for today (2024-02-22):
Progress: - Another QoL PR - upload native-image arg files as artifacts - #9094
- Local tests pass, website generation seems to work, created the GH action definition...
- Starting and debugging the GH action. It should be finished by 2024-02-29.
Pavel Marek reports a new STANDUP for today (2024-02-23):
Progress: - Blocked by wrong permissions for the GH token - https://github.com/enso-org/enso/pull/9075#issuecomment-1961131789
- Need to wait for Michal next week. It should be finished by 2024-02-29.
Pavel Marek reports a new STANDUP for today (2024-02-26):
Progress: - Struggling with how to push to the repo.
- Seems like I have to use the `PUT /repos/{owner}/{repo}/contents/{path}` endpoint. It should be finished by 2024-02-29.
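The contents endpoint expects the file body base64-encoded inside a JSON payload (with the existing blob's `sha` when updating a file). A sketch of building that payload, assuming nothing about the actual script beyond the endpoint itself (the helper name is hypothetical):

```python
import base64
import json
from typing import Optional

def contents_put_payload(message: str, data: bytes, sha: Optional[str] = None) -> str:
    """Build the JSON body for PUT /repos/{owner}/{repo}/contents/{path}.

    `sha` must be the blob SHA of the existing file when updating it;
    omit it when creating a new file.
    """
    payload = {
        "message": message,
        "content": base64.b64encode(data).decode("ascii"),
    }
    if sha is not None:
        payload["sha"] = sha
    return json.dumps(payload)
```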
Pavel Marek reports a new STANDUP for today (2024-02-28):
Progress: - Discussing potential book club ideas from benchmark data.
- Fixing some last bugs in the "Upload bench to CI" PR, ready to merge it to develop.
- Let's see tomorrow how it will work after the benchmark jobs are finished. It should be finished by 2024-02-29.