Explore other storage methods for our test results

Open jedel1043 opened this issue 1 year ago • 12 comments

Right now, our test results are just stored in one big JSON file. This is far from ideal for several reasons, such as the large overall size and the slow serialization and deserialization speed.

I see two other representations that we could use, each with its pros and cons.

Binary serialized file (any format)

  • ✔️ Small size.
  • ✔️ Fast to serialize and deserialize.
  • ✔️ Naturally represents the tree structure of test suites.
  • ❌ Harder to query for specific test types (per feature/version/suite).
  • ❌ Most formats cannot be lazily deserialized.
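To make the last trade-off concrete, here is a minimal, language-neutral sketch in Python, where `pickle` stands in for any binary format (bincode, CBOR, MessagePack, ...) and the tree shape is invented for illustration:

```python
import json
import pickle

# Hypothetical miniature of a test262 result tree: suites nest sub-suites,
# leaves carry pass/fail counts. Names and numbers are illustrative only.
tree = {
    "name": "test262",
    "suites": [
        {"name": "language", "passed": 120, "failed": 3, "suites": []},
        {"name": "built-ins", "passed": 450, "failed": 12, "suites": []},
    ],
}

as_json = json.dumps(tree).encode()
as_binary = pickle.dumps(tree)  # stand-in for bincode/CBOR/MessagePack

# The binary form round-trips losslessly and preserves the tree shape...
assert pickle.loads(as_binary) == tree
# ...but there is no way to read just one suite without deserializing the
# whole document, unlike a database that can load entries on demand.
```
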

Binary database file (SQLite)

  • ✔️ Small size.
  • ✔️ Decent serialization speed and virtually nonexistent deserialization cost (SQLite lazily loads entries).
  • ✔️ Nice for data queries using SQL syntax.
  • ❌ Complicates our serialization logic considerably.
  • ❌ Requires a bit of database design to represent the tree structure of suites as a table.
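For the "tree structure as a table" point, an adjacency-list schema is the usual approach. A minimal sketch with Python's stdlib `sqlite3` (table and column names are made up for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # a real setup would point at a file
conn.executescript("""
    -- Adjacency-list encoding of the suite tree.
    CREATE TABLE suite (
        id        INTEGER PRIMARY KEY,
        parent_id INTEGER REFERENCES suite(id),  -- NULL for the root
        name      TEXT NOT NULL
    );
    CREATE TABLE test (
        id       INTEGER PRIMARY KEY,
        suite_id INTEGER NOT NULL REFERENCES suite(id),
        name     TEXT NOT NULL,
        passed   INTEGER NOT NULL                -- 0 or 1
    );
    INSERT INTO suite VALUES (1, NULL, 'test262');
    INSERT INTO suite VALUES (2, 1, 'language');
    INSERT INTO test  VALUES (1, 2, 'let-decl.js', 1);
    INSERT INTO test  VALUES (2, 2, 'var-hoist.js', 0);
""")

# Queries only touch the rows they need -- the "lazy load" advantage.
(failed,) = conn.execute(
    "SELECT COUNT(*) FROM test WHERE passed = 0"
).fetchone()

# A recursive CTE walks the tree when the full hierarchy is wanted.
suites = conn.execute("""
    WITH RECURSIVE sub(id) AS (
        SELECT 1
        UNION ALL
        SELECT s.id FROM suite AS s JOIN sub ON s.parent_id = sub.id
    )
    SELECT id FROM sub
""").fetchall()
```
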

jedel1043 avatar Mar 09 '24 15:03 jedel1043

We could also have a small backend, connected to a DB with a simple JSON API, since I'm not sure how easy it is to handle SQLite from Docusaurus, for example.

Serializing into the DB should be straightforward with Diesel or something like that. We would just need to derive some structures and associate commit IDs (or tags) to results.
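A sketch of the kind of schema this implies, written as plain SQL via Python's `sqlite3` rather than Diesel (the table and column names are hypothetical):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE run (
        commit_sha TEXT PRIMARY KEY,  -- or a release tag
        passed     INTEGER NOT NULL,
        failed     INTEGER NOT NULL
    )
""")
conn.execute("INSERT INTO run VALUES ('abc123', 500, 20)")
conn.execute("INSERT INTO run VALUES ('def456', 510, 10)")

# The JSON API endpoint would then be a thin query over this table.
latest = conn.execute(
    "SELECT passed, failed FROM run WHERE commit_sha = ?", ("def456",)
).fetchone()
```
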

Razican avatar Mar 09 '24 15:03 Razican

Potential theoretical option:

Store the results in a Test262 Results repository

Maybe someone else knows if this is a no-go from the jump. I was doing some research, and we might be able to run a GitHub Action in another repository off a trigger action from the main repo. We'd have to test the idea, but if we could send results to another repository and have the GitHub Action commit the file, we'd be a bit less constrained on size.

Some benefits:

  • Provides easier visibility and access to the results.
  • Representation/formatting is less constrained, as it's removed from the main repository.
  • Allows us to set up our results as a REST API on GitHub Pages (for example: boajs.dev/test262-results/).

Negatives:

  • Might complicate CI considerably
  • Would require testing to determine viability

nekevss avatar Mar 09 '24 15:03 nekevss

While I like the idea of easily querying the data, my main concern with SQLite is that we'd need a backend to handle the data, instead of just GitHub Pages.

I like @nekevss's idea: instead of pushing to the gh-pages branch on every push to main, we push to a branch in a separate repository.

The two actions that we use to get the current gh-pages branch and push to it both support this:

  • The actions/checkout action has a repository option.
  • The github-push-action action used to push to gh-pages has a repository option too, for cross-repository pushing.

So this looks very doable, and I don't think it would complicate the CI that much.

HalidOdat avatar Mar 09 '24 16:03 HalidOdat
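For illustration, a cross-repository push using those two repository options could look roughly like this. This is a sketch only: the data repo name, branch, and secret name are placeholders.

```yaml
# Sketch of workflow steps pushing results to a separate data repository.
- uses: actions/checkout@v4
  with:
    repository: boa-dev/website-data      # check out the *other* repo
    token: ${{ secrets.CROSS_REPO_PAT }}  # placeholder PAT secret
- run: cp test262-results.json .          # drop the new results in
- uses: ad-m/github-push-action@master
  with:
    repository: boa-dev/website-data      # push back to the data repo
    github_token: ${{ secrets.CROSS_REPO_PAT }}
    branch: main
```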

I'm not sure how easy it is to handle SQLite from Docusaurus, for example.

I think it's pretty straightforward with the sql.js library...

...however, I also really like this idea:

We could also have a small backend, connected to a DB with a simple JSON API,

But in that case I'd just move the whole webpage into the server, just to make it easier to integrate and keep in sync.

jedel1043 avatar Mar 09 '24 16:03 jedel1043

I see the concerns about having to use a backend for this, but I just wanted to mention that the storage location for the data is completely orthogonal to the data representation itself: we could move our current JSON data into a separate repo, and we could also keep using this repo to store the data but in a binary format, for example.

jedel1043 avatar Mar 09 '24 16:03 jedel1043

The two actions that we use to get the current gh-pages branch and push to it:

The actions/checkout has a repository option.
The github-push-action action used to push to gh-pages has a repository option too for cross-repository pushing

About this, I always thought that we don't need to have test data for all commits made to the repo. Right now, if there are a lot of commits in succession, we're running the whole test suite for each and every one of them. This is not ideal because those commits could just be deps bumps, for example. In this case, I'd suggest just running the test suite once a day from the webpage repo itself.

jedel1043 avatar Mar 09 '24 16:03 jedel1043

My 2 cents

I like @Razican's idea of a backend as a long-term solution; anything file-based is going to slow down over time with the JSON parsing. I don't mind looking into having a database on our own server somewhere (which we could use funds for). It is more moving parts and things to maintain, so I understand the concern with it.

In the meantime, I do like @nekevss's idea (especially in the short term) to just move things to another repo and throw the data in there. I'm happy to help set this up; it seems like Halid has already pointed us to the variables to change.

About this, I always thought that we don't need to have test data for all commits made to the repo. Right now, if there are a lot of commits in succession, we're running the whole test suite for each and every one of them. This is not ideal because those commits could just be deps bumps, for example. In this case, I'd suggest just running the test suite once a day from the webpage repo itself.

Yes, I agree. We have more than enough traffic here now that every commit is excessive; a nightly runner is more than enough in my opinion. I would even go to weekly, but that granularity may be too coarse.

jasonwilliams avatar Mar 10 '24 17:03 jasonwilliams

Update

I've created https://github.com/boa-dev/website-data where we can put benchmarks and test262 results. This is set to run nightly. This leaves the question of where to put dev documentation. I don't know if dev docs have proven useful to keep?

@jedel1043 suggested improving the (contributing) docs to show developers how to generate their own API docs rather than hosting a new one from each commit. The other option is we have "nightly API docs" but I don't know how often these would actually get used.

If we do plan to release more often, then maybe we could do away with dev docs (as most of our users would be using the stable release anyway).

jasonwilliams avatar Mar 11 '24 09:03 jasonwilliams

I've thought about this recently, and I think the biggest low-hanging fruit right now would be to avoid generating a test result file for each tag, and instead embed the tag results in the tests themselves. What do you think?
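A sketch of what embedding the per-tag results on the tests themselves could look like (the file layout and field names here are invented for illustration):

```python
# Today (hypothetically): one result file per tag...
per_tag = {
    "v0.17": {"let-decl.js": True, "var-hoist.js": False},
    "v0.18": {"let-decl.js": True, "var-hoist.js": True},
}

# ...versus a single file where each test carries every tag's outcome.
merged = {}
for tag, results in per_tag.items():
    for test, passed in results.items():
        merged.setdefault(test, {})[tag] = passed

assert merged == {
    "let-decl.js": {"v0.17": True, "v0.18": True},
    "var-hoist.js": {"v0.17": False, "v0.18": True},
}
```
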

jedel1043 avatar Apr 07 '24 22:04 jedel1043

If I'm thinking about that right, it would involve consolidating the test files into one general results file, correct?

Is there a way we can do that without increasing the size of the results file we have to fetch via the site? My only concern would be the conformance page lagging from fetching too large a file. Outside of that concern, I'd be open to any changes.

nekevss avatar Apr 08 '24 02:04 nekevss

If I'm thinking about that right, it would involve consolidating the test files into one general results file, correct?

Is there a way we can do that without increasing the size of the results file we have to fetch via the site? My only concern would be the conformance page lagging from fetching too large a file. Outside of that concern, I'd be open to any changes.

Maybe we could test the increase in load times first? It could be that the new data doesn't increase our file size too much.

jedel1043 avatar Apr 08 '24 02:04 jedel1043

@CanadaHonk, we were discussing our data representation, and since you are also dealing with the data representation of test runs, we wanted to know how exactly https://test262.fyi stores its test runs for each engine.

jedel1043 avatar Apr 14 '24 13:04 jedel1043