Initial exploration of a benchmark suite
An initial attempt at a useful benchmark suite. The target audience for this at the moment is other fireproof developers.
Goals
- focus on measuring the performance of the implementation, not the browser or network
- eventually provide enough coverage to identify performance regressions
- provide a framework for other developers to add tests that quantify performance improvements
Non-goals (for now)
- real-world perf measurements (CDN, cloud, IPFS, etc.)
Early design choices
- Use benchmark.js for execution/measurement
- Wanted something aiming for statistically significant measurements without reinventing the wheel
- Pro: kind of just works, but with lots of papercuts
- Con: the project was recently archived on GitHub
- Con: some async limitations in the setup phase; there are open PRs on the project to improve this, which I haven't tested (a deferred-style async test is sketched after this list)
- Attempting to support browser and npm CLI execution
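To make the benchmark.js trade-offs concrete, here is a minimal sketch of a deferred async test. This is illustrative, not the exact code in this PR: the database name and doc shape are made up, and it assumes the `fireproof` factory from `@fireproof/core`. The point is that the test body can be async via `defer`, while the built-in setup/teardown hooks cannot await anything, which is the kind of limitation noted above.

```ts
// Minimal benchmark.js sketch (illustrative, not the exact code in this PR).
import Benchmark from 'benchmark'
import { fireproof } from '@fireproof/core'

// Built once, outside the timed function; the db name is made up for this example.
const db = fireproof('bench-put')

const suite = new Benchmark.Suite('fireproof-micro')

suite
  .add('put one small doc', {
    // defer: true hands the test fn a Deferred; benchmark.js times the span
    // until deferred.resolve() is called, which is how async work gets measured.
    defer: true,
    fn: (deferred: { resolve: () => void }) => {
      db.put({ hello: 'world' }).then(() => deferred.resolve())
    },
  })
  .on('cycle', (event: Benchmark.Event) => {
    // e.g. "put one small doc x 1,234 ops/sec ±1.23% (42 runs sampled)"
    console.log(String(event.target))
  })
  .run({ async: true })
```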
Testing approach, and why some things look the way they do
- These are what I consider "micro" benchmarks
- The framework attempts to run your test function in a loop, enough times to make an accurate measurement
- This pattern means it's helpful to have a start state that is easy to reuse or copy: the baseline from which your test function does something interesting (the part we time). With this in mind, I created the BenchConnector.
BenchConnector
- A connector implementation I introduced for usage in the benchmarks
- The idea was that I wanted to use FP sync to create these copies of databases, but I didn't want to actually set up separate infrastructure to run these tests. So, this is a pure in-memory implementation, which in many ways is a better fit for benchmarking anyway.
- There are two basic modes of creating the bench connector:
  - with no arguments, you get a brand new connector with fresh state: no data or metadata yet (until you connect to it)
  - with one argument (another connector instance), the new connector shares the SAME data and metadata storage, which is useful when you want to create a copy of an existing database
- In this PR there are two tests using this functionality (one with 5 docs and one with 50). On page load, we create the template 5- and 50-doc databases and sync them to bench connectors. Then, during test execution, we create brand new databases and sync from the pre-seeded templates we created earlier (see the sketch below).
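Here is a hypothetical sketch of that template/copy pattern. Only the two construction modes (fresh vs. shared storage) come from the description above; the `connect` helper, module path, database names, and doc shapes are stand-ins, not the PR's actual wiring.

```ts
// Hypothetical sketch of the template/copy pattern; connect() and the names
// below are illustrative stand-ins, not the PR's actual wiring.
import { fireproof } from '@fireproof/core'
import { BenchConnector, connect } from './bench-connector' // assumed module/exports

// Page load: build a 5-doc template database backed by a fresh in-memory connector.
const templateConnector = new BenchConnector() // no args: fresh data + metadata
const template5 = fireproof('bench-template-5')
await connect(template5, templateConnector)
for (let i = 0; i < 5; i++) {
  await template5.put({ _id: `doc-${i}`, value: i })
}

// Test time: a brand new database plus a connector that SHARES the template's
// storage, so syncing yields a cheap copy of the pre-seeded start state.
async function freshCopyOfTemplate() {
  const copyConnector = new BenchConnector(templateConnector) // one arg: shared storage
  const db = fireproof(`bench-copy-${Date.now()}`)
  await connect(db, copyConnector) // sync pulls the 5 template docs into db
  return db
}
```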
Known Issues
- I had to rename the top-level variable for the encrypted-blockstore IIFE build (it was also using Fireproof, which seemed to conflict when I included both in the debug.html app). I don't know whether I'm doing something wrong; this was the cleanest solution I found.
- I tried to organize the benchmark code to facilitate both browser and npm CLI execution, but it makes so many things awkward that I'm inclined to drop it.
- I've added the benchmark.js and lodash packages to the project, and things work fine via the npm CLI execution. However, in debug.html I'm including them from a CDN URL, and it feels like I should be pointing at something local. I need advice on the right way to solve this; I don't want just another hard-coded copy in the scripts.
- I don't know what to do about the pnpm-lock.yaml changes; they seem to be noise, but I'm not sure.
More Tests Needed
- documents with long history (a rough fixture sketch follows this list)
- impact of compaction settings
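For the long-history case, the fixture could be as simple as repeatedly putting the same `_id` so reads have to resolve a deep update chain. The helper below is an assumption about how such a test might be seeded, not code from this PR; the helper name, db name, and update count are all made up.

```ts
// Illustrative fixture for a "documents with long history" benchmark.
// The helper name, db name, and update count are assumptions.
import { fireproof } from '@fireproof/core'

async function buildLongHistoryDb(updates = 1000) {
  const db = fireproof(`bench-long-history-${updates}`)
  for (let i = 0; i < updates; i++) {
    // Reusing the same _id so each put extends the document's history.
    await db.put({ _id: 'hot-doc', count: i })
  }
  return db
}

// The timed portion would then be something like `await db.get('hot-doc')`,
// registered as a deferred test in the same way as the earlier sketch.
```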
Cool Ideas for the future
- Dump results of test execution into yet another Fireproof instance (a sketch follows this list)
- Could build out minimal app to compare against previous runs, etc
- Add ability for us/others to share these via sync
- Have hosted version for people to run themselves, submit data, see visualizations, etc
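A minimal sketch of the first idea, assuming benchmark.js cycle events and a made-up results database name and document shape:

```ts
// Sketch of writing each completed benchmark cycle into a Fireproof database,
// so runs can be compared or synced later. Names and doc shape are made up.
import Benchmark from 'benchmark'
import { fireproof } from '@fireproof/core'

const results = fireproof('bench-results') // assumed results db name

// Attach with `suite.on('cycle', recordCycle)` on the suite from the earlier sketch.
export function recordCycle(event: Benchmark.Event) {
  const bench = event.target as Benchmark
  void results.put({
    name: bench.name,
    hz: bench.hz,                       // ops/sec
    rme: bench.stats.rme,               // relative margin of error (%)
    samples: bench.stats.sample.length, // number of samples collected
    ranAt: new Date().toISOString(),
  })
}
```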
Current Status
- Works enough to illustrate the direction I'm headed and get feedback from others
- Run it via CLI
  - From packages/fireproof run `pnpm bench`
- Run it via browser
  - From packages/fireproof run `pnpm serve`
  - Hit localhost:8080 and navigate to bench.html
Just FYI: people I trust to make the right decisions are using https://github.com/tinylibs/tinybench for benchmarking, which according to them has fewer issues with async. So it might be worth a look.
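For comparison, the same kind of put test in tinybench looks roughly like this; async task functions are simply awaited, with no deferred object (the db name and timing option are illustrative):

```ts
// Rough tinybench equivalent of the earlier put test; names are illustrative.
import { Bench } from 'tinybench'
import { fireproof } from '@fireproof/core'

const bench = new Bench({ time: 500 }) // sample each task for ~500ms
const db = fireproof('bench-put-tinybench')

// Async task functions are awaited directly, no deferred object needed.
bench.add('put one small doc', async () => {
  await db.put({ hello: 'world' })
})

await bench.run()
console.table(bench.table())
```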
Moving this to draft again, as it requires a revisit.
@mschoch it'd be cool to see this dusted off as part of a validation suite for the next release. I think the big validations we want are about package size and compatibility, cold start time with a small dataset, and write speed with a 20k record dataset.
This could be worth bringing across the line now. Perhaps we can run a standard bench across 18 and 19 and see what the differences are in terms of bundle size contribution, cold start time, and writes to a medium-sized database. Could make a blog post, @meghansinnott.
I think we can close this @jchris. The approach I took involved a custom connector, and that entire interface has changed, so it would need to be rewritten. I suspect the needs have changed and a different approach may make more sense.