
Long-running load test

Open cuongdo opened this issue 5 years ago • 9 comments

Currently, we have a chbench-based load test that runs for ~24 hours. Because the chbench workload is designed to grow certain tables without bound, we exhaust the physical memory on our load-test VM after 1–2 days of load. Furthermore, the chbench test doesn't exercise various Kafka features.

So, I propose that we create a new load test during the current development cycle that uses the following (a topic-setup sketch follows the list):

  • Upserts, which allow Materialize's memory footprint to stay stable or grow slowly even with a high rate of incoming Kafka messages
  • Kafka compacted topics
  • Multiple Kafka partitions
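
For illustration, here is a minimal sketch of the Kafka side of such a setup, using the confluent-kafka Python client. The broker address, topic name, partition count, and compaction settings are placeholder assumptions, not values anyone has proposed in this thread:

```python
from confluent_kafka.admin import AdminClient, NewTopic

# Placeholder broker address for a local/load-test environment.
admin = AdminClient({"bootstrap.servers": "localhost:9092"})

# A compacted topic with several partitions, so the test exercises both
# key-based compaction and multi-partition ingestion.
topic = NewTopic(
    "upsert-load",                          # hypothetical topic name
    num_partitions=8,
    replication_factor=1,
    config={
        "cleanup.policy": "compact",        # keep only the latest value per key
        "segment.ms": "600000",             # roll segments often enough for compaction to run
        "min.cleanable.dirty.ratio": "0.1", # compact relatively eagerly
    },
)

# create_topics() returns a dict of topic name -> future; wait for completion.
for name, future in admin.create_topics([topic]).items():
    future.result()
    print(f"created topic {name}")
```

The Materialize side would then be an upsert-envelope source over this topic, fed by a producer that keeps overwriting a bounded set of keys (see the later comments in this thread).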

Should this be based on @wangandi's existing MBTA work? Or @ruchirK's billing demo? Or something else?

cc @benesch

cuongdo commented Jun 02 '20 20:06

Do we have a target for what "high rate" is? That would help in determining what we base the work on.

Also, other things to think about include:

  • how many unique keys the load test should have.
  • what our target update/delete rate is
  • the distribution of updates across keys. We may want one load test where all keys are updated equally frequently and another where a few keys are updated significantly more often than the rest (see the sketch below).
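
To make the last point concrete, here is one way the key-selection policy could be parameterized in a Python load generator. The key-space size and the 1/rank skew are arbitrary illustrative choices, not numbers anyone has agreed on:

```python
import itertools
import random

NUM_KEYS = 100_000  # illustrative key-space size

def pick_key_uniform() -> int:
    """Every key is equally likely to be updated."""
    return random.randrange(NUM_KEYS)

# Skewed alternative: weight keys by a Zipf-like 1/rank curve so a few "hot"
# keys receive most of the updates. Precompute cumulative weights so each
# draw is a binary search rather than a full scan of the weights.
_cum_weights = list(itertools.accumulate(1.0 / (rank + 1) for rank in range(NUM_KEYS)))

def pick_key_skewed() -> int:
    """A small set of keys is updated far more often than the rest."""
    return random.choices(range(NUM_KEYS), cum_weights=_cum_weights, k=1)[0]
```

Running the same load test with both policies would cover the two distributions described above.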

wangandi commented Jun 02 '20 22:06

“High rate” would be influenced by what the heaviest customer workloads have looked like so far, though possibly scaled down to fit a less costly EC2 VM.

In general, having the distribution of data and types of operations be as different from chbenchmark as possible would help ensure the two tests are exercising different aspects of Materialize.

cuongdo commented Jun 02 '20 23:06

I have a load test that looks very similar to this; I'll put up a PR for it either today or tomorrow.

ruchirK commented Jun 03 '20 18:06

We have an upsert perf test now, so a long-running load test should be just a few mzcompose tweaks away.

Not sure about:

  • how much memory we want to use
  • how long to run for
  • other environment tweaks?

ruchirK commented Jul 22 '20 18:07

Also, we do technically have a weekly load test today, but only for chbench; it restarts every Tuesday at around 4am: http://grafana.mz/d/materialize-overview/materialize-overview-load-tests?orgId=1&refresh=1m&var-env=dev&var-purpose=load_test_weekly&var-test=chbench&var-workflow=All&var-commit_time=All&var-git_ref=All&var-id=.*&var-command=execute&var-command=query&var-timely_workers=4&var-built_at=2020-07-20T23:31:45Z&var-build_version=0.4.0-dev&var-build_sha=006fe95155b99359e5942d0c6a7c6ebdfa0e71b1&var-instances=All&var-id_explicit=All&from=now-7d&to=now

quodlibetor commented Jul 22 '20 20:07

  • how much memory we want to use

It would be great if we could have a load test that we expect to stay below 1 GB of data but that sustains an extremely high message rate (hundreds of thousands to millions of messages per second); see the producer sketch at the end of this comment.

  • How long to run for

For this, I think it is useful both to run daily and to run weekly or perpetually for each point release, until the next point release.

  • Daily helps us narrow down regressions faster
  • Weekly/monthly gives more assurance about performance over time

I would also be happy to set up 4 concurrent 4-week tests that restart on a staggered weekly cycle, so that we get weekly checks plus visibility into monthly trends.
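
As a rough sketch of the "bounded key space, very high message rate" idea above: a producer loop like the one below cycles over a fixed set of keys, so the upsert state Materialize has to retain stays bounded, while pushing updates as fast as the load-generating box allows. It assumes the confluent-kafka Python client; the broker, topic, key count, and target rate are placeholders, and this is not the existing perf test:

```python
import json
import time

from confluent_kafka import Producer

BROKER = "localhost:9092"      # placeholder
TOPIC = "upsert-load"          # placeholder; assumes a compacted topic as sketched earlier
NUM_KEYS = 100_000             # fixed key space -> bounded upsert state
TARGET_MSGS_PER_SEC = 100_000  # aspirational; actual throughput depends on the producer box

producer = Producer({"bootstrap.servers": BROKER, "linger.ms": "50"})

seq = 0
start = time.monotonic()
while True:
    key = str(seq % NUM_KEYS)  # cycle over the fixed key set
    value = json.dumps({"key": key, "seq": seq, "ts": time.time()})
    try:
        producer.produce(TOPIC, key=key, value=value)
    except BufferError:
        producer.poll(1)  # local queue full: let librdkafka drain, then retry this message
        continue
    producer.poll(0)  # serve delivery callbacks
    seq += 1

    # Crude rate limiting: sleep whenever we are ahead of the target rate.
    ahead = start + seq / TARGET_MSGS_PER_SEC - time.monotonic()
    if ahead > 0:
        time.sleep(ahead)
```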

quodlibetor commented Jul 22 '20 20:07

We can 100% have the upsert demo stay at a low footprint (single-digit GB). However, I'm not sure we can achieve hundreds of thousands of messages per second; that depends partly on what the load-generating box is capable of (we default to 8k records per second).

Also, just to be clear, this test doesn't do anything other than CREATE MATERIALIZED SOURCE ... ENVELOPE UPSERT. I want to understand a little better what we expect to see from a week-long test vs. a 6-hour test in that scenario (for example, if we want to exercise reading compacted data, we could lower the compaction lag?).
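
If we do want to exercise reads over already-compacted data, one option is to tighten the topic's compaction settings so the log cleaner runs much sooner. A hedged sketch with the confluent-kafka admin client follows; the topic name and values are arbitrary, and note that alter_configs replaces the topic's full config rather than merging with existing overrides:

```python
from confluent_kafka.admin import AdminClient, ConfigResource

admin = AdminClient({"bootstrap.servers": "localhost:9092"})  # placeholder broker

# Hypothetical load-test topic; make the log cleaner compact it aggressively.
resource = ConfigResource(ConfigResource.Type.TOPIC, "upsert-load")
resource.set_config("cleanup.policy", "compact")
resource.set_config("segment.ms", "60000")                # roll segments every minute
resource.set_config("min.cleanable.dirty.ratio", "0.01")  # compact almost as soon as data is dirty

# alter_configs() returns a dict of resource -> future; wait for each to complete.
for res, future in admin.alter_configs([resource]).items():
    future.result()
    print(f"updated config for {res}")
```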

ruchirK commented Jul 22 '20 22:07

Long-running tests are more indicative of how customers will use our code. Also, some problems, such as small memory leaks and file descriptor leaks, are not necessarily obvious in a 6-hour test.

cuongdo commented Jul 23 '20 01:07

@philip-stoev Do we currently have any long-running load tests?

wangandi commented Sep 19 '22 18:09