
Create a load testing command

Open StoneDot opened this issue 1 year ago • 7 comments

Load testing command

Background

The load testing command is useful for understanding DynamoDB behaviors such as throttling, auto scaling, and metrics. It also helps users investigate an application's behavior when throttling happens.

Proposed design

The implementation decisions are as follows:

  • The amount of request traffic is controlled by a leaky bucket algorithm with a feedback loop that adjusts the amount of the next acquisition based on the actual consumed capacity.
  • The current consumed capacity is updated and presented in real time. However, the first implementation will omit visualizations such as graphs.
  • To prevent consuming capacity unintentionally, RCU and WCU must be provided by the user.
  • The internal request manager controls the maximum number of parallel requests to DynamoDB. It is responsible for scaling the number of parallel requests in or out, and it scales exponentially with base 2.
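
To make the first bullet more concrete, here is a minimal Rust sketch of a leaky bucket whose per-request reservation is corrected by feedback. It assumes the actual consumed capacity of each request is known once the response arrives (for example, through DynamoDB's ReturnConsumedCapacity parameter). The type and method names, the burst capacity, and the 0.8/0.2 moving-average weights are illustrative assumptions, not dynein code. Time is passed in explicitly so the logic can be simulated without sleeping.

```rust
/// Leaky-bucket limiter with consumed-capacity feedback.
/// Hypothetical sketch: names and the 0.8/0.2 smoothing are assumptions.
struct LeakyBucket {
    capacity: f64, // maximum bucket level (burst allowance), in capacity units
    rate: f64,     // leak rate: the target RCU or WCU per second
    level: f64,    // current fill level, in capacity units
    last: f64,     // time of the last leak, in seconds
    estimate: f64, // expected capacity units per request, adjusted by feedback
}

impl LeakyBucket {
    fn new(rate: f64, capacity: f64, initial_estimate: f64) -> Self {
        Self { capacity, rate, level: 0.0, last: 0.0, estimate: initial_estimate }
    }

    /// Drain the bucket according to the time elapsed since the last call.
    fn leak(&mut self, now: f64) {
        let elapsed = (now - self.last).max(0.0);
        self.level = (self.level - elapsed * self.rate).max(0.0);
        self.last = now;
    }

    /// Reserve `estimate` units if they fit; returns the reserved amount.
    fn try_acquire(&mut self, now: f64) -> Option<f64> {
        self.leak(now);
        if self.level + self.estimate <= self.capacity {
            self.level += self.estimate;
            Some(self.estimate)
        } else {
            None
        }
    }

    /// Seconds after the last leak until `estimate` units fit (0.0 if they already fit).
    fn wait_time(&self) -> f64 {
        let overflow = self.level + self.estimate - self.capacity;
        (overflow / self.rate).max(0.0)
    }

    /// Feedback loop: charge the bucket with the actual consumption instead of
    /// the reservation, and move the estimate toward the observed value.
    fn record_consumed(&mut self, reserved: f64, actual: f64) {
        self.level = (self.level + actual - reserved).max(0.0);
        self.estimate = 0.8 * self.estimate + 0.2 * actual;
    }
}

fn main() {
    // Target 10 units/s with a 10-unit burst; initial guess of 0.5 units per request.
    let mut bucket = LeakyBucket::new(10.0, 10.0, 0.5);
    let mut now = 0.0;
    let mut sent = 0;
    while now < 5.0 {
        match bucket.try_acquire(now) {
            Some(reserved) => {
                // Pretend every request actually consumed 1.0 unit.
                bucket.record_consumed(reserved, 1.0);
                sent += 1;
            }
            // Advance the simulated clock; the minimum step avoids stalling on rounding.
            None => now += bucket.wait_time().max(0.001),
        }
    }
    println!("sent {sent} requests in 5 simulated seconds, estimate = {:.2}", bucket.estimate);
}
```

Because the bucket is charged with the actual consumption rather than the estimate, the admitted rate settles at roughly 10 requests per second after the initial burst, while the estimate drifts toward the observed 1.0 unit per request.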

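The request manager in the last bullet could look roughly like the following sketch. The proposal only says that parallelism scales exponentially with base 2; the specific policy here (double when consumed capacity is below the target, halve on throttling, hold otherwise, clamped to a hypothetical max_parallelism) is an assumption for illustration.

```rust
/// Feedback from one measurement window (hypothetical categories).
#[derive(Debug, Clone, Copy, PartialEq)]
enum Feedback {
    BelowTarget, // consumed capacity is under the target: scale out
    Throttled,   // throttling errors were observed: scale in
    OnTarget,    // target reached without throttling: hold
}

/// Controls the number of in-flight requests, scaling in powers of two.
struct RequestManager {
    parallelism: u32,
    max_parallelism: u32,
}

impl RequestManager {
    fn new(max_parallelism: u32) -> Self {
        Self { parallelism: 1, max_parallelism: max_parallelism.max(1) }
    }

    /// Double on BelowTarget, halve on Throttled, clamped to [1, max_parallelism].
    fn adjust(&mut self, feedback: Feedback) -> u32 {
        self.parallelism = match feedback {
            Feedback::BelowTarget => self.parallelism.saturating_mul(2).min(self.max_parallelism),
            Feedback::Throttled => (self.parallelism / 2).max(1),
            Feedback::OnTarget => self.parallelism,
        };
        self.parallelism
    }
}

fn main() {
    let mut manager = RequestManager::new(64);
    let windows = [
        Feedback::BelowTarget,
        Feedback::BelowTarget,
        Feedback::BelowTarget,
        Feedback::Throttled,
        Feedback::OnTarget,
    ];
    for fb in windows {
        let p = manager.adjust(fb);
        println!("{fb:?} -> {p} parallel requests");
    }
}
```

Multiplicative steps in both directions reach a large parallelism within a few windows (1, 2, 4, 8, ...) and back off just as quickly once throttling appears.
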
Interface

In the first implementation, the load testing functionality is provided by the command dy bench run (or dy benchmark run) with the following options:

  • --rcu <number>: Specify the target RCU when reading items. This argument is required.
  • --wcu <number>: Specify the target WCU when writing items. This argument is required unless you provide --skip-item-creation.
  • --size <number>: The preferred size of an attribute in bytes. The default value is 500.
  • --skip-item-creation: By default, dynein first creates items for the write test and then performs the read test using the created items. This option skips the write (WCU) test and uses the data already stored in the table.
  • --partition-key-variations <number>: The maximum number of partition key variations of items. The default value is 1000.
  • --sort-key-variations <number>: The maximum number of sort key variations of items. The default value is 100.
  • --duration-write <number>: The duration of the write testing. The default value is five minutes.
  • --duration-read <number>: The duration of the read testing. The default value is five minutes.
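
As a rough illustration of how these options and their defaults fit together, the sketch below models them as a plain Rust struct, without any argument-parsing crate. The struct and field names are hypothetical. validate encodes the rules stated above: --rcu is always required, and --wcu is required unless --skip-item-creation is given.

```rust
use std::time::Duration;

/// Hypothetical model of the `dy bench run` options and their defaults.
#[derive(Debug, Clone)]
struct BenchOptions {
    rcu: Option<u32>,
    wcu: Option<u32>,
    size: usize, // preferred attribute size in bytes
    skip_item_creation: bool,
    partition_key_variations: u32,
    sort_key_variations: u32,
    duration_write: Duration,
    duration_read: Duration,
}

impl Default for BenchOptions {
    fn default() -> Self {
        Self {
            rcu: None,
            wcu: None,
            size: 500,
            skip_item_creation: false,
            partition_key_variations: 1000,
            sort_key_variations: 100,
            duration_write: Duration::from_secs(5 * 60),
            duration_read: Duration::from_secs(5 * 60),
        }
    }
}

impl BenchOptions {
    /// Enforce the required-argument rules from the option list.
    fn validate(&self) -> Result<(), String> {
        if self.rcu.is_none() {
            return Err("--rcu is required".to_string());
        }
        if self.wcu.is_none() && !self.skip_item_creation {
            return Err("--wcu is required unless --skip-item-creation is given".to_string());
        }
        Ok(())
    }
}

fn main() {
    // Equivalent of: dy bench run --rcu 100 --wcu 5
    let opts = BenchOptions { rcu: Some(100), wcu: Some(5), ..Default::default() };
    match opts.validate() {
        Ok(()) => println!(
            "rcu={:?} wcu={:?} size={}B keys={}x{} write={}s read={}s",
            opts.rcu,
            opts.wcu,
            opts.size,
            opts.partition_key_variations,
            opts.sort_key_variations,
            opts.duration_write.as_secs(),
            opts.duration_read.as_secs()
        ),
        Err(e) => eprintln!("error: {e}"),
    }
}
```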

Common options such as --table and --region are supported, as with other commands.

We use a bench run subcommand for the initial implementation. Please note that there is room for feature enhancements. For example, we could use dy bench run -s <scenario-file> for scenario-based tests and dy bench report <report-file> to show the results of a test.

The workflow

The load testing workflow is outlined as follows:

  1. Based on the --partition-key-variations and --sort-key-variations arguments, create a list of primary keys to use in the test. If --skip-item-creation is provided, the Scan API is invoked to list primary keys instead. We must use parallel scans because a sequential scan creates a hot partition.
  2. Based on the --wcu argument, PutItem is invoked with the primary keys created in the first step for the duration of --duration-write. Each created item has an additional string attribute of --size bytes.
  3. Based on the --rcu argument, GetItem is invoked with the primary keys created in the first step for the duration of --duration-read.
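
Steps 1 and 2 could be sketched as follows, assuming the key space is the product of the partition and sort key variations and that keys are formatted as pk-N and sk-N. The attribute names pk, sk, and payload and the key format are hypothetical, and an item is modeled as a map of string attributes rather than an SDK type. For the --skip-item-creation path, the keys would instead come from a parallel Scan using the Segment and TotalSegments parameters, which is not shown here.

```rust
use std::collections::HashMap;

/// Hypothetical primary key: partition key plus sort key.
#[derive(Debug, Clone, PartialEq)]
struct PrimaryKey {
    pk: String,
    sk: String,
}

/// Step 1: enumerate every (partition key, sort key) combination.
fn generate_keys(partition_key_variations: u32, sort_key_variations: u32) -> Vec<PrimaryKey> {
    let mut keys =
        Vec::with_capacity(partition_key_variations as usize * sort_key_variations as usize);
    for p in 0..partition_key_variations {
        for s in 0..sort_key_variations {
            keys.push(PrimaryKey { pk: format!("pk-{p}"), sk: format!("sk-{s}") });
        }
    }
    keys
}

/// Step 2: build an item with an extra string attribute of `size` bytes.
fn build_item(key: &PrimaryKey, size: usize) -> HashMap<String, String> {
    let mut item = HashMap::new();
    item.insert("pk".to_string(), key.pk.clone());
    item.insert("sk".to_string(), key.sk.clone());
    item.insert("payload".to_string(), "x".repeat(size));
    item
}

fn main() {
    let keys = generate_keys(3, 2);
    // Cycle through the keys, as a duration-bound write loop might do.
    for key in keys.iter().cycle().take(8) {
        let item = build_item(key, 500);
        println!(
            "PutItem pk={} sk={} payload={} bytes",
            item["pk"],
            item["sk"],
            item["payload"].len()
        );
    }
}
```

With the defaults (1000 x 100), this enumeration yields up to 100,000 distinct primary keys.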

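To relate --size to --wcu and --rcu, the following sketch applies DynamoDB's provisioned-capacity rules: a standard write consumes one WCU per 1 KB of item size (rounded up), a strongly consistent read consumes one RCU per 4 KB (rounded up), and an eventually consistent read (the GetItem default) consumes half of that. Item size is approximated as the sum of attribute name and string value lengths. The helper names and attribute names are hypothetical.

```rust
/// Size of an item whose attributes are all strings: the UTF-8 length of
/// each attribute name plus the length of its value.
fn item_size(attrs: &[(&str, &str)]) -> u64 {
    attrs.iter().map(|(name, value)| (name.len() + value.len()) as u64).sum()
}

/// WCU consumed by a standard PutItem: one unit per 1 KB, rounded up.
fn wcu_per_put(item_bytes: u64) -> f64 {
    ((item_bytes + 1023) / 1024).max(1) as f64
}

/// RCU consumed by GetItem: one unit per 4 KB (rounded up) for a strongly
/// consistent read, half of that for an eventually consistent read.
fn rcu_per_get(item_bytes: u64, strongly_consistent: bool) -> f64 {
    let units = ((item_bytes + 4095) / 4096).max(1) as f64;
    if strongly_consistent { units } else { units / 2.0 }
}

/// Requests per second needed to reach a target capacity.
fn requests_per_second(target_units: f64, units_per_request: f64) -> f64 {
    target_units / units_per_request
}

fn main() {
    let payload = "x".repeat(500);
    let size = item_size(&[("pk", "pk-0"), ("sk", "sk-0"), ("payload", payload.as_str())]);
    let wcu = wcu_per_put(size);
    let rcu = rcu_per_get(size, false);
    println!("item = {size} bytes, {wcu} WCU per PutItem, {rcu} RCU per GetItem");
    println!(
        "--wcu 5 -> {} PutItem/s, --rcu 100 -> {} GetItem/s",
        requests_per_second(5.0, wcu),
        requests_per_second(100.0, rcu)
    );
}
```

With --size 500 and short keys, an item stays under 1 KB, so each PutItem consumes 1 WCU and each eventually consistent GetItem 0.5 RCU; --wcu 5 and --rcu 100 then correspond to about 5 PutItem and 200 GetItem requests per second.
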
StoneDot, Jun 18 '23 14:06

Thank you for creating a proposal for this great feature.

I think other commands generally follow the format dy <verb> or dy <command> <verb>. What other subcommands do you have in mind besides simple?

ryota-sakamoto, Jun 20 '23 14:06

> I think other commands generally follow the format dy <verb> or dy <command> <verb>. What other subcommands do you have in mind besides simple?

I have some ideas regarding scenario-based benchmarking. I suppose it would be invoked by a dy benchmark scenario command, in the same style as dy admin create table. I understand that this is a little awkward as an English phrase, but I feel that dy benchmark table simply is a little verbose. I am open to good suggestions for the command name.

StoneDot, Jun 20 '23 15:06

I'd also mention the YCSB command style as an option. If we adopt that style in dynein, it would be dy benchmark load to load the data and dy benchmark run to run the workload. The advantage is compatibility with YCSB; the drawback is that loading and testing must be run separately. Still, I prefer dy benchmark simple.

StoneDot, Jun 20 '23 16:06

I think we need to provide a command that shows the results of a load test. I'm not sure how to run scenario-based tests for now, but I have two ideas for providing both a simple test and a scenario-based test.

all in one

The idea is that we can run both the simple test and the scenario-based test with one command, selecting a scenario-based test by specifying a scenario file. I imagine something like the following commands; it is just a simple interface.

# simple test
$ dy load run --rcu 100 --wcu 5

# scenario base test
$ dy load run -s <scenario-file>

# show result of load test
$ dy load report <report-file>

split command

The idea is to provide two commands, load and benchmark, so that the role of each command is clear.

# simple test
$ dy load run --rcu 100 --wcu 5
$ dy load report <report-file>
# scenario base test
$ dy benchmark run <scenario-file>
$ dy benchmark report <report-file>

ryota-sakamoto, Jun 23 '23 17:06

I personally find the -s option to be a clear and effective way of specifying scenario-based testing. Also, it makes sense to split the run and report commands. Thank you for your suggestion. However, I'm a bit concerned that the load argument might confuse users since it has multiple meanings. In other words, I worry that users might mix up loading the data and loading DynamoDB for stress testing.

In my opinion, using the term benchmark (maybe even a shorter version like bench) would be clearer than load. What do you think?

Additionally, I would like to propose the following commands:

# Perform a simple test
$ dy bench run --rcu 100 --wcu 5

# Conduct a scenario-based test (not implemented in the initial phase)
$ dy bench run -s <scenario-file>

# Generate a report for the load test (not implemented in the initial phase)
$ dy bench report <report-file>

Please let me know what you think about these suggestions and proposed commands.

StoneDot, Jun 26 '23 14:06

I agree with you. Using benchmark or bench instead of load is clear and easy to understand.

ryota-sakamoto, Jun 27 '23 08:06

Based on an internal discussion with a Solutions Architect, the following features are preferable:

  • Specify the maximum number of concurrent requests instead of RCU and WCU.
  • Specify the primary keys to use in load testing.

He wants functionality similar to what the following project provides: https://github.com/aws-samples/dynamodb-consumed-capacity-check-tool
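
For the concurrency-based mode requested above, a minimal sketch is a fixed pool of workers, each issuing requests back to back, with the consumed capacity summed across workers. fake_request is a hypothetical stand-in for a DynamoDB call; a real implementation would set ReturnConsumedCapacity to TOTAL and read ConsumedCapacity from each response. The total is accumulated in thousandths of a unit so that an atomic integer can be shared between threads.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::thread;

/// Hypothetical stand-in for a DynamoDB request; returns consumed capacity units.
fn fake_request(_worker: usize, _i: usize) -> f64 {
    0.5
}

/// Run `concurrency` workers, each sending `requests_per_worker` requests,
/// and return the total consumed capacity.
fn run_fixed_concurrency<F>(concurrency: usize, requests_per_worker: usize, request: F) -> f64
where
    F: Fn(usize, usize) -> f64 + Sync,
{
    // Total consumed capacity in thousandths of a unit.
    let consumed_milli = AtomicU64::new(0);
    thread::scope(|s| {
        for w in 0..concurrency {
            let consumed_milli = &consumed_milli;
            let request = &request;
            s.spawn(move || {
                for i in 0..requests_per_worker {
                    let units = request(w, i);
                    consumed_milli.fetch_add((units * 1000.0).round() as u64, Ordering::Relaxed);
                }
            });
        }
    });
    consumed_milli.load(Ordering::Relaxed) as f64 / 1000.0
}

fn main() {
    let total = run_fixed_concurrency(8, 100, fake_request);
    println!("8 workers x 100 requests consumed {total} capacity units");
}
```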

StoneDot, Sep 29 '23 14:09