dynein
Create a load testing command
Load testing command
Background
The load testing command is useful for understanding DynamoDB behavior such as throttling, auto scaling, and metrics. It also helps users investigate how an application behaves when throttling occurs.
Proposed design
The key design decisions are as follows:
- Request traffic is controlled by a leaky bucket algorithm with a feedback loop that adjusts the next amount to acquire based on the capacity actually consumed.
- The current consumed capacity is updated and presented in real time. The first implementation omits visualizations such as graphs.
- To prevent unintended capacity consumption, the user must provide RCU and WCU explicitly.
- The internal request manager controls the maximum number of parallel requests to DynamoDB. It is responsible for scaling the number of parallel requests in or out, and it scales exponentially with base 2.
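The first and last decisions above can be sketched roughly as follows. This is only an illustration under stated assumptions, not dynein's implementation: the class names `LeakyBucket` and `ParallelismScaler`, the `settle` method, and the scaling bounds are all hypothetical. The bucket is written as a meter whose level drains at the target rate, and the feedback step corrects the level once the actual consumed capacity is known.

```python
class LeakyBucket:
    """Leaky bucket used as a meter (hypothetical sketch, not dynein's code).

    `rate` is the target capacity units per second and `capacity` is the
    largest amount of un-leaked units the bucket tolerates.
    """

    def __init__(self, rate: float, capacity: float) -> None:
        self.rate = rate
        self.capacity = capacity
        self.level = 0.0

    def leak(self, elapsed_sec: float) -> None:
        # The bucket drains at the target rate as time passes.
        self.level = max(0.0, self.level - self.rate * elapsed_sec)

    def try_acquire(self, estimated_units: float) -> bool:
        # Admit the request only if its estimated cost still fits.
        if self.level + estimated_units > self.capacity:
            return False
        self.level += estimated_units
        return True

    def settle(self, estimated_units: float, consumed_units: float) -> None:
        # Feedback loop: replace the estimate with the capacity that the
        # service reported as actually consumed.
        self.level = max(0.0, self.level + consumed_units - estimated_units)


class ParallelismScaler:
    """Scales the number of parallel requests exponentially with base 2."""

    def __init__(self, minimum: int = 1, maximum: int = 1024) -> None:
        self.minimum = minimum
        self.maximum = maximum
        self.current = minimum

    def scale_out(self) -> int:
        self.current = min(self.maximum, self.current * 2)
        return self.current

    def scale_in(self) -> int:
        self.current = max(self.minimum, self.current // 2)
        return self.current
```

In this sketch, a request manager would call `leak` with the elapsed time on every tick, call `try_acquire` before sending a request, and call `settle` when the response arrives. When to scale in or out is left open here, since the proposal does not specify the trigger.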
Interface
In the first implementation, load testing is provided by the dy bench run (or dy benchmark run) command with the following options:
- --rcu <number>: Specify the target RCU when reading items. This argument is required.
- --wcu <number>: Specify the target WCU when writing items. This argument is required unless --skip-item-creation is provided.
- --size <number>: The preferred size of an attribute in bytes. The default value is 500.
- --skip-item-creation: By default, dynein first creates items for the write test and then performs the read test using the created items. This option skips the WCU test and uses the data already stored in the table.
- --partition-key-variations <number>: The maximum number of partition key variations of items. The default value is 1000.
- --sort-key-variations <number>: The maximum number of sort key variations of items. The default value is 100.
- --duration-write <number>: The duration of the write test. The default value is five minutes.
- --duration-read <number>: The duration of the read test. The default value is five minutes.
Common options such as --table and --region are supported, as in other commands.
We use a bench run subcommand for the initial implementation. Please note that this leaves room for feature enhancements. For example, we can use dy bench run -s <scenario-file> for scenario-based tests and dy bench report <report-file> for showing the result of a test.
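To make the option rules concrete, here is a minimal sketch of the proposed interface using Python's argparse. It is not how dynein (a Rust CLI) parses arguments; it only mirrors the flags and defaults listed above. The `parse` helper is hypothetical, and expressing the durations in seconds is an assumption, since the proposal only says the default is five minutes.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="dy")
    commands = parser.add_subparsers(dest="command", required=True)
    # "benchmark" is accepted as a longer alias of "bench".
    bench = commands.add_parser("bench", aliases=["benchmark"])
    actions = bench.add_subparsers(dest="action", required=True)

    run = actions.add_parser("run")
    run.add_argument("--rcu", type=int, required=True)
    run.add_argument("--wcu", type=int)
    run.add_argument("--size", type=int, default=500)
    run.add_argument("--skip-item-creation", action="store_true")
    run.add_argument("--partition-key-variations", type=int, default=1000)
    run.add_argument("--sort-key-variations", type=int, default=100)
    # Assumption: durations are given in seconds (300 s = five minutes).
    run.add_argument("--duration-write", type=int, default=300)
    run.add_argument("--duration-read", type=int, default=300)
    return parser


def parse(argv):
    parser = build_parser()
    args = parser.parse_args(argv)
    # --wcu is required unless item creation (the write test) is skipped.
    if not args.skip_item_creation and args.wcu is None:
        parser.error("--wcu is required unless --skip-item-creation is given")
    return args
```

For example, `parse(["bench", "run", "--rcu", "100", "--wcu", "5"])` is accepted, while omitting both `--wcu` and `--skip-item-creation` exits with a usage error.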
The workflow
The workflow of the load test is outlined as follows:
- Based on the --partition-key-variations and --sort-key-variations arguments, create a list of primary keys to use in the test. If --skip-item-creation is provided, the Scan API is invoked to list primary keys instead. We must use parallel scans because sequential scans create a hot partition.
- Based on the --wcu argument, PutItem is invoked with the primary keys from the first step for the duration given by --duration-write. Each created item has an additional string attribute of --size bytes.
- Based on the --rcu argument, GetItem is invoked with the primary keys from the first step for the duration given by --duration-read.
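The steps above can be sketched as plain arithmetic, without calling DynamoDB. This is an illustration under assumptions: the attribute names `pk`, `sk`, and `payload` and all function names are hypothetical, and the item size passed to the capacity helpers is approximate because DynamoDB also counts attribute names. The capacity rules themselves are the documented ones: a write consumes one WCU per 1 KB of item size, and a strongly consistent read consumes one RCU per 4 KB (half that for an eventually consistent read).

```python
import itertools
import math


def generate_keys(partition_variations: int, sort_variations: int):
    """Yield every partition/sort key combination used in the test."""
    pairs = itertools.product(range(partition_variations), range(sort_variations))
    for pk, sk in pairs:
        yield {"pk": f"pk-{pk}", "sk": f"sk-{sk}"}


def build_item(key: dict, size: int) -> dict:
    """Attach a string attribute of `size` bytes (ASCII) to the key."""
    return {**key, "payload": "x" * size}


def write_units(item_bytes: int) -> int:
    # One WCU covers a write of up to 1 KB.
    return max(1, math.ceil(item_bytes / 1024))


def read_units(item_bytes: int, consistent: bool = True) -> float:
    # One RCU covers a strongly consistent read of up to 4 KB;
    # an eventually consistent read costs half as much.
    units = max(1, math.ceil(item_bytes / 4096))
    return units if consistent else units / 2


def requests_per_second(target_units: float, units_per_request: float) -> float:
    """Request rate needed to consume `target_units` per second."""
    return target_units / units_per_request
```

With the default --size of 500 and --wcu 5, this gives roughly five PutItem requests per second, since each 500-byte write costs one WCU.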
Thank you for proposing this great feature.
I think other commands have generally followed the format dy <verb> or dy <command> <verb>. What other subcommands do you have in mind besides simple?
I have some ideas regarding scenario-based benchmarking. I suppose it would be invoked by a dy benchmark scenario command. This command style is the same as dy admin create table. I understand that it is a little awkward as an English phrase, but I feel dy benchmark table simply is a little verbose. I am open to good suggestions for the command name.
I mention the YCSB command style as an option. If we adopted that style in dynein, it would be dy benchmark load to load the data and dy benchmark run to run the workload. The pro is compatibility with YCSB; the con is that loading and testing must be run separately. Still, I prefer dy benchmark simple.
I think we need to provide a command that shows the result of load testing. I'm not sure how to run scenario-based tests for now, but I have two ideas for providing both simple tests and scenario-based tests.
all in one
The idea is to run both simple tests and scenario-based tests with one command. If we specify a scenario file, the scenario-based test runs. I imagine commands like the following. It is just a simple interface.
# simple test
$ dy load run --rcu 100 --wcu 5
# scenario-based test
$ dy load run -s <scenario-file>
# show result of load test
$ dy load report <report-file>
split command
The idea is to provide two commands, load and benchmark. The role of each command is clear.
# simple test
$ dy load run --rcu 100 --wcu 5
$ dy load report <report-file>
# scenario-based test
$ dy benchmark run <scenario-file>
$ dy benchmark report <report-file>
I personally find the -s option to be a clear and effective way of specifying scenario-based testing. It also makes sense to split the run and report commands. Thank you for your suggestion. However, I'm a bit concerned that the load argument might confuse users since it has multiple meanings. In other words, I worry that users might mix up loading data into a table and putting load on DynamoDB for stress testing.
In my opinion, using the term benchmark (or even a shorter version like bench) would be clearer than load. What do you think?
Additionally, I would like to propose the following commands:
# Perform a simple test
$ dy bench run --rcu 100 --wcu 5
# Conduct a scenario-based test (not implemented in the initial phase)
$ dy bench run -s <scenario-file>
# Generate a report for the load test (not implemented in the initial phase)
$ dy bench report <report-file>
Please let me know what you think about these suggestions and proposed commands.
I agree with you. Using benchmark or bench instead of load is clearer and easier to understand.
Based on an internal discussion with a Solutions Architect, the following features are preferable:
- Specify the maximum number of concurrent requests instead of RCU and WCU.
- Specify the primary keys to use in load testing.
He wants functionality similar to what the following project provides: https://github.com/aws-samples/dynamodb-consumed-capacity-check-tool