io_tester: implement request_type::unlink

Open pwrobelse opened this issue 1 year ago • 4 comments

This change introduces request_type::unlink as well as unlink_class_data to io_tester. It also extends the creation of the operations to be executed so that the new request type is recognized.

The purpose of this change is to enable io_tester to analyze the impact of unlink operations on read and write operations.

unlink_class_data creates a given number of files during startup. When requests are issued, it calls unlink on the created files.
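
For orientation, a minimal sketch of what this amounts to is shown below. Only request_type::unlink and unlink_class_data come from this change; the remaining enum values, the factory name make_class_data and the job_config type are assumptions used for illustration, not the PR's actual code.

```cpp
#include <memory>
#include <stdexcept>
#include <utility>

// Illustrative sketch only -- apart from request_type::unlink and
// unlink_class_data, the names below are assumptions, not the PR's code.
enum class request_type { seqread, seqwrite, randread, randwrite, append, cpu, unlink };

// The factory that turns a parsed job description into an executable
// operation would gain one more branch for the new request type.
std::unique_ptr<class_data> make_class_data(job_config cfg) {
    switch (cfg.type) {
    // ... existing request types ...
    case request_type::unlink:
        return std::make_unique<unlink_class_data>(std::move(cfg));
    default:
        throw std::runtime_error("unknown request type");
    }
}
```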

Refs: scylladb#1299

pwrobelse avatar Feb 01 '24 12:02 pwrobelse

This change is only a draft and still needs to be properly tested. A more detailed description of unlink_class_data will also be added to the commit message.

pwrobelse avatar Feb 01 '24 12:02 pwrobelse

Hi @xemul, the PR has been adjusted and manually tested. It introduces request_type::unlink as well as unlink_class_data. The new job type works as follows:

  1. It receives the files_count parameter via the configuration and creates the given number of files during the startup phase, before the evaluation starts. Each created file is filled with dummy data and its size equals _config.file_size / files_count, so that the total disk space usage matches data_size. The files are created in parallel, with the number of concurrent operations limited to avoid exceeding the open file descriptor limit, which would cause an exception.
  2. During the evaluation, each call to unlink_class_data::issue_request() invokes seastar::remove_file() as long as there is still a file available to remove. When all files have been removed, it returns immediately (a sketch of both steps is given after this list).
  3. Because unlink_class_data derives from class_data, rps and think_time can be specified to control how frequently unlink is called.
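
A rough sketch of the first two steps is below. It uses real Seastar primitives (max_concurrent_for_each, open_file_dma, remove_file), but the member names (_files, _next, _config.dir), the concurrency limit and the simplified issue_request() signature are assumptions for illustration, not the PR's actual code.

```cpp
#include <boost/range/irange.hpp>
#include <seastar/core/file.hh>
#include <seastar/core/loop.hh>
#include <seastar/core/print.hh>
#include <seastar/core/seastar.hh>

// Step 1: create files_count files up front, bounding how many files are
// opened concurrently so the open-file-descriptor limit is not exceeded.
seastar::future<> unlink_class_data::create_files(unsigned files_count) {
    const uint64_t file_size = _config.file_size / files_count;
    static constexpr size_t max_parallel_creations = 64;   // illustrative limit
    return seastar::max_concurrent_for_each(
            boost::irange(0u, files_count), max_parallel_creations,
            [this, file_size] (unsigned i) {
        auto name = seastar::format("{}/unlink-test-file-{}", _config.dir, i);
        _files.push_back(name);
        auto flags = seastar::open_flags::create | seastar::open_flags::rw;
        return seastar::open_file_dma(name, flags).then([file_size] (seastar::file f) {
            // ... fill the file with dummy data up to file_size (omitted) ...
            return f.close().finally([f] {});
        });
    });
}

// Step 2: each issued request removes one of the pre-created files;
// once they are all gone the request completes immediately.
seastar::future<> unlink_class_data::issue_request() {
    if (_next == _files.size()) {
        return seastar::make_ready_future<>();
    }
    return seastar::remove_file(_files[_next++]);
}
```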

Aside from that, this change updates the io_tester documentation and an example configuration to cover the new request type, and removes obsolete information from the docs.

The code was tested manually. A few cases were covered:

  1. Firstly, the creation of files was tested. To verify that it is correct, the code of unlink_class_data::issue_request() was altered to skip removing the files. This way, after the run I was able to check that the files had been created and that their content was valid.
  2. Secondly, basic request_type::unlink behaviour was tested with a configuration that specified neither rps nor think_time. The job correctly removed the files via unlink_class_data::issue_request().
  3. Thirdly, rps and think_time were tested. Specifying each of them separately produced a visible difference in the latency reported by unlink_class_data. Additionally, with a short test duration, a large number of files and a long think_time, the test was unable to remove all files within that duration. This confirms that the number of executed requests was indeed limited; without those settings the test removed all files.

pwrobelse avatar Feb 07 '24 12:02 pwrobelse

Can you also put here some measurement results, e.g. -- read workload run on its own vs read workload run in parallel with unlink one to see how unlinking affects read latency

xemul avatar Feb 12 '24 10:02 xemul

Hi @xemul, please find below the adjustments introduced by the new patch set:

  1. The file-creation logic shared by io_class_data and unlink_class_data has been extracted into a new free function, create_and_fill_file(). It returns the file handle and the last write position.
  2. A warning is now printed by unlink_class_data::issue_request() when all files have already been removed, which means the request cannot be fulfilled.
  3. More fields are now printed by unlink_class_data::emit_results(), including IOPS, average/max latencies and the total number of requests.
  4. A new member function, stop_hook(), has been added to class_data. By default it is empty, but derived classes may override it to inject class-specific cleanup (a sketch is given after this list).
  5. A stop_hook() implementation has been added to unlink_class_data to remove any remaining files when keep_files == false. This way the tested directory is clean when the execution finishes.
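
A simplified sketch of the stop_hook() mechanism described in points 4 and 5 is shown below; the class_data interface is abbreviated, and the member names (_keep_files, _files, _next) are assumptions for illustration rather than the PR's exact code.

```cpp
#include <vector>
#include <seastar/core/loop.hh>
#include <seastar/core/seastar.hh>
#include <seastar/core/sstring.hh>

class class_data {
public:
    virtual ~class_data() = default;
    // Called once when the job stops; empty by default, so that derived
    // classes can override it to inject class-specific cleanup.
    virtual seastar::future<> stop_hook() {
        return seastar::make_ready_future<>();
    }
    // ... the rest of the interface is omitted ...
};

class unlink_class_data : public class_data {
public:
    seastar::future<> stop_hook() override {
        if (_keep_files) {
            return seastar::make_ready_future<>();
        }
        // Remove whatever was not unlinked during the run so that the
        // tested directory is clean when the execution finishes.
        return seastar::do_for_each(_files.begin() + _next, _files.end(),
                [] (const seastar::sstring& name) {
            return seastar::remove_file(name);
        });
    }
private:
    bool _keep_files = false;
    std::vector<seastar::sstring> _files;
    size_t _next = 0;
};
```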

pwrobelse avatar Feb 13 '24 11:02 pwrobelse

> Can you also put here some measurement results, e.g. -- read workload run on its own vs read workload run in parallel with unlink one to see how unlinking affects read latency

Hi @xemul, please find some measurements.

| Environment property | Value |
|---|---|
| Machine | i3.xlarge |
| OS | Ubuntu 22.04 |
| Test duration | 20s |
| SMP | 4 |

Compared configurations:

| Parameter | (1) randread | (2) randread+unlink_v1 | (3) randread+unlink_v2 |
|---|---|---|---|
| read data_size | 20GB | 20GB | 20GB |
| read reqsize | 512 | 512 | 512 |
| read parallelism | 1 | 1 | 1 |
| read shares | 100 | 100 | 100 |
| unlink file size | N/A | 1MB | 1MB |
| unlink think_time | N/A | 100us | 100us |
| unlink parallelism | N/A | 5 | 10 |

Results (shard0):

| Metric | (1) randread | (2) randread+unlink_v1 | (3) randread+unlink_v2 |
|---|---|---|---|
| throughput [kB/s] | 4543.79492 | 2739.72925 | 2730.54761 |
| IOPS | 9087.58984 | 5479.50879 | 5461.09521 |
| avg latency | 108us | 179us | 179us |
| p0.5 | 105us | 122us | 119us |
| p0.95 | 130us | 436us | 519us |
| p0.99 | 146us | 761us | 914us |
| p0.999 | 181us | 1430us | 1713us |
| max latency | 4314us | 37647us | 29873us |
| total_requests | 181752 | 109591 | 109222 |
| io_queue_total_exec_sec | 18.872 | 15.855 | 15.990 |
| io_queue_total_delay_sec | 0.516 | 2.536 | 2.296 |
| io_queue_total_operations | 181753 | 109592 | 109223 |
| io_queue_starvation_time_sec | 0.504 | 2.486 | 2.248 |
| io_queue_consumption | 0.022 | 0.0132 | 0.0130 |
| io_queue_adjusted_consumption | 0.0008 | 0.0058 | 0.0056 |

The workloads that also run unlink operations show noticeably higher read latency: p0.95 is roughly 3-4 times higher and p0.99 roughly 5-6 times higher.

pwrobelse avatar Feb 28 '24 08:02 pwrobelse