io_tester: implement request_type::unlink

Open pwrobelse opened this issue 1 year ago • 4 comments

This change introduces request_type::unlink as well as unlink_class_data to io_tester. It also extends the creation of the operations to be executed so that the new request type is recognized.

The purpose of this change is to enable io_tester to analyze the impact of unlink operations on read and write operations.

unlink_class_data creates a given number of files during startup. When requests are issued, it calls unlink on the created files.
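
For orientation, a minimal sketch of what this amounts to is shown below. Only request_type::unlink and unlink_class_data come from this change; the remaining enum values, the factory name make_class_data and the job_config type are assumptions used for illustration, not the PR's actual code.

```cpp
#include <memory>
#include <stdexcept>
#include <utility>

// Illustrative sketch only -- apart from request_type::unlink and
// unlink_class_data, the names below are assumptions, not the PR's code.
enum class request_type { seqread, seqwrite, randread, randwrite, append, cpu, unlink };

// The factory that turns a parsed job description into an executable
// operation would gain one more branch for the new request type.
std::unique_ptr<class_data> make_class_data(job_config cfg) {
    switch (cfg.type) {
    // ... existing request types ...
    case request_type::unlink:
        return std::make_unique<unlink_class_data>(std::move(cfg));
    default:
        throw std::runtime_error("unknown request type");
    }
}
```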

Refs: scylladb#1299

pwrobelse avatar Feb 01 '24 12:02 pwrobelse

This change is only a draft and still needs to be properly tested. A more detailed description of unlink_class_data will also be added to the commit message.

pwrobelse avatar Feb 01 '24 12:02 pwrobelse

Hi @xemul, the PR has been adjusted and manually tested. It introduces request_type::unlink as well as unlink_class_data. The new job type works as follows:

  1. It receives the files_count parameter via the configuration and creates the given number of files during the startup phase, before the evaluation starts. Each created file is filled with dummy data and its size equals _config.file_size / files_count, so that the total disk space usage matches data_size. The files are created in parallel, with the number of concurrent operations limited to avoid exceeding the open file descriptor limit, which would cause an exception.
  2. During the evaluation, each call to unlink_class_data::issue_request() invokes seastar::remove_file() as long as there is still a file available to remove. When all files have been removed, it returns immediately (a sketch of both steps is given after this list).
  3. Because unlink_class_data derives from class_data, rps and think_time can be specified to control how frequently unlink is called.
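
A rough sketch of the first two steps is below. It uses real Seastar primitives (max_concurrent_for_each, open_file_dma, remove_file), but the member names (_files, _next, _config.dir), the concurrency limit and the simplified issue_request() signature are assumptions for illustration, not the PR's actual code.

```cpp
#include <boost/range/irange.hpp>
#include <seastar/core/file.hh>
#include <seastar/core/loop.hh>
#include <seastar/core/print.hh>
#include <seastar/core/seastar.hh>

// Step 1: create files_count files up front, bounding how many files are
// opened concurrently so the open-file-descriptor limit is not exceeded.
seastar::future<> unlink_class_data::create_files(unsigned files_count) {
    const uint64_t file_size = _config.file_size / files_count;
    static constexpr size_t max_parallel_creations = 64;   // illustrative limit
    return seastar::max_concurrent_for_each(
            boost::irange(0u, files_count), max_parallel_creations,
            [this, file_size] (unsigned i) {
        auto name = seastar::format("{}/unlink-test-file-{}", _config.dir, i);
        _files.push_back(name);
        auto flags = seastar::open_flags::create | seastar::open_flags::rw;
        return seastar::open_file_dma(name, flags).then([file_size] (seastar::file f) {
            // ... fill the file with dummy data up to file_size (omitted) ...
            return f.close().finally([f] {});
        });
    });
}

// Step 2: each issued request removes one of the pre-created files;
// once they are all gone the request completes immediately.
seastar::future<> unlink_class_data::issue_request() {
    if (_next == _files.size()) {
        return seastar::make_ready_future<>();
    }
    return seastar::remove_file(_files[_next++]);
}
```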

Aside from that, this change updates the io_tester documentation and an example configuration to cover the new request type, and removes obsolete information from the docs.

The code was tested manually. A few cases were covered:

  1. Firstly, the creation of files was tested. To verify that it is correct, the code of unlink_class_data::issue_request() was altered to skip removing the files. This way, after the run I was able to check that the files had been created and that their content was valid.
  2. Secondly, basic request_type::unlink behaviour was tested with a configuration that specified neither rps nor think_time. The job correctly removed the files via unlink_class_data::issue_request().
  3. Thirdly, rps and think_time were tested. Specifying each of them separately produced a visible difference in the latency reported by unlink_class_data. Additionally, with a short test duration, a large number of files and a long think_time, the test was unable to remove all files within that duration. This confirms that the number of executed requests was indeed limited; without those settings the test removed all files.

pwrobelse avatar Feb 07 '24 12:02 pwrobelse

Can you also put here some measurement results, e.g. -- read workload run on its own vs read workload run in parallel with unlink one to see how unlinking affects read latency

xemul avatar Feb 12 '24 10:02 xemul

Hi @xemul, please find below the adjustments introduced by the new patch set:

  1. The file-creation logic shared by io_class_data and unlink_class_data has been extracted into a new free function, create_and_fill_file(). It returns the file handle and the last write position.
  2. A warning is now printed by unlink_class_data::issue_request() when all files have already been removed, which means the request cannot be fulfilled.
  3. More fields are now printed by unlink_class_data::emit_results(), including IOPS, average/max latencies and the total number of requests.
  4. A new member function, stop_hook(), has been added to class_data. By default it is empty, but derived classes may override it to inject class-specific cleanup (a sketch is given after this list).
  5. A stop_hook() implementation has been added to unlink_class_data to remove any remaining files when keep_files == false. This way the tested directory is clean when the execution finishes.
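
A simplified sketch of the stop_hook() mechanism described in points 4 and 5 is shown below; the class_data interface is abbreviated, and the member names (_keep_files, _files, _next) are assumptions for illustration rather than the PR's exact code.

```cpp
#include <vector>
#include <seastar/core/loop.hh>
#include <seastar/core/seastar.hh>
#include <seastar/core/sstring.hh>

class class_data {
public:
    virtual ~class_data() = default;
    // Called once when the job stops; empty by default, so that derived
    // classes can override it to inject class-specific cleanup.
    virtual seastar::future<> stop_hook() {
        return seastar::make_ready_future<>();
    }
    // ... the rest of the interface is omitted ...
};

class unlink_class_data : public class_data {
public:
    seastar::future<> stop_hook() override {
        if (_keep_files) {
            return seastar::make_ready_future<>();
        }
        // Remove whatever was not unlinked during the run so that the
        // tested directory is clean when the execution finishes.
        return seastar::do_for_each(_files.begin() + _next, _files.end(),
                [] (const seastar::sstring& name) {
            return seastar::remove_file(name);
        });
    }
private:
    bool _keep_files = false;
    std::vector<seastar::sstring> _files;
    size_t _next = 0;
};
```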

pwrobelse avatar Feb 13 '24 11:02 pwrobelse

> Can you also put here some measurement results, e.g. -- read workload run on its own vs read workload run in parallel with unlink one to see how unlinking affects read latency

Hi @xemul, please find some measurements.

| Environment property | Value |
|---|---|
| Machine | i3.xlarge |
| OS | Ubuntu 22.04 |
| Test duration | 20s |
| SMP | 4 |

Compared configurations:

| Parameter | (1) randread | (2) randread+unlink_v1 | (3) randread+unlink_v2 |
|---|---|---|---|
| read data_size | 20GB | 20GB | 20GB |
| read reqsize | 512 | 512 | 512 |
| read parallelism | 1 | 1 | 1 |
| read shares | 100 | 100 | 100 |
| unlink file size | N/A | 1MB | 1MB |
| unlink think_time | N/A | 100us | 100us |
| unlink parallelism | N/A | 5 | 10 |

Results (shard0):

| Metric | (1) randread | (2) randread+unlink_v1 | (3) randread+unlink_v2 |
|---|---|---|---|
| throughput [kB/s] | 4543.79492 | 2739.72925 | 2730.54761 |
| IOPS | 9087.58984 | 5479.50879 | 5461.09521 |
| avg latency | 108us | 179us | 179us |
| p0.5 | 105us | 122us | 119us |
| p0.95 | 130us | 436us | 519us |
| p0.99 | 146us | 761us | 914us |
| p0.999 | 181us | 1430us | 1713us |
| max latency | 4314us | 37647us | 29873us |
| total_requests | 181752 | 109591 | 109222 |
| io_queue_total_exec_sec | 18.872 | 15.855 | 15.990 |
| io_queue_total_delay_sec | 0.516 | 2.536 | 2.296 |
| io_queue_total_operations | 181753 | 109592 | 109223 |
| io_queue_starvation_time_sec | 0.504 | 2.486 | 2.248 |
| io_queue_consumption | 0.022 | 0.0132 | 0.0130 |
| io_queue_adjusted_consumption | 0.0008 | 0.0058 | 0.0056 |

The workloads that also run unlink operations show noticeably higher read latency: p0.95 is roughly 3-4 times higher and p0.99 roughly 5-6 times higher.

pwrobelse avatar Feb 28 '24 08:02 pwrobelse