I swept the number of accelerators and the read_thread count in my evaluation, but the results are confusing.
I ran evaluations while varying the number of accelerators and the number of read threads. I expected the I/O throughput to scale roughly in proportion to accelerators × read_thread, but as shown below, performance drops dramatically once I use five accelerators. I understand that the system must eventually hit a bottleneck as I keep increasing accelerators and read threads, but I expected throughput to level off gradually at that point, not collapse.
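To make that expectation concrete, here is a minimal sketch of how I compare a measured result against ideal linear scaling. The function name, the 1 × 1 baseline, and the numbers in the usage lines are hypothetical placeholders for illustration, not my actual measurements.

```python
def scaling_efficiency(baseline_throughput, accelerators, read_threads, measured_throughput):
    """Return measured throughput divided by the ideal throughput.

    The ideal value assumes perfectly linear scaling in
    accelerators * read_threads relative to a 1-accelerator,
    1-read-thread baseline (an assumption, not a property of the tool).
    """
    if baseline_throughput <= 0 or accelerators <= 0 or read_threads <= 0:
        raise ValueError("baseline, accelerators, and read_threads must be positive")
    ideal = baseline_throughput * accelerators * read_threads
    return measured_throughput / ideal


if __name__ == "__main__":
    # Placeholder values only: baseline 1.0 GB/s, 4 accelerators, 2 read threads.
    print(scaling_efficiency(1.0, 4, 2, 8.0))  # efficiency of a perfectly scaling run
    print(scaling_efficiency(1.0, 5, 2, 2.0))  # efficiency of a collapsed run
```

An efficiency near 1.0 means the run scales as I expected. A value that slowly declines would indicate a gradual bottleneck, whereas a sudden drop at one configuration is the collapse I am describing.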
For the tests, the only change I made to the default run command you provided was adding the read_thread option. Before each run I reformatted the NVMe device and remounted the XFS filesystem (the initialization step).
Could you help me figure out why I’m getting these results? Is there something wrong with my setup?