fio
ramp_time restarts job?
Please acknowledge the following before creating a ticket
- [x] I have read the GitHub issues section of REPORTING-BUGS.
Description of the bug:
I am using fio to measure read latency for various combinations of read and write workloads (see diskplorer). To do that, I generate job files with rate-limited read and write jobs. Since I don't know the latency, I don't know the read concurrency in advance, so I set it to 1000 per thread (and I run one thread per vCPU).
This generates a large load when the job starts, as it pushes tons of requests to the disk. That's expected: fio is still learning the disk latency and adjusts concurrency once it does. I set ramp_time=5 to exclude this learning period from the measurements.
The problem is that I see the same spike again after ramp_time has elapsed; see the vmstat output below. The trace starts with a spike in bi, settles for 5 seconds (ramp_time), then spikes and settles again. I'd really like to avoid the spike when ramp_time expires!
Note also the spike in the procs r column, probably workqueue threads resulting from the too-high concurrency.
Sample job file: 0037.fio.
I didn't see any evidence in the code that the job is restarted on the transition from TD_RAMP to TD_RUNNING, but something is clearly happening.
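For reference, the knobs that matter boil down to something like this (a trimmed-down sketch of the kind of job file diskplorer generates; the job name is a placeholder and the values are taken from the rate_iops=10 example included further down):

[global]
ramp_time=5          ; exclude the learning period from measurements
direct=1
ioengine=io_uring

[rate-limited-reads]
readwrite=randread
blocksize=512
iodepth=1000         ; latency is unknown up front, so concurrency is set very high
rate_iops=10         ; the rate limit under test
rate_process=poisson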
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
r b swpd free buff cache si so bi bo in cs us sy id wa st
0 0 0 460590368 33728 847720 0 0 100491 216832 81644 1434664 41 37 8 14 0
2 0 0 460592704 33728 847720 0 0 38098 242176 152471 275131 1 0 99 0 0
0 0 0 460592448 33728 847720 0 0 38250 224384 153417 276436 1 0 99 0 0
1 0 0 460592448 33728 847720 0 0 38119 224384 153126 275687 1 0 99 0 0
0 0 0 460592512 33728 847720 0 0 38217 224384 153707 276820 1 0 99 0 0
812 0 0 460678144 33728 847720 0 0 61199 232576 161072 303737 1 4 95 0 0
2 0 0 460888576 33728 847592 0 0 38863 224256 152235 278462 1 1 99 0 0
1 0 0 460888576 33728 847592 0 0 38266 224384 153190 276809 1 0 99 0 0
0 0 0 460888576 33728 847592 0 0 38310 224384 153746 276808 1 0 99 0 0
1 0 0 460888064 33728 847592 0 0 38428 224256 154097 278023 1 0 99 0 0
0 0 0 460891360 33728 847592 0 0 38230 224384 153005 277122 1 0 99 0 0
1 0 0 460902560 33728 847592 0 0 38226 224256 153480 277212 1 0 99 0 0
1 0 0 460902560 33728 847592 0 0 38190 224384 153741 276527 1 0 99 0 0
2 0 0 460902560 33728 847592 0 0 38223 224256 154076 277180 1 0 99 0 0
0 0 0 460902560 33728 847592 0 0 38350 224384 154024 277097 1 0 99 0 0
0 0 0 460902560 33728 847592 0 0 38191 224256 153429 276048 1 0 99 0 0
1 0 0 460902560 33728 847592 0 0 38235 224384 154030 276920 1 0 99 0 0
0 0 0 460902560 33728 847592 0 0 38219 224256 153740 276978 1 0 99 0 0
2 0 0 460902560 33728 847592 0 0 37812 224384 152749 274209 1 0 99 0 0
2 0 0 460902560 33728 847592 0 0 38299 224384 154229 277121 1 0 99 0 0
1 0 0 460902560 33728 847592 0 0 38251 224256 154176 276797 1 0 99 0 0
0 0 0 460902560 33728 847592 0 0 38316 224384 154026 277576 1 0 99 0 0
1 0 0 460902560 33736 847584 0 0 38201 224328 153716 276666 1 0 99 0 0
0 0 0 460902336 33736 847592 0 0 38191 224384 152988 276083 1 0 99 0 0
0 0 0 460902336 33736 847592 0 0 38362 224256 154207 278006 1 0 99 0 0
4 0 0 460902304 33736 847592 0 0 38253 224384 154268 277444 1 0 99 0 0
1 0 0 460902304 33736 847592 0 0 38109 224256 154200 276726 1 0 99 0 0
2 0 0 460902304 33736 847592 0 0 38283 224384 154299 277057 1 0 99 0 0
3 0 0 460902304 33736 847592 0 0 38071 224256 153125 275599 1 0 99 0 0
2 0 0 460902304 33736 847592 0 0 38537 224384 154522 278874 1 0 99 0 0
0 0 0 460902304 33736 847592 0 0 38089 224256 153406 276324 1 0 99 0 0
2 0 0 460902304 33736 847592 0 0 38015 224384 153100 275747 1 0 99 0 0
2 0 0 460902304 33736 847592 0 0 38326 224396 154099 277242 1 0 99 0 0
1 0 0 460908864 34804 847532 0 0 39562 123648 153565 276990 1 0 99 0 0
0 0 0 460908736 34804 847592 0 0 38165 0 151048 271587 1 0 99 0 0
1 0 0 490447424 33736 847652 0 0 11484 0 46754 82201 0 1 98 0 0
0 0 0 518314752 33736 847072 0 0 1079 28 5266 3799 2 1 96 0 0
63 0 0 490278176 33736 847444 0 0 0 175360 18674 26948 3 85 12 0 0
Environment: Ubuntu 21.04
fio version: fio-3.25
Reproduction steps
Run the example fio job file above, adjusting for your CPU count and disk characteristics.
Here's a more extreme example, where the read job's rate_iops=10:
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
r b swpd free buff cache si so bi bo in cs us sy id wa st
0 0 0 517080608 37404 1092184 0 0 1813 279424 10488 104071 2 14 83 0 0
0 0 0 517080608 37404 1092184 0 0 5 448640 7428 14379 0 0 100 0 0
0 0 0 517080608 37404 1092184 0 0 4 448640 7411 14354 0 0 100 0 0
1 0 0 517080608 37404 1092184 0 0 6 448768 7472 14463 0 0 100 0 0
0 0 0 517081600 37404 1092184 0 0 7 448640 7447 14421 0 0 100 0 0
0 0 0 517081600 37404 1092184 0 0 784 448896 7588 14546 0 0 100 0 0
0 0 0 517081600 37404 1092184 0 0 6 448768 7414 14344 0 0 100 0 0
0 0 0 517081600 37404 1092184 0 0 4 448640 7456 14430 0 0 100 0 0
0 0 0 517081600 37404 1092184 0 0 6 448640 7430 14399 0 0 100 0 0
0 0 0 517081600 37404 1092184 0 0 6 448768 7393 14349 0 0 100 0 0
0 0 0 517081600 37404 1092184 0 0 5 448768 7467 14382 0 0 100 0 0
0 0 0 517081600 37412 1092176 0 0 5 449024 7492 14441 0 0 100 0 0
0 0 0 517081600 37412 1092176 0 0 5 448640 7425 14364 0 0 100 0 0
0 0 0 517081600 37412 1092176 0 0 5 448768 7394 14325 0 0 100 0 0
0 0 0 517081600 37412 1092184 0 0 3 448768 7401 14320 0 0 100 0 0
0 0 0 517081600 37412 1092184 0 0 3 448768 7475 14431 0 0 100 0 0
0 0 0 517081600 37412 1092184 0 0 6 448640 7427 14364 0 0 100 0 0
0 0 0 517081600 37412 1092184 0 0 8 448768 7433 14373 0 0 100 0 0
0 0 0 517081600 37412 1092184 0 0 5 448768 7416 14336 0 0 100 0 0
0 0 0 517081600 37412 1092184 0 0 6 448640 7479 14445 0 0 100 0 0
1 0 0 517081600 37412 1092184 0 0 4 448768 7407 14360 0 0 100 0 0
1 0 0 517081600 37412 1092184 0 0 4 448768 7399 14328 0 0 100 0 0
0 0 0 517081600 37412 1092184 0 0 4 448640 7395 14328 0 0 100 0 0
0 0 0 517081600 37412 1092184 0 0 4 448896 7468 14424 0 0 100 0 0
0 0 0 517081600 37412 1092184 0 0 9 448640 7430 14388 0 0 100 0 0
1 0 0 517081600 37412 1092184 0 0 5 448640 7396 14327 0 0 100 0 0
0 0 0 517081600 37412 1092184 0 0 6 448768 7401 14340 0 0 100 0 0
1 0 0 517081600 37412 1092184 0 0 3 448768 7460 14419 0 0 100 0 0
0 0 0 517081600 37412 1092184 0 0 4 448640 7419 14358 0 0 100 0 0
0 0 0 517081600 37412 1092184 0 0 7 448640 7404 14343 0 0 100 0 0
0 0 0 517081600 37412 1092184 0 0 7 448768 7427 14352 0 0 100 0 0
1 0 0 517081600 37412 1092184 0 0 7 448768 7489 14455 0 0 100 0 0
1 0 0 517081600 37412 1092184 0 0 6 448772 7438 14375 0 0 100 0 0
0 0 0 517081600 37412 1092184 0 0 3 448640 7397 14317 0 0 100 0 0
0 0 0 517081600 37412 1092184 0 0 2 448768 7459 14408 0 0 100 0 0
0 0 0 517980544 37412 1108832 0 0 2147 164372 4894 7121 3 0 97 0 0
63 0 0 487100480 37412 1108120 0 0 0 375552 22368 34689 3 90 6 0 0
63 0 0 460675520 37412 1108120 0 0 568 448896 23612 14902 22 76 2 0 0
Reads start with a huge burst and settle down, then burst again after 5 seconds, when ramp_time expires. Here is the job file for this example:
[global]
runtime=30s
time_based=1
ramp_time=5
filename=/dev/nvme1n1
direct=1
group_reporting
ioengine=io_uring
size=0
random_generator=tausworthe64
randrepeat=0
thread
[job(r_idx=0,w_idx=4,write_bw=459345240,r_iops=0)]
stonewall
new_group
readwrite=write
blocksize=131072
iodepth=64
rate=459345240
[job(r_idx=0,w_idx=4,write_bw=459345240,r_iops=0).1]
readwrite=randread
blocksize=512
iodepth=1000
rate_iops=10
rate_process=poisson
I would start looking at ramp_time_over() in time.c:
bool ramp_time_over(struct thread_data *td)
{
        if (!td->o.ramp_time || td->ramp_time_over)
                return true;

        if (utime_since_now(&td->epoch) >= td->o.ramp_time) {
                td->ramp_time_over = true;
                reset_all_stats(td);
                reset_io_stats(td);
                td_set_runstate(td, TD_RAMP);

                /*
                 * If we have a parent, the parent isn't doing IO. Hence
                 * the parent never enters do_io(), which will switch us
                 * from RAMP -> RUNNING. Do this manually here.
                 */
                if (parent_update_ramp(td))
                        td_set_runstate(td, TD_RUNNING);
                return true;
        }

        return false;
}
Inside it there are calls to reset_all_stats() and reset_io_stats(). Perhaps counters related to rate limiting are being reset, and preserving them across the reset would resolve this problem.
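To make that concrete, here is a rough sketch of the direction I mean (not a tested patch: the rate_next_io_time field is my assumption about where the pacing state lives in struct thread_data and would need to be checked against the tree):

/*
 * Hypothetical sketch only: keep the rate limiter's pacing state across
 * the end-of-ramp reset so it doesn't momentarily think the job is far
 * behind its target and issue a burst. The rate_next_io_time field is
 * an assumption and may not match the actual struct thread_data.
 */
bool ramp_time_over(struct thread_data *td)
{
        if (!td->o.ramp_time || td->ramp_time_over)
                return true;

        if (utime_since_now(&td->epoch) >= td->o.ramp_time) {
                uint64_t saved_next_io_time[DDIR_RWDIR_CNT];
                int ddir;

                /* remember when each data direction may next issue IO */
                for (ddir = 0; ddir < DDIR_RWDIR_CNT; ddir++)
                        saved_next_io_time[ddir] = td->rate_next_io_time[ddir];

                td->ramp_time_over = true;
                reset_all_stats(td);
                reset_io_stats(td);
                td_set_runstate(td, TD_RAMP);

                /* restore the pacing state instead of letting it restart from zero */
                for (ddir = 0; ddir < DDIR_RWDIR_CNT; ddir++)
                        td->rate_next_io_time[ddir] = saved_next_io_time[ddir];

                if (parent_update_ramp(td))
                        td_set_runstate(td, TD_RUNNING);
                return true;
        }

        return false;
}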