
write_iops_log is incorrect when log_avg_msec is less than 1000

Status: Open · louwrentius opened this issue 4 years ago · 10 comments

Hello,

I ran a benchmark with --numjobs 16 and higher on a storage device, in this case a single hard drive, deliberately overloading it just to understand and observe the behaviour.

  • The .json output shows correct bandwidth numbers.
  • The bandwidth of all log files added together exceeds the reported bandwidth in the .json
  • The bandwidth as derived from the log files defies the laws of physics: the drive can't do more than 115 MB/s.
  • I've never observed bandwidth higher than around 110 ~ 115 MB/s with 'dstat' during benchmarking
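
For context on how I derive bandwidth from the logs: each line of a fio bandwidth log is roughly time (msec), value (KiB/s), data direction, block size (newer fio versions may append extra columns), so the per-job logs have to be summed per time bucket. A minimal sketch of that kind of summation (file names hypothetical; this is not fio-plot's actual code):

import csv
from collections import defaultdict

def sum_bw_logs(paths, interval_ms=500):
    """Sum per-job fio bandwidth logs into per-interval totals (KiB/s).

    Each fio log line is roughly: time_ms, value, data_direction, block_size.
    Timestamps of different jobs rarely align exactly, so bucket by interval."""
    totals = defaultdict(int)
    for path in paths:
        with open(path) as f:
            for row in csv.reader(f):
                time_ms, value = int(row[0]), int(row[1])
                totals[time_ms // interval_ms] += value
    return dict(sorted(totals.items()))

# 16 jobs -> 16 per-job log files (names hypothetical)
logs = [f"read-iodepth-16-numjobs-16_bw.{i}.log" for i in range(1, 17)]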

I've used my fio-plot tool to chart the results:

[chart: possiblebug01]

I've included the raw benchmark output too.

bugtest.tar.gz

Fio version: fio-3.24-8-gd0416

I'm quite sure I've seen this behaviour in older versions too.

In this image you can observe that the data makes sense up until --numjobs 8, but the --numjobs 16 results seem wrong.

[chart: bugtest02]

I'm quite convinced it's not fio-plot as I checked the contents of the log files manually.

If I overlooked something or made a mistake, my apologies in advance.

— louwrentius, Dec 09 '20 00:12

Thanks for the report, but you neglected to include the most important information (job file, fio version, etc). Please read:

https://git.kernel.dk/cgit/fio/tree/REPORTING-BUGS

— axboe, Dec 09 '20 00:12

Tested platform

root@ubuntu:/fio/fio-plot/benchmark_script# lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 18.04.5 LTS
Release:        18.04
Codename:       bionic

Fio version: fio-3.24-8-gd0416

Fio parameters

['fio',
 '--output-format=json',
 '--output=bugtest/directory/1M/read-1-16.json',
 './fio-job-template.fio',
 '--size=10G',
 '--runtime=60',
 '--time_based',
 '--directory=/benchmark/directory']

fio-job-template.fio

[iotest]
rw=${MODE} --> READ
blocksize=${BLOCK_SIZE} --> 1M
ioengine=${ENGINE} --> libaio
iodepth=${IODEPTH} --> varies over the test runs
numjobs=${NUMJOBS} --> varies over the test runs
direct=${DIRECT} --> 1
group_reporting=1
invalidate=${INVALIDATE} --> 1
loops=${LOOPS} --> 1
write_bw_log=${OUTPUT}/${MODE}-iodepth-${IODEPTH}-numjobs-${NUMJOBS}
write_lat_log=${OUTPUT}/${MODE}-iodepth-${IODEPTH}-numjobs-${NUMJOBS}
write_iops_log=${OUTPUT}/${MODE}-iodepth-${IODEPTH}-numjobs-${NUMJOBS}
log_avg_msec=${LOGINTERVAL} --> 500 
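
For reference, fio expands the ${VAR} placeholders in the job file from environment variables, so a single run is driven roughly like this (a sketch, not the actual fio-plot code; values are for illustration):

import os
import subprocess

# The ${VAR} placeholders in fio-job-template.fio are filled in by fio
# from environment variables at job-file parse time.
env = dict(os.environ,
           MODE="read", BLOCK_SIZE="1M", ENGINE="libaio",
           IODEPTH="16", NUMJOBS="16", DIRECT="1", INVALIDATE="1",
           LOOPS="1", OUTPUT="bugtest/directory/1M", LOGINTERVAL="500")

subprocess.run(["fio",
                "--output-format=json",
                "--output=bugtest/directory/1M/read-1-16.json",
                "./fio-job-template.fio",
                "--size=10G", "--runtime=60", "--time_based",
                "--directory=/benchmark/directory"],
               env=env, check=True)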

The benchmarks are performed with my own benchmark script, which is part of fio-plot, but I hope that doesn't matter here. The command-line parameter list is just one particular benchmark; I've run multiple to generate this data, where only the queue depth and numjobs values differ.

All this data is in the tar.gz I included earlier.

Possible cause

The machine I used for testing has only two CPU cores and may itself have been overloaded with so many jobs reading at once. However, I would expect Fio to just abort in that case.

Please let me know if you need any additional information.

— louwrentius, Dec 09 '20 00:12

@louwrentius Do things look any different with a log_avg_msec of 1000?

— sitsofe, Dec 09 '20 07:12

Yes, with log_avg_msec set to 1000 it does look right now.

[chart: Test_2020-12-09_115817]

What do you make of it? It sounds like the I/O is affected by so much latency that it doesn't fit in the 500 ms interval.

Edit: do you still agree that this behaviour is a bug?

— louwrentius, Dec 09 '20 11:12

@louwrentius:

What do you make of it? It sounds like the I/O is affected by so much latency that it doesn't fit in the 500 ms interval.

I'm not sure - I'd have to investigate, because I'm not familiar with that particular code. I think I had questions about it, hence why I asked you to try a different value.

do you still agree that this behaviour is a bug?

See above - I'd have to investigate. Imagine the period is 500 ms: if I see 10 I/Os in the first 500 ms period and 10 I/Os in the second 500 ms period, what are my IOPS? I would have said 20 IOPS, but to do that I have to attribute both those samples to the "same" second. What if the period is inconvenient (e.g. 733 ms) - what should happen? etc. Further, it might lead to strange results if I just extrapolate periods under a second, because you don't know what happened in the other part of the second ("I saw 10 I/Os in 500 ms, so therefore I predict my IOPS is 20").
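
To put numbers on that (a hypothetical illustration, not fio code):

# Two 500 ms windows, 10 I/Os observed in each.
window_ms = 500
ios_seen = [10, 10]

# Attributing both samples to the same second gives the real rate:
real_iops = sum(ios_seen) / 1.0                        # 20.0

# Projecting each window to a full second also says "20" per window,
# but only by assuming the unobserved half of the second looked the same:
projected = [n * 1000 / window_ms for n in ios_seen]   # [20.0, 20.0]

# A consumer that treats each projected sample as an independent
# per-second figure and sums them gets an impossible number:
naive_sum = sum(projected)                             # 40.0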

— sitsofe, Dec 10 '20 07:12

Thank you for this explanation.

Now that I understand this issue better, and given the very exceptional circumstances of this 'bug', I would find it reasonable not to resolve it, or to give it a very low priority.

Maybe it would be nice to detect 'bad' data and advise fio users to run the benchmark again with a log_avg_msec of 1000. I can't tell if that's more work than actually solving this issue, but that's up to you.

— louwrentius, Dec 11 '20 00:12

Do the IOPS look OK when log_avg_msec is 2000 (or any value above 1000)?

— sitsofe, Dec 11 '20 18:12

If I test with log_avg_msec 1000 and 2000 (I tested both), everything seems OK. Only the 500 msec data seems off. I've checked the data by cross-referencing the log data (using Excel) with the numbers in the .json file.

I've reviewed the data without my fio-plot tool, to exclude it as a possible source of problems. The raw benchmark data is attached. I hope this all helps.

bugtest500_1000_2000.tar.gz
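
For anyone who wants to repeat that cross-check without Excel, this is roughly what it amounts to (paths and file names hypothetical; the JSON field access assumes group_reporting output):

import csv
import json
from glob import glob

def mean_log_value(path):
    """Average the value column of a fio log (time_ms, value, dir, bs)."""
    with open(path) as f:
        values = [int(row[1]) for row in csv.reader(f)]
    return sum(values) / len(values)

# With group_reporting the .json holds one aggregated entry for all jobs.
with open("bugtest/directory/1M/read-1-16.json") as f:
    json_bw = json.load(f)["jobs"][0]["read"]["bw"]    # KiB/s

# The sum of the per-job log averages should be in the same ballpark.
log_bw = sum(mean_log_value(p)
             for p in glob("read-iodepth-1-numjobs-16_bw.*.log"))
print(json_bw, log_bw)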

— louwrentius, Dec 12 '20 00:12

Thanks for checking @louwrentius. Thinking about it more, I believe the issue comes down to averaging. The following question illustrates this:

"If five 200ms periods report their 'IOPS' as 10, what are my IOPS across that second?"

Each period is averaged but ends up projecting an IOPS number for itself. However, as I asked for sub-second periods, I am no longer in a position to use the IOPS results directly. As a result, I have to do one of the following for the values to make sense:

  • I have to say "I am plotting I/Os per one fifth of a second" (rather than "I/Os per second"), divide each value by five and plot at appropriate intervals
  • I have to collect and average out each sample that fell within the same second (so with the above (10 + 10 + 10 + 10 + 10) / 5 = 10 IOPS) and plot at second intervals

Values equal to or greater than a second work because the averaging doesn't happen multiple times in one second.
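
A minimal sketch of the second option, averaging the samples that land inside the same second (illustration only, not code that exists in fio; it assumes the usual time/value log columns):

from collections import defaultdict

def per_second_iops(samples):
    """Average sub-second fio IOPS samples into one value per second.

    `samples` are (time_ms, value) pairs from an *_iops.N.log written with
    log_avg_msec < 1000. Each logged value is already a per-second
    projection for its window, so averaging the samples that fall inside
    one second recovers a sane per-second figure."""
    buckets = defaultdict(list)
    for time_ms, value in samples:
        # A window stamped at exactly 1000 ms belongs to the first second.
        buckets[(time_ms - 1) // 1000].append(value)
    return {sec: sum(vals) / len(vals) for sec, vals in sorted(buckets.items())}

# Five 200 ms windows, each reporting "10 IOPS":
samples = [(200, 10), (400, 10), (600, 10), (800, 10), (1000, 10)]
print(per_second_iops(samples))  # {0: 10.0}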

Given the above, how about a documentation warning like:

Using log_avg_msec values of less than a second can result in more than one projected IOPS result for each second. It is up to the user to compensate for this or otherwise use a log_avg_msec value of 1000 or greater.

If that looks OK what part of the documentation should it be added to? https://fio.readthedocs.io/en/latest/fio_man.html#cmdoption-arg-log-avg-msec , https://fio.readthedocs.io/en/latest/fio_man.html#cmdoption-arg-write-iops-log or somewhere else?

— sitsofe, Dec 12 '20 11:12

Hello, I think that the proposed section for the manual should be sufficient.

If true, it may put people at ease if you tell them that this should not be a problem unless they stress a storage device to its limits, causing very slow I/O completion (beyond the specified log interval). Just to be clear: I don't understand this well enough to be sure that this claim can be made.

I would propose that the statement is added to the log-avg-msec section, and that the write-iops-log and write-bw-log sections refer to it, if that fits.

https://fio.readthedocs.io/en/latest/fio_man.html#cmdoption-arg-write-bw-log

Reading your examples and the manual again, I finally understand the actual problem.

It's called write_iops_log for a reason. This whole issue would not exist if it were called write_io_log. However, that kind of log is not easy to understand or interpret for a human being who thinks in IOPS. As in: 2000 I/Os in 200 msec should translate to 10K IOPS.

You could think about a command-line switch that doesn't log IOPS but I/Os per window, and let fio users handle the conversion themselves. It's entirely up to you to decide whether that's worth pursuing (or whether it just creates more problems); I don't know if the fio user base would have any interest in it.
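
The conversion fio users would then own is trivial; a sketch (write_io_log is a hypothetical option name, not something fio currently offers):

def window_count_to_iops(io_count, window_ms):
    """Turn a raw 'I/Os completed in this window' count into an IOPS rate."""
    return io_count * 1000 / window_ms

print(window_count_to_iops(2000, 200))  # 10000.0 -> 10K IOPS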

— louwrentius, Dec 12 '20 18:12