
Fix IO hanging in batch-completion storage systems.

lausaa opened this issue 3 years ago · 7 comments

Some storage systems cannot complete any IOs until an entire batch of IOs has been submitted. On such systems, in-flight IOs will hang in io_u_quiesce whenever they number fewer than a full batch, and the next IO cannot be scheduled. For this case, the proposed option iodepth_batch_complete_omit, which sets the number of in-flight IOs that need not be reaped, can be used to break the hang.
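As a minimal sketch of how the proposed option might appear in a job file (iodepth_batch_complete_omit is the option added by this patch, not part of stock fio; all other values are illustrative):

  [batch-writer]
  ioengine=libaio
  rw=write
  bs=64k
  iodepth=32
  iodepth_batch_submit=16
  ; proposed: let quiesce return with up to 8 IOs still in flight,
  ; so a partially filled batch does not stall the next submission
  iodepth_batch_complete_omit=8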

lausaa · Sep 08 '22

I don't think this option makes a lot of sense, and I'm skeptical about what kind of storage system would never complete any IOs until a certain amount have been submitted? That sounds pretty broken. But if that's the case, just run with a lower batch setting?
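For reference, a minimal sketch of lowering the batch with stock fio options (values illustrative):

  [small-batches]
  ioengine=libaio
  rw=randwrite
  bs=4k
  iodepth=32
  iodepth_batch_submit=4       ; submit IOs in batches of 4
  iodepth_batch_complete_min=1 ; reap completions as soon as one is available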

ghost · Sep 08 '22

> I don't think this option makes a lot of sense, and I'm skeptical about what kind of storage system would never complete any IOs until a certain amount have been submitted? That sounds pretty broken. But if that's the case, just run with a lower batch setting?

This is a tiered system. IO requests are written to the SSD first and then flushed to the backend HDD asynchronously. The flush is generally triggered by either of two conditions:

  1. a certain amount of total data has accumulated;
  2. a timeout has expired.

Either condition can trigger the flush. The backend prefers a continuous stream, and the timeout is set to a fairly large number of seconds... Thus, I think fio needs a feature that ignores the IO completion rate to some extent when rate limiting is used. I understand that io_u_quiesce is there to account for in-flight IO latency. But there can be relatively large jitter when the rate limit is enforced in a linear fashion with fixed delays. Uhhh... I suddenly realized that maybe we could do the rate limiting with a token-bucket method. :-)
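For context, fio's stock rate limiting can already spread issue times evenly or randomly; a minimal sketch with illustrative values (rate and rate_process are existing fio options):

  [smooth-stream]
  ioengine=libaio
  rw=write
  bs=1m
  iodepth=16
  rate=200m             ; cap throughput at 200 MiB/s
  rate_process=poisson  ; randomized inter-IO delays; the default, linear, uses fixed delays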

lausaa · Sep 09 '22

> I don't think this option makes a lot of sense, and I'm skeptical about what kind of storage system would never complete any IOs until a certain amount have been submitted? That sounds pretty broken. But if that's the case, just run with a lower batch setting?

@axboe I've read this patch, and I find it may be useful in some cases:

  1. We expect to merge some IOs and write them together, for better SSD/HDD performance.
  2. We want to compute erasure codes, and need to merge some IOs to reach the stripe length.

Yes, we could set a lower batch for this merge. But that may defeat the purpose: getting better batched IO bandwidth and a lower EC penalty.

So how do you see the mechanism behind io_u_quiesce applying to such cases? Do you suggest we need another option to control it? If not, could you give some guidance on how to handle this?

zhouyong-zy · Sep 11 '22

> We expect to merge some IOs and write them together, for better SSD/HDD performance.

But then you expect all of those to complete at the same time anyway, so it won't really change much (if anything) in terms of how the device behaves in completions.

> We want to compute erasure codes, and need to merge some IOs to reach the stripe length.

But that happens behind the scenes; it is not IO that fio has issued, nor IO that fio would be waiting for.

I think I'd have less of a problem with this patch if it were more intuitive about what it does. It's also not documented at all in fio.1 or the HOWTO. On top of that, it has the potential to skew timing as well: we're no longer fully quiescing, only partially, if that option is set.

axboe · Sep 11 '22

> On top of that, it has the potential to skew timing as well: we're no longer fully quiescing, only partially, if that option is set.

Yes, you are right. I think this patch can make the rate limit more even and stable, but at the cost of skewing some of the latency, if the option is set. But in our cases, there may be no better way that neither skews the latency nor hangs for a long time in quiesce. In other words, this option suits scenarios that are insensitive to latency and prefer a smooth data flow. I'll document this.
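A sketch of the intended use described here, combining a rate limit with the proposed option (iodepth_batch_complete_omit is from this patch, not stock fio; all values are illustrative):

  [latency-insensitive-stream]
  ioengine=libaio
  rw=write
  bs=1m
  iodepth=32
  rate=100m              ; keep the data flow smooth and bounded
  iodepth_batch_submit=8
  ; proposed: allow up to 7 in-flight IOs to remain unreaped during quiesce,
  ; trading latency accuracy for an uninterrupted stream
  iodepth_batch_complete_omit=7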

lausaa · Sep 12 '22

@axboe I have updated fio.1 and the HOWTO. But could the documentation changes cause the AppVeyor CI to fail? -- The CI passed after a force-push with a typo fix. ;-))

lausaa · Sep 14 '22

Could we proceed with this PR? @axboe

lausaa · Sep 16 '22