
fio takes a long time to start processes

Open jcv20 opened this issue 5 years ago • 15 comments

Hi,

I'm trying to run a test with >500 jobs, and it takes more than 20 mins for fio to start doing IO. Attached a screenshot showing where the time goes. Is this normal when launching a large number of processes, or can anything be done to improve it? Thanks.

(screenshot: fio_524procs)

uname -r

4.18.0-147.el8.ppc64le

cat /etc/redhat-release

Red Hat Enterprise Linux release 8.1 (Ootpa)

fio -v

fio-3.19

cat rand.fio

[global]
name=randwrite
ioengine=libaio
iodepth=32
rw=randwrite
randrepeat=0
bs=2Mi
direct=1
ramp_time=0
runtime=600
time_based
group_reporting

[job 1]
filename=/dev/sdaee

[job 2]
filename=/dev/sdaes

[job 3]
filename=/dev/sdadf

....

[job 523]
filename=/dev/sdhx

[job 524]
filename=/dev/sdid
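(Editor's note: maintaining 500+ near-identical sections like the above by hand is error-prone; they can be generated with a small shell loop. This is a sketch, not part of fio; the device names below are illustrative examples from this thread.)

```shell
#!/bin/sh
# Generate repetitive per-device fio job sections.
# Device list here is illustrative; in practice pass your own list.
devices="/dev/sdaee /dev/sdaes /dev/sdadf"
i=1
for dev in $devices; do
    # Emit one [job N] section per device, matching the layout above.
    printf '[job %d]\nfilename=%s\n\n' "$i" "$dev"
    i=$((i + 1))
done
```

Append the output to a file containing the [global] section to build the full job file.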

jcv20 avatar Apr 16 '20 03:04 jcv20

Try and add norandommap to the global section.

axboe avatar Apr 16 '20 14:04 axboe

Tried norandommap; fio is still taking >20 min to start doing IO.

# time fio /tmp/rand.fio
.....
real 22m32.359s
user 4m52.282s
sys 21m12.710s

fio runtime is only 60s.

# cat /tmp/rand.fio

[global]
name=randwrite
ioengine=libaio
iodepth=32
rw=randwrite
randrepeat=0
bs=2Mi
direct=1
ramp_time=0
runtime=60
time_based
group_reporting
norandommap

[job 1]
filename=/dev/sdaee

[job 2]
filename=/dev/sdaes

[job 3]
filename=/dev/sdadf

....

[job 523]
filename=/dev/sdhx

[job 524]
filename=/dev/sdid

jcv20 avatar Apr 16 '20 15:04 jcv20

You can try and do a:

# perf record -ag -- sleep 5

while it's starting up, and then do:

# perf report -g --no-children

and see what is going on in the system. If it's just the one busy fio thread, which it looks like, I'd fire up top and find the busy pid, then do:

# perf record -g -p <pid from above>

and then run the same perf report on that. How big are the sdXXX devices?

axboe avatar Apr 16 '20 16:04 axboe

Each device is 10T.

During startup:

# perf record -ag -- sleep 5

(screenshot: perf_report-2)

top shows one busy fio thread (pid 61705)

(screenshot: top)

Gathered a trace (10 sec) for that pid. Looks like it's spending time writing to memory and acquiring/releasing a mutex?

# perf record -g -p 61705

(screenshot: perf_report-pid)

jcv20 avatar Apr 16 '20 17:04 jcv20

For that last one, click the memset and mutex lock/unlock to get an expanded call trace, that'll give us a better idea of where it's happening.

axboe avatar Apr 16 '20 17:04 axboe

(screenshot: perf_report-pid_deatil)

jcv20 avatar Apr 16 '20 17:04 jcv20

OK, that makes sense, it's around the iostats setup. I'll try and take a look at this.

axboe avatar Apr 16 '20 18:04 axboe

(Pinging @axboe on this one)

sitsofe avatar Aug 01 '20 14:08 sitsofe

(@axboe ping)

sitsofe avatar Jan 18 '21 08:01 sitsofe

Any update on this? With 32 namespaces per drive on an NVMe-oF setup and 8 drives, it takes about 4 minutes for fio to start even with numjobs=1. I'm running fio-3.28.

sekar-wdc avatar May 05 '23 19:05 sekar-wdc

(@axboe ping)

sekar-wdc avatar May 11 '23 20:05 sekar-wdc

Same for me: a 250-job test takes >2 min to start traffic with fio-3.28.

itayalroy avatar Jun 08 '23 23:06 itayalroy

OK, that makes sense, it's around the iostats setup. I'll try and take a look at this.

@axboe Any update on this ?

sekar-wdc avatar Sep 04 '23 17:09 sekar-wdc

OK, that makes sense, it's around the iostats setup. I'll try and take a look at this.

@axboe Any update on this ?

As a temporary workaround, try running with --disk_util=0 if you don't need the disk utilization statistics.
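(Editor's note: the same option can also go in the job file itself rather than on the command line. A sketch, based on the rand.fio layout from this thread:)

[global]
name=randwrite
ioengine=libaio
disk_util=0
....(rest of the global options as before)....

disk_util is a boolean fio job option; setting it to 0 skips gathering disk utilization statistics for each device.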

vincentkfu avatar Sep 06 '23 11:09 vincentkfu


OK, that makes sense, it's around the iostats setup. I'll try and take a look at this.

@axboe Any update on this ?

As a temporary workaround, try running with --disk_util=0 if you don't need the disk utilization statistics.

Thanks for this @vincentkfu ! This appears to be working much more quickly even for large I/O.

sekar-wdc avatar Sep 06 '23 17:09 sekar-wdc