f3
Identifying drives that starve the rest for IO
I'm using a custom Bash script to deploy f3 for massive drive testing (thousands per month).
One problem I have is that some faulty drives cause the rest to stall. I have finally found a way to identify (and remove) them to restore peace in the Galaxy, so I thought I'd share my research:
https://unix.stackexchange.com/questions/484900/detecting-misbehaving-usb-flash-memory-devices-that-block-the-rests-io
Thank you for posting your results here.
Your script forks 27+ new processes per loop (every second) and does I/O on 5+ files per loop. That doesn't seem very efficient.
Here is a more efficient approach, written in Lua 5.3. It forks only two processes per loop, and doesn't create any files.
It only shows the total number of "idle" drives (devices with no active I/O), the total number of all drives, the device name of the "hog", and how much that device's stat value changed since the previous loop.
Note that this is intended for Lua 5.3 (older versions might work, but Ctrl+C might not kill it properly).
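To make the per-device value concrete, here is a minimal sketch of how the script's capture pattern pulls fields #10 and #11 out of a single stat line. The sample line and the helper name `busy_value` are made up for illustration. As far as I know, the kernel's block stat documentation calls these two fields `io_ticks` and `time_in_queue`.

```lua
-- Sample line in the /sys/block/<dev>/stat layout; the numbers are invented.
local sample = '    1200      30   98000    4500     800      12' ..
               '   64000    9100       2    7300   13600'

-- Same pattern as in the script: skip nine numeric fields,
-- then capture fields #10 and #11.
local pat = string.rep('%d+%s+', 9) .. '(%d+)%s+(%d+)'

-- Hypothetical helper (not part of the script): returns the sum of
-- fields #10 and #11, or nil if the line does not match the pattern.
local function busy_value(line)
  local f10, f11 = line:match(pat)
  if not f10 then return nil end
  return tonumber(f10) + tonumber(f11)
end

print(busy_value(sample))  --> 20900 (7300 + 13600)
```

A drive whose value stops changing between two reads is counted as idle, and the one whose value grows the most is reported as the hog.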
#!/usr/bin/env lua
-- Edit the wildcard if you need a more specific match
local wildcard='sd*'
local old_stats={} -- saved stats from previous loop
-- Capture pattern: Skips the first nine fields, captures fields #10 and #11
local pat=string.rep('%d+%s+',9)..'(%d+)%s+(%d+)'
-- String of spaces (to clear previous output before writing new data)
local clear=string.rep(' ',64)..'\r'
-- Output string format, prints the number of idle drives / total drives,
-- the device name of the hog, and how much its stat value changed.
local fmt=' IDLE: %d/%d MAX:%s (%d)\r'
-- Begin infinite loop (until forever or Ctrl+C)
repeat
  -- Quote the wildcard so the shell passes it to find unexpanded
  local find=io.popen("find /sys/block/ -maxdepth 1 -name '"..wildcard.."'")
  local stats={} -- Start with empty table
  for dir in find:lines() do
    local dev=dir:gsub('^.*/','') -- Name of device, less path
    local stat=io.open(dir..'/stat') -- Open stat file
    if stat then -- Device may have vanished since find listed it
      for line in stat:lines() do
        -- Match pattern to extract fields #10 and #11
        local f10,f11=line:match(pat)
        -- Add entry to our stats table, key is device name, value is f10+f11
        if f10 then stats[dev]=tonumber(f10)+tonumber(f11) end
      end
      stat:close() -- Don't leak a file handle per device per loop
    end
  end
  find:close() -- Reap the find process
  local max=-1 -- Start with impossibly low value for max
  local maxdev=nil -- This will hold the name of the "max" device
  local num_idle=0 -- Number of drives with no I/O
  local num_drives=0 -- Total number of drives
  -- Loop through key/value pairs for each drive
  for dev,stat in pairs(stats) do
    local old_stat=old_stats[dev] -- Look up previous stat
    num_drives=num_drives+1 -- Increment total number of drives
    if old_stat then -- If we have a previous stat for this device
      local diff=stat-old_stat -- See how much it has changed
      -- Compare to our current "max" value and update as required
      if diff > max then maxdev=dev; max=diff end
      -- If this stat hasn't changed, increment idle drive count
      if diff == 0 then num_idle=num_idle+1 end
    end
  end
  if max>-1 then -- True once a device has a previous stat to compare
    io.write(clear) -- Clear previous output
    io.write(fmt:format(num_idle,num_drives,maxdev,max)) -- Show stats
    io.flush() -- No newline is written, so flush explicitly
  end
  old_stats=stats -- Save this table to compare with next loop
  if not os.execute('sleep 1') then os.exit() end -- zzzzz
until false -- loop
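If you want to check the idle/hog logic without plugging in any drives, the comparison step from the loop above can be pulled out into a pure function. This is only a sketch: the name `summarize` and the two snapshot tables are invented for illustration and are not part of the script.

```lua
-- Hypothetical helper: compare two snapshots (tables keyed by device name)
-- and return idle count, total drive count, busiest device, and its delta.
local function summarize(old_stats, stats)
  local num_idle, num_drives = 0, 0
  local max, maxdev = -1, nil
  for dev, stat in pairs(stats) do
    num_drives = num_drives + 1
    local old_stat = old_stats[dev]
    if old_stat then                    -- skip devices seen for the first time
      local diff = stat - old_stat
      if diff > max then max, maxdev = diff, dev end
      if diff == 0 then num_idle = num_idle + 1 end
    end
  end
  return num_idle, num_drives, maxdev, max
end

-- Invented snapshots: sdb is idle, sdc is the hog, sdd was just plugged in.
local before = { sda = 100, sdb = 500, sdc = 900 }
local after  = { sda = 140, sdb = 500, sdc = 2900, sdd = 10 }

print(summarize(before, after))  --> 1  4  sdc  2000 (tab-separated)
```

A newly attached drive (sdd here) is counted in the total but cannot be idle or the hog until a second reading exists for it, which matches what the loop does on its first pass.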