Identifying drives that starve the rest for IO

Open unfa opened this issue 6 years ago • 2 comments

I'm using a custom Bash script to deploy f3 for massive drive testing (thousands per month).

One problem I have is that some faulty drives cause the rest to stall. I have finally found a way to identify (and remove) them to restore peace in the Galaxy, so I thought I'd share my research:

https://unix.stackexchange.com/questions/484900/detecting-misbehaving-usb-flash-memory-devices-that-block-the-rests-io

unfa avatar Feb 14 '19 10:02 unfa

Thank you for posting your results here.

AltraMayor avatar Feb 14 '19 11:02 AltraMayor

Your script is forking 27+ new processes per loop (every second), and doing I/O on 5+ files per loop. That doesn't seem very efficient.

Here is a more efficient approach, written in Lua 5.3. It forks only two processes per loop, and doesn't create any files.

It will only show the total number of "idle" drives (devices with no active I/O), the total number of all drives, the device name of the "hog", and how much the stat value for that device (fields #10 and #11 summed) grew since the previous loop.

Note that this is intended for Lua 5.3. Older versions might work, but Ctrl+C might not kill it properly.

#!/usr/bin/env lua

-- Edit the wildcard if you need a more specific match
local wildcard='sd*'

local old_stats={} -- saved stats from previous loop 

-- Capture pattern: Skips the first nine fields, captures fields #10 (io_ticks)
-- and #11 (time_in_queue)
local pat=string.rep('%d+%s+',9)..'(%d+)%s+(%d+)'

-- String of spaces (to clear previous output before writing new data)
local clear=string.rep(' ',64)..'\r'

-- Output string format, prints the number of idle drives / total drives,
-- the device name of the hog, and how much its stat changed since the last loop.
local fmt='  IDLE: %d/%d MAX:%s (%d)\r'

-- Begin infinite loop (until forever or Ctrl+C)
repeat
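  -- Quote the wildcard so the shell doesn't expand it before find sees it
  local find=io.popen("find /sys/block/ -maxdepth 1 -name '"..wildcard.."'")
  local stats={} -- Start with empty table
  for dir in find:lines() do
    local dev=dir:gsub('^.*/','') -- Name of device, without the path
    local stat=io.open(dir..'/stat') -- Open stat file
    if stat then -- Device may have been unplugged since find ran
      for line in stat:lines() do
        -- Match pattern to extract fields #10 and #11
        local f10,f11=line:match(pat)
        -- Add entry to our stats table, key is device name, value is f10+f11
        if f10 then stats[dev]=tonumber(f10)+tonumber(f11) end
      end
      stat:close() -- Don't leak a file handle on every loop
    end
  end
  find:close() -- Close the pipe and reap the find process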
  local max=-1 -- Start with impossibly low value for max
  local maxdev=nil -- This will hold the name of the "max" device
  local num_idle=0 -- Number of drives with no I/O
  local num_drives=0 -- Total number of drives
  -- Loop through key/value pairs for each drive
  for dev,stat in pairs(stats) do
    local old_stat=old_stats[dev] -- Look up previous stat
    num_drives=num_drives+1 -- Increment total number of drives
    if old_stat then -- If we have a previous stat for this device
      local diff=stat-old_stat -- See how much it has changed
      -- Compare to our current "max" value and update as required
      if diff > max then maxdev=dev; max=diff  end 
      -- If this stat hasn't changed, increment idle drive count
      if diff == 0 then num_idle=num_idle+1 end
    end
  end
  if max>-1 then -- Always true, except for very first loop
    io.write(clear) -- Clear previous output
    io.write(fmt:format(num_idle,num_drives,maxdev,max)) -- Show stats
  end
  old_stats=stats -- Save this table to compare with next loop
  if not os.execute('sleep 1') then os.exit() end -- zzzzz
until false -- loop
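
The two core steps of the loop (extracting fields #10 and #11 from a stat line, then comparing two snapshots) can be tried without any drives attached. What follows is only a minimal sketch and not part of the script above: the helper names `busy_counter` and `find_hog`, the sample stat line, and the counter values are all made up for illustration.

```lua
-- Same capture pattern as the script: skip nine numeric fields,
-- then capture fields #10 and #11.
local pat = string.rep('%d+%s+', 9) .. '(%d+)%s+(%d+)'

-- Hypothetical helper: returns field #10 + field #11 for one stat line,
-- or nil if the line does not look like a stat line.
local function busy_counter(line)
  local f10, f11 = line:match(pat)
  if not f10 then return nil end
  return tonumber(f10) + tonumber(f11)
end

-- Hypothetical helper: compares two snapshots (device name -> counter).
-- Returns idle count, total count, name of the hog, and the hog's delta.
-- Devices with no previous reading count toward the total only.
local function find_hog(old_stats, stats)
  local num_idle, num_drives = 0, 0
  local maxdev, max = nil, -1
  for dev, stat in pairs(stats) do
    num_drives = num_drives + 1
    local old_stat = old_stats[dev]
    if old_stat then
      local diff = stat - old_stat
      if diff > max then maxdev, max = dev, diff end
      if diff == 0 then num_idle = num_idle + 1 end
    end
  end
  return num_idle, num_drives, maxdev, max
end

-- Made-up stat line with eleven fields; fields #10 and #11 are 900 and 1200.
local sample_line =
  '    1000    200   30000    400    5000    600   70000    800    0    900    1200'
print(busy_counter(sample_line))

-- Made-up snapshots: sda is idle, sdb is the hog, sdd has just appeared.
local before = { sda = 100, sdb = 50,  sdc = 7 }
local after  = { sda = 100, sdb = 950, sdc = 8, sdd = 3 }
print(find_hog(before, after))
```

Keeping the comparison in a function that takes plain tables makes it easy to check the idle/hog logic on known numbers before pointing the script at real hardware.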

yetanothergeek avatar Feb 14 '19 19:02 yetanothergeek