S.M.A.R.T. errors should not be reported as malfunc for all HDD

Open axet opened this issue 4 years ago • 5 comments

The report at https://linux-hardware.org/?probe=59b41381ed shows possible HDD issues. Those issues come from automatically flagged SMART issues, which are then shown as malfunc for all HDDs. It should not be so.

  • https://linux-hardware.org/index.php?probe=38e0ad48b1
  • https://linux-hardware.org/index.php?probe=38e0ad48b1&log=smartctl#sda

axet avatar Dec 30 '20 13:12 axet

Why?

The status of your particular drive is green. The warning tells us that other users experience some problems with the same drive model.

linuxhw avatar Dec 30 '20 15:12 linuxhw

I'd like to get notices only about issues with drivers or hardware design issues (bugs, crashes, hangs, stability, quality). SMART issues are only important when a lot of devices of a specific model (model + firmware) have them, so that the model is known to be broken.

For example, if all devices of a specific model get bad blocks within 1 year, I'd like to know about it. But when 1 device reports SMART issues, I'd rather not get any related information until the issue rate rises above the average failure rate.
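
A minimal sketch of that rule, assuming each drive is available as a (model key, has SMART errors) pair. The function name and the data shape are hypothetical and are not taken from hw-probe or the linux-hardware.org backend:

```python
from collections import Counter


def models_above_average(reports):
    """Return model keys whose failure rate exceeds the overall rate.

    `reports` is an iterable of (model_key, has_smart_errors) pairs,
    one pair per drive. `model_key` is any hashable value, for example
    a (model, firmware) tuple. This shape is an assumption made for
    illustration only.
    """
    reports = list(reports)
    if not reports:
        return set()

    total = Counter(key for key, _ in reports)
    failed = Counter(key for key, bad in reports if bad)

    # Share of failing drives across every model in the data set.
    overall_rate = sum(failed.values()) / len(reports)

    # Flag a model only when its own share is strictly higher.
    return {key for key in total if failed[key] / total[key] > overall_rate}


# Hypothetical data: model A fails on 2 of 3 drives, model B on 1 of 5.
sample = (
    [(("A", "fw1"), True)] * 2
    + [(("A", "fw1"), False)]
    + [(("B", "fw2"), True)]
    + [(("B", "fw2"), False)] * 4
)
flagged = models_above_average(sample)
```

With this rule a single failing drive does not mark its model on its own. The model is flagged only once its failure share is higher than the share across all drives.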

axet avatar Dec 30 '20 21:12 axet

Maybe relevant to this issue … I do like to see malfunctions flagged at (for example):

  • https://bsd-hardware.info/?probe=1f3fa432dc (2020-11-21)
  • https://bsd-hardware.info/?probe=60d9540d35 (2020-12-31)
  • https://bsd-hardware.info/?probe=c2e361eeff (2021-01-23)

Amber is fine, although the text could be better. In lieu of:

The S.M.A.R.T. errors are detected. It's recommended to replace the drive soon.

– consider:

Red alert for at least one S.M.A.R.T. attribute.

– then it's for the end user to interpret each error or alert in the context of:

  • https://www.smartmontools.org/wiki/FAQ

Side notes for the example above:

  • the significant increase of Reallocated_Sector_Ct from 168 (0 21) to 208 (0 25) (2020-12-31) was probably the result of extraordinary thrashing + consumption of free space on 2020-12-29 – https://github.com/ncw/stressdisk/issues/14
  • the hardware is to be written off after I gain a new notebook.

grahamperrin avatar Jan 23 '21 13:01 grahamperrin

https://github.com/linuxhw/hw-probe/issues/82#issuecomment-752769666

… when a lot of devices with specific model (model + firmware) has issues known to be broken. …

Currently at https://bsd-hardware.info/?probe=6149ab50a8#d5:

image

This device model is known to have problems

Following the link to https://bsd-hardware.info/?id=ide:hgst-hts725050a7e630 it seems that three of the six malfunc listings are for a single computer:

image

If the "known to have problems" alert is based on just four disks, the sample size is probably too small.
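
As a rough sketch of the point above, the malfunc listings could be counted per distinct computer before an alert is shown. The function names, the (computer, probe) pair shape, and the threshold of 10 are all assumptions for illustration, not how the site actually works:

```python
def distinct_computers(listings):
    """Count distinct computers among malfunc listings for one drive model.

    `listings` is an iterable of (computer_id, probe_id) pairs.
    Several probes from the same computer count only once.
    """
    return len({computer_id for computer_id, _probe_id in listings})


def has_enough_evidence(listings, min_computers=10):
    """Return True if enough distinct computers report the problem.

    `min_computers` is an arbitrary illustrative threshold.
    """
    return distinct_computers(listings) >= min_computers


# Hypothetical IDs mirroring the case above: six listings,
# three of them from a single computer.
listings = [
    ("pc1", "probe1"),
    ("pc1", "probe2"),
    ("pc1", "probe3"),
    ("pc2", "probe4"),
    ("pc3", "probe5"),
    ("pc4", "probe6"),
]
count = distinct_computers(listings)
show_alert = has_enough_evidence(listings)
```

Here the six listings collapse to four computers, so no alert would be shown under the assumed threshold.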

grahamperrin avatar Jul 24 '21 15:07 grahamperrin

Yes, we need to show some reliability metrics for drives (from linuxhw/SMART) instead of a warning.

linuxhw avatar May 13 '23 06:05 linuxhw