pm2-server-monit

Detect disk drive health (SSD corruption, bad blocks, errors)

Open Unitech opened this issue 8 years ago • 3 comments

This feature would help detect a hard drive that is becoming unstable and may soon need to be replaced.

(Included in the inaccessible-server post-mortem.)

Unitech avatar Feb 05 '17 00:02 Unitech

You can use the smartctl tool from the smartmontools package. It returns the S.M.A.R.T. status of the HDD, for example:

smartctl -d ata -H /dev/sda
smartctl 6.4 2014-10-07 r4002 [x86_64-linux-3.16.0-4-amd64] (local build)
Copyright (C) 2002-14, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
Please note the following marginal Attributes:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
190 Airflow_Temperature_Cel 0x0022   068   035   045    Old_age   Always   In_the_past 32 (2 27 32 31 0)
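
A module could extract the verdict from this output with a small parser. The following is only a sketch in Python, not code from pm2-server-monit: the function names `read_health_report` and `parse_overall_health` are hypothetical, and the command is the same one shown above (it still needs sufficient privileges to read the device).

```python
import re
import subprocess


def read_health_report(device: str = "/dev/sda") -> str:
    """Run the smartctl command from this thread and return its text output.

    Hypothetical helper: smartctl must be installed and the caller needs
    permission to access the device. smartctl uses its exit status as a
    bitmask, so a non-zero status is not treated as an error here.
    """
    result = subprocess.run(
        ["smartctl", "-d", "ata", "-H", device],
        capture_output=True,
        text=True,
    )
    return result.stdout


def parse_overall_health(smartctl_output: str):
    """Return the overall-health verdict (e.g. 'PASSED'), or None if absent."""
    match = re.search(
        r"SMART overall-health self-assessment test result:\s*(\S+)",
        smartctl_output,
    )
    return match.group(1) if match else None


# Sample text copied from the output above, so the parser can be tried
# without a real disk or root access.
sample = """=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
"""
print(parse_overall_health(sample))
```

Anything other than PASSED, including a missing verdict line, could then be reported as a warning.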

f-hj avatar Feb 05 '17 10:02 f-hj

Nice! But it looks like we need root rights :( Is there any other way?

Unitech avatar Feb 06 '17 09:02 Unitech

There is also an indicator that gives the overall health of an SSD:

>>> sudo smartctl -s on -a /dev/sda | grep Media
233 Media_Wearout_Indicator 0x0013   098   098   000    Pre-fail  Always       -       16584277

The 098 (the VALUE column) is a score out of 100 indicating the SSD's health.
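
As a rough sketch of how that score could be read programmatically, the Python snippet below pulls the VALUE column for a named attribute out of the smartctl attribute table. The function name `parse_attribute_value` is hypothetical, and it assumes the whitespace-separated column layout shown in the outputs in this thread.

```python
def parse_attribute_value(smartctl_output: str, attribute_name: str):
    """Return the normalized VALUE column for an attribute, or None if missing.

    Assumes the table layout shown in this thread:
    ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
    """
    for line in smartctl_output.splitlines():
        fields = line.split()
        # fields[1] is the attribute name, fields[3] is the VALUE column.
        if len(fields) >= 4 and fields[1] == attribute_name:
            return int(fields[3])
    return None


# Sample line copied from the output above.
sample = (
    "233 Media_Wearout_Indicator 0x0013   098   098   000    "
    "Pre-fail  Always       -       16584277"
)
print(parse_attribute_value(sample, "Media_Wearout_Indicator"))
```

Not every SSD reports an attribute with this name, so a None result should be handled as "unknown" rather than as a failure.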

Unitech avatar Feb 06 '17 09:02 Unitech