snapraid
Add debug output to understand calculation of failure probability (FP)
Please don't actually merge this. I just wanted to publish it here for others facing the same confusion I faced.
When running snapraid smart on my machine, I was surprised to see a failure probability of 84% even though all 5 SMART values mentioned in the SnapRAID FAQ suggested a perfectly healthy drive.
To understand where this high value came from, I added some debug output along the code path for the calculation and saw that it was based on Load_Cycle_Count (193), which is mentioned neither in the FAQ nor in the Backblaze blog posts.
With this patch the output looks like this:
SnapRAID SMART report:
Temp Power Error FP Size
C OnDays Count TB Serial Device Disk
-----------------------------------------------------------------------
41 569 0
|value=0||value/step=0||value=0||result=0.031633| calculated AFR for SMART value 5: 0.031633 (3.113823%)
|value=0||value/step=0||value=0||result=0.047450|calculated AFR for SMART value 187: 0.047450 (4.634185%)
|value=235466||value/step=362||value=255||result=1.822567|calculated AFR for SMART value 193: 1.822567 (83.838958%)
|value=0||value/step=0||value=0||result=0.034067|calculated AFR for SMART value 197: 0.034067 (3.349293%)
|value=0||value/step=0||value=0||result=0.036500|calculated AFR for SMART value 198: 0.036500 (3.584191%)
84% 2.0 Z4Z4JCWX /dev/sda disk1
32 2719 0
|value=0||value/step=0||value=0||result=0.031633| calculated AFR for SMART value 5: 0.031633 (3.113823%)
|value=5093425||value/step=7848||value=255||result=1.822567|calculated AFR for SMART value 193: 1.822567 (83.838958%)
|value=0||value/step=0||value=0||result=0.034067|calculated AFR for SMART value 197: 0.034067 (3.349293%)
|value=0||value/step=0||value=0||result=0.036500|calculated AFR for SMART value 198: 0.036500 (3.584191%)
84% 0.5 71UTC68RT /dev/sdd disk2
44 39 0
|value=0||value/step=0||value=0||result=0.031633| calculated AFR for SMART value 5: 0.031633 (3.113823%)
|value=0||value/step=0||value=0||result=0.047450|calculated AFR for SMART value 187: 0.047450 (4.634185%)
|value=162||value/step=0||value=0||result=0.000000|calculated AFR for SMART value 193: 0.000000 (0.000000%)
|value=0||value/step=0||value=0||result=0.034067|calculated AFR for SMART value 197: 0.034067 (3.349293%)
|value=0||value/step=0||value=0||result=0.036500|calculated AFR for SMART value 198: 0.036500 (3.584191%)
5% 6.0 ZF200GE8 /dev/sdb parity
- - - n/a - - /dev/sdh -
- - - n/a - - /dev/sdg -
- - - n/a - - /dev/sde -
- - - n/a - - /dev/sdf -
30 606 0 SSD 0.2 Y5IB61BCKNSX /dev/sdc -
31 36 0
|value=0||value/step=0||value=0||result=0.031633| calculated AFR for SMART value 5: 0.031633 (3.113823%)
|value=0||value/step=0||value=0||result=0.034067|calculated AFR for SMART value 197: 0.034067 (3.349293%)
|value=0||value/step=0||value=0||result=0.036500|calculated AFR for SMART value 198: 0.036500 (3.584191%)
4% 1.0 S2RXJ9FCB07612 /dev/sdi -
The FP column is the estimated probability (in percentage) that the disk
is going to fail in the next year.
Probability that at least one disk is going to fail in the next year is 98%.
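The percentages in the debug lines are consistent with converting each annualized failure rate (AFR) into a one-year failure probability via p = 1 - exp(-AFR), with the FP column matching the worst attribute of each disk and the final line matching the combination of all per-disk rates. The sketch below reproduces the numbers from the report above under those assumptions. It is inferred from the printed values, not taken from the SnapRAID source, and the function names are hypothetical.

```python
import math

def afr_to_probability(afr: float) -> float:
    """One-year failure probability for a given AFR, treating failures as a Poisson process."""
    return 1.0 - math.exp(-afr)

def disk_fp(afr_by_attribute: dict) -> float:
    """Assumption: the FP column follows the SMART attribute with the highest AFR."""
    return afr_to_probability(max(afr_by_attribute.values()))

def any_disk_fails(disk_afrs: list) -> float:
    """Assumption: disks fail independently, so their rates add up."""
    return afr_to_probability(sum(disk_afrs))

# AFR per SMART attribute, copied from the debug output for /dev/sda (disk1).
disk1 = {5: 0.031633, 187: 0.047450, 193: 1.822567, 197: 0.034067, 198: 0.036500}

for attribute, afr in disk1.items():
    print(f"SMART {attribute}: AFR {afr:.6f} ({100 * afr_to_probability(afr):.6f}%)")
print(f"disk1 FP: {100 * disk_fp(disk1):.0f}%")

# Worst AFR of each disk that has a numeric FP: sda, sdd, sdb, sdi.
worst_afrs = [1.822567, 1.822567, 0.047450, 0.036500]
print(f"At least one disk fails: {100 * any_disk_fails(worst_afrs):.0f}%")
```

With these inputs, attribute 193 alone accounts for the 84% shown for disk1, while the five attributes covered by the FAQ each stay below 5%.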
To compile on Ubuntu:
apt install build-essential autoconf
autoreconf -i
./configure
make
# run the binary:
./snapraid
Hi @saladpanda
If I read the output correctly, the Load Cycle Count of your disk is 235,466, which is indeed a high value. In the data I analyzed, this appears to be an indicator of potential failure. However, it's important to note that this doesn't necessarily mean your disk will fail, as each case is unique. It's a good idea to check your hard drive's specifications to see what it's rated for, just to be sure. Hard drives are typically rated for around 600,000 cycles.
See for example this discussion:
https://superuser.com/questions/840851/how-much-load-cycle-count-can-my-hard-drive-hypotethically-sustain
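As a rough sanity check of the figures in this thread, the short sketch below compares the reported Load Cycle Count with the ballpark rating of 600,000 cycles and relates it to the 569 power-on days shown in the report. The rating is the approximate figure quoted above, not a datasheet value for this drive, and the helper name is hypothetical.

```python
RATED_CYCLES = 600_000  # ballpark figure from the comment above, not a datasheet value

def load_cycle_usage(count: int, power_on_days: int, rated: int = RATED_CYCLES):
    """Return (fraction of the rating used, average load cycles per power-on day)."""
    return count / rated, count / power_on_days

# Values for /dev/sda from the report: 235466 load cycles over 569 power-on days.
used, per_day = load_cycle_usage(235_466, 569)
print(f"{used:.0%} of the rating used, about {per_day:.0f} cycles per power-on day")
```

A high per-day rate usually points at aggressive head parking, so it is worth checking the drive's own specification before reading too much into the FP value.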