SAS drive support
When I run the script I get the message "ERROR [Main] AssertionError" and then the script exits. I have reproduced this with the Fedora distribution package, the PyPI package and a source installation. For now I am testing with /dev/sda only, but I plan to have hddfancontrol monitor all disks once this works.
OS: Fedora 36 (5.17.5-300.fc36.x86_64) Package version: hddfancontrol-1.5.0-2.fc36.noarch
Hardware info:
- System board: MSI MAG X570 TOMAHAWK WIFI
- Fans: 3x Noctua NF-F12 IPPC-3000 PWM (connected to a single system board header via a PWM fan hub)
- Disks: 18x 18TB Seagate Exos X18 SAS 12Gb/s (ST18000NM004J)
- Controller: LSI 9305-24i SAS HBA
- Chassis: Supermicro SC846 with passthrough SAS backplane (non-expander)
Attempting to start hddfancontrol using a single disk:
# hddfancontrol -d /dev/sda -p /sys/class/hwmon/hwmon4/pwm3 --pwm-start-value 170 --pwm-stop-value 60 --min-fan-speed-prct 20 -i 30 -v debug
2022-06-18 16:42:34,368 INFO [Main] Process real time scheduler set to 2, priority 49
2022-06-18 16:42:34,447 ERROR [Main] AssertionError:
2022-06-18 16:42:34,447 INFO [Fan #1] Setting fan speed to 100%
2022-06-18 16:42:34,449 DEBUG [Fan #1] Setting PWM value to 255
Attempting to start with --smartctl switch:
# hddfancontrol -d /dev/sda -p /sys/class/hwmon/hwmon4/pwm3 --pwm-start-value 170 --pwm-stop-value 60 --min-fan-speed-prct 20 -i 30 -v debug --smartctl
2022-06-18 19:43:15,763 INFO [Main] Process real time scheduler set to 2, priority 49
2022-06-18 19:43:15,871 ERROR [Main] AssertionError:
2022-06-18 19:43:15,871 INFO [Fan #1] Setting fan speed to 100%
2022-06-18 19:43:15,874 DEBUG [Fan #1] Setting PWM value to 255
Attempting to start hddfancontrol without the PWM start and stop values, so that it runs its hardware test:
# hddfancontrol -d /dev/sda -p /sys/class/hwmon/hwmon4/pwm3 -v debug
2022-06-18 16:43:07,546 WARNING [Startup] Missing --pwm-start-value or --pwm-stop-value argument, running hardware test to find values
Traceback (most recent call last):
File "/usr/bin/hddfancontrol", line 33, in <module>
sys.exit(load_entry_point('hddfancontrol==1.5.0', 'console_scripts', 'hddfancontrol')())
File "/usr/local/lib/python3.10/site-packages/hddfancontrol-1.5.0-py3.10.egg/hddfancontrol/__init__.py", line 1293, in cl_main
File "/usr/local/lib/python3.10/site-packages/hddfancontrol-1.5.0-py3.10.egg/hddfancontrol/__init__.py", line 917, in test
File "/usr/local/lib/python3.10/site-packages/hddfancontrol-1.5.0-py3.10.egg/hddfancontrol/__init__.py", line 917, in <listcomp>
File "/usr/local/lib/python3.10/site-packages/hddfancontrol-1.5.0-py3.10.egg/hddfancontrol/__init__.py", line 143, in __init__
File "/usr/local/lib/python3.10/site-packages/hddfancontrol-1.5.0-py3.10.egg/hddfancontrol/__init__.py", line 187, in getPrettyName
AssertionError
Testing manual fan control:
# cat /sys/class/hwmon/hwmon4/pwm3_enable
0
# echo 1 >> /sys/class/hwmon/hwmon4/pwm3_enable
# cat /sys/class/hwmon/hwmon4/pwm3_enable
1
Manually set fan to max speed and confirm:
# echo 255 > /sys/class/hwmon/hwmon4/pwm3
# sensors | grep fan3
fan3: 2824 RPM (min = 0 RPM)
Manually set fan pwm value to 100 and confirm slowdown:
# echo 100 > /sys/class/hwmon/hwmon4/pwm3
# sensors | grep fan3
fan3: 1442 RPM (min = 0 RPM)
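The manual steps above (switch the output to manual mode, then write a duty value between 0 and 255) can be sketched in Python. This is only an illustration of the sysfs interaction, not hddfancontrol's code; the function name `set_pwm` is made up for this example, and the `pwm3` path is the one from this machine.

```python
from pathlib import Path


def set_pwm(pwm_path: str, value: int) -> None:
    """Put a hwmon PWM output in manual mode and write a duty value (0-255)."""
    if not 0 <= value <= 255:
        raise ValueError("PWM value must be between 0 and 255")
    pwm = Path(pwm_path)
    # The enable file sits next to the PWM file, e.g. pwm3 -> pwm3_enable
    enable = pwm.with_name(pwm.name + "_enable")
    # 1 selects manual control, matching the `echo 1` step above
    if enable.read_text().strip() != "1":
        enable.write_text("1\n")
    pwm.write_text(f"{value}\n")


# Example call (needs root, path taken from this system):
# set_pwm("/sys/class/hwmon/hwmon4/pwm3", 100)
```

The example call is left commented out because it changes the fan speed on real hardware.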
Query hddtemp with disk currently being monitored:
# hddtemp -u C -n /dev/sda
35
Query hddtemp with all disks:
# hddtemp -u C
/dev/sda: SEAGATE ST18000NM004J: 35°C
/dev/sdb: SEAGATE ST18000NM004J: 38°C
/dev/sdc: SEAGATE ST18000NM004J: 38°C
/dev/sdd: SEAGATE ST18000NM004J: 36°C
/dev/sde: SEAGATE ST18000NM004J: 37°C
/dev/sdf: SEAGATE ST18000NM004J: 38°C
/dev/sdg: SEAGATE ST18000NM004J: 40°C
/dev/sdh: SEAGATE ST18000NM004J: 37°C
/dev/sdi: SEAGATE ST18000NM004J: 36°C
/dev/sdj: SEAGATE ST18000NM004J: 39°C
/dev/sdk: SEAGATE ST18000NM004J: 36°C
/dev/sdl: SEAGATE ST18000NM004J: 39°C
/dev/sdm: SEAGATE ST18000NM004J: 36°C
/dev/sdn: SEAGATE ST18000NM004J: 39°C
/dev/sdo: SEAGATE ST18000NM004J: 39°C
/dev/sdp: SEAGATE ST18000NM004J: 41°C
/dev/sdq: SEAGATE ST18000NM004J: 40°C
/dev/sdr: SEAGATE ST18000NM004J: 38°C
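Since the goal is to monitor all 18 disks, the listing above could be reduced to the hottest drive with a few lines of Python. This is a rough sketch that assumes the `device: model: temperature` line format shown above; `parse_hddtemp` is a hypothetical helper, not part of hddfancontrol.

```python
import re

# Matches lines such as "/dev/sda: SEAGATE ST18000NM004J: 35°C"
LINE_RE = re.compile(r"^(/dev/\S+): (.+): (-?\d+)°C$")


def parse_hddtemp(output: str) -> dict[str, int]:
    """Map each device path to its temperature in °C."""
    temps = {}
    for line in output.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            temps[match.group(1)] = int(match.group(3))
    return temps


sample = """/dev/sda: SEAGATE ST18000NM004J: 35°C
/dev/sdp: SEAGATE ST18000NM004J: 41°C"""
temps = parse_hddtemp(sample)
hottest = max(temps, key=temps.get)
print(hottest, temps[hottest])  # /dev/sdp 41
```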
Smartctl temp query:
# smartctl -a /dev/sda | grep -i temp
Temperature Warning: Enabled
Current Drive Temperature: 35 C
Drive Trip Temperature: 60 C
From reading some previous issue tickets I noticed you asked for the following hdparm output to query the temperature. However, since these are SAS disks I believe they should be queried with sdparm, which doesn't appear to have temperature support. I am not sure this matters, since hddtemp and smartctl are both able to get the drive temps.
# hdparm -H /dev/sda
/dev/sda:
SG_IO: bad/missing sense data, sb[]: 72 05 20 00 00 00 00 1c 02 06 00 00 cf 00 00 00 03 02 00 01 80 0e 00 00 00 00 00 00 00 00 00 00
HDIO_DRIVE_CMD(hitachisensecondition) failed: Input/output error
# hdparm -C /dev/sda
/dev/sda:
SG_IO: bad/missing sense data, sb[]: 72 05 20 00 00 00 00 1c 02 06 00 00 cf 00 00 00 03 02 00 01 80 0e 00 00 00 00 00 00 00 00 00 00
SG_IO: bad/missing sense data, sb[]: 72 05 20 00 00 00 00 1c 02 06 00 00 cf 00 00 00 03 02 00 01 80 0e 00 00 00 00 00 00 00 00 00 00
drive state is: unknown
I have also tried loading the drivetemp kernel module and this did not make any difference.
Disks: 18x 18TB Seagate Exos X18 SAS 12Gb/s (ST18000NM004J)
hddfancontrol has only been tested with SATA drives.
What is the output of hdparm -I /dev/sda?
# hdparm -I /dev/sda
/dev/sda:
SG_IO: bad/missing sense data, sb[]: 72 05 20 00 00 00 00 1c 02 06 00 00 cf 00 00 00 03 02 00 01 80 0e 00 00 00 00 00 00 00 00 00 00
So it looks like hddfancontrol is not able to query the information it needs using hdparm when using SAS drives. I have not had a chance to look through the code yet to see how difficult it would be to bypass hdparm using an alternate method when SAS drives are detected.
I don't know either, simply because I don't have any SAS drive to test with. That is why the requirements state that you need at least one SATA drive.
The AssertionError comes from the code that tries to find a user-friendly name to display for the drive, but there are many other parts of the code that may break for SAS drives:
- runtime power state reading
- activity/idle detection
- temperature probing (do hddtemp/smartctl/drivetemp work for SAS?)
I'm pretty weak in Python, but from looking through the code a bit it looks like any of the queries that use hdparm will likely cause a problem with SAS, which would include getPrettyName. I can add a SATA drive to the chassis when I am back home in a few days, but for now I only have remote access to test things.
As far as I can tell SAS drives do not support idle spindown by design so they should always be active.
I am able to do the temperature probing through hddtemp and smartctl. I have not really looked into how the drivetemp module works yet to see if that is functional.
It is worth noting that the command # smartctl -l scttempsts /dev/sda does not work with SAS. scttempsts is noted as an ATA only switch in the manpages for smartctl.
SAS hddtemp
# hddtemp -u C -n /dev/sda
33
# hddtemp
/dev/sda: SEAGATE ST18000NM004J: 33°C
/dev/sdb: SEAGATE ST18000NM004J: 38°C
/dev/sdc: SEAGATE ST18000NM004J: 38°C
/dev/sdd: SEAGATE ST18000NM004J: 35°C
/dev/sde: SEAGATE ST18000NM004J: 36°C
/dev/sdf: SEAGATE ST18000NM004J: 37°C
/dev/sdg: SEAGATE ST18000NM004J: 40°C
/dev/sdh: SEAGATE ST18000NM004J: 35°C
/dev/sdi: SEAGATE ST18000NM004J: 35°C
/dev/sdj: SEAGATE ST18000NM004J: 38°C
/dev/sdk: SEAGATE ST18000NM004J: 35°C
/dev/sdl: SEAGATE ST18000NM004J: 38°C
/dev/sdm: SEAGATE ST18000NM004J: 34°C
/dev/sdn: SEAGATE ST18000NM004J: 39°C
/dev/sdo: SEAGATE ST18000NM004J: 39°C
/dev/sdp: SEAGATE ST18000NM004J: 40°C
/dev/sdq: SEAGATE ST18000NM004J: 40°C
/dev/sdr: SEAGATE ST18000NM004J: 37°C
SAS smartctl (grep for temp)
# smartctl -a /dev/sda | grep -i temp
Temperature Warning: Enabled
Current Drive Temperature: 34 C
Drive Trip Temperature: 60 C
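Because `scttempsts` is ATA only, a SAS-aware probe would have to read the `Current Drive Temperature` line from the regular `smartctl -a` output instead. A minimal parsing sketch, assuming the line format shown above (`parse_scsi_temperature` is a hypothetical name, and this is not necessarily how hddfancontrol does it):

```python
import re
from typing import Optional

# Only the "Current" line is wanted, not "Drive Trip Temperature"
TEMP_RE = re.compile(r"^Current Drive Temperature:[ \t]+(\d+)[ \t]+C[ \t]*$", re.MULTILINE)


def parse_scsi_temperature(smartctl_output: str) -> Optional[int]:
    """Return the current temperature in °C, or None if the line is absent."""
    match = TEMP_RE.search(smartctl_output)
    return int(match.group(1)) if match else None


sample = """Temperature Warning: Enabled
Current Drive Temperature: 34 C
Drive Trip Temperature: 60 C"""
print(parse_scsi_temperature(sample))  # 34
```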
SAS smartctl (full output)
# smartctl -a /dev/sda
smartctl 7.3 2022-02-28 r5338 [x86_64-linux-5.17.5-300.fc36.x86_64] (local build)
Copyright (C) 2002-22, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Vendor: SEAGATE
Product: ST18000NM004J
Revision: E002
Compliance: SPC-5
User Capacity: 18,000,207,937,536 bytes [18.0 TB]
Logical block size: 512 bytes
Physical block size: 4096 bytes
LU is fully provisioned
Rotation Rate: 7200 rpm
Form Factor: 3.5 inches
Logical Unit id: 0x5000c500d7c5bf7f
Serial number: [REDACTED]
Device type: disk
Transport protocol: SAS (SPL-4)
Local Time is: Tue Jun 21 03:29:02 2022 CDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
Temperature Warning: Enabled
=== START OF READ SMART DATA SECTION ===
SMART Health Status: OK
Grown defects during certification <not available>
Total blocks reassigned during format <not available>
Total new blocks reassigned <not available>
Power on minutes since format <not available>
Current Drive Temperature: 34 C
Drive Trip Temperature: 60 C
Accumulated power on time, hours:minutes 102:12
Manufactured in week 35 of year 2021
Specified cycle count over device lifetime: 50000
Accumulated start-stop cycles: 6
Specified load-unload count over device lifetime: 600000
Accumulated load-unload cycles: 94
Elements in grown defect list: 0
Vendor (Seagate Cache) information
Blocks sent to initiator = 814853864
Blocks received from initiator = 1411948984
Blocks read from cache and sent to initiator = 1869920
Number of read and write commands whose size <= segment size = 14593
Number of read and write commands whose size > segment size = 1772
Vendor (Seagate/Hitachi) factory information
number of hours powered up = 102.20
number of minutes until next internal SMART test = 3
Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:          0        0         0         0          0      18009.391           0
write:         0        0         0         0          0      31509.345           0
Non-medium error count: 0
Pending defect count:0 Pending Defects
[GLTSD (Global Logging Target Save Disable) set. Enable Save with '-S on']
No Self-tests have been logged
@barichardson can you try the sas branch?
I have added a fallback to get the model name with smartctl. This will likely fail further on when reading the power state, but I don't have much time to look at it right now.
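For readers following along, a fallback of this kind could look roughly like the sketch below. It is not the code from the sas branch: it simply assumes the `Product:` line from the smartctl information section posted earlier, and `parse_scsi_model` is a made-up helper name.

```python
import re
from typing import Optional


def parse_scsi_model(smartctl_info: str) -> Optional[str]:
    """Return the value of the 'Product:' line, or None if it is missing."""
    match = re.search(r"^Product:[ \t]+(\S.*?)[ \t]*$", smartctl_info, re.MULTILINE)
    return match.group(1) if match else None


sample = """Vendor: SEAGATE
Product: ST18000NM004J
Revision: E002"""
print(parse_scsi_model(sample))  # ST18000NM004J
```

Returning `None` instead of asserting lets the caller decide on a last-resort name, such as the bare device node.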
Looks like it is getting a bit farther now. Like you predicted it is failing on reading power state.
# /usr/local/bin/hddfancontrol -d /dev/sda -p /sys/class/hwmon/hwmon4/pwm3 --pwm-start-value 170 --pwm-stop-value 60 --min-temp 30 --max-temp 45 --min-fan-speed-prct 10 -i 30 -v debug
2022-06-21 15:27:12,298 INFO [Main] Process real time scheduler set to 2, priority 49
2022-06-21 15:27:14,057 INFO [sda ST18000NM004J] Drive does not support native drivetemp temp query
2022-06-21 15:27:14,108 WARNING [sda ST18000NM004J] Drive does not support HGST temp query
2022-06-21 15:27:14,108 INFO [sda ST18000NM004J] Will probe temperature with method HDDTEMP_INVOCATION
2022-06-21 15:27:14,201 ERROR [Main] CalledProcessError: Command '('hdparm', '-C', '/dev/sda')' returned non-zero exit status 5.
2022-06-21 15:27:14,201 INFO [Fan #1] Setting fan speed to 100%
2022-06-21 15:27:14,203 WARNING [Fan #1] /sys/class/hwmon/hwmon4/pwm3_enable was 0, setting it to 1
2022-06-21 15:27:14,203 DEBUG [Fan #1] Setting PWM value to 255
# hdparm -C /dev/sda
/dev/sda:
SG_IO: bad/missing sense data, sb[]: 72 05 20 00 00 00 00 1c 02 06 00 00 cf 00 00 00 03 02 00 01 80 0e 00 00 00 00 00 00 00 00 00 00
SG_IO: bad/missing sense data, sb[]: 72 05 20 00 00 00 00 1c 02 06 00 00 cf 00 00 00 03 02 00 01 80 0e 00 00 00 00 00 00 00 00 00 00
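The non-zero exit status above is what surfaces as `CalledProcessError` in the log. One possible way to tolerate it is to treat any hdparm failure as an unknown power state, sketched here under assumptions: the `DriveState` enum and `probe_state` function are illustrative names, and the `run` parameter exists only so the logic can be exercised without a real drive.

```python
import enum
import subprocess
from typing import Callable


class DriveState(enum.Enum):
    ACTIVE_IDLE = "active/idle"
    STANDBY = "standby"
    SLEEPING = "sleeping"
    UNKNOWN = "unknown"


def probe_state(device: str, run: Callable[..., str] = subprocess.check_output) -> DriveState:
    """Ask hdparm for the power state; fall back to UNKNOWN on any failure."""
    try:
        output = run(("hdparm", "-C", device), stderr=subprocess.DEVNULL, text=True)
    except subprocess.CalledProcessError:
        # e.g. exit status 5 on a SAS drive that rejects the ATA command
        return DriveState.UNKNOWN
    for line in output.splitlines():
        if "drive state is:" in line:
            value = line.split(":", 1)[1].strip()
            try:
                return DriveState(value)
            except ValueError:
                return DriveState.UNKNOWN
    return DriveState.UNKNOWN
```

A drive reported as unknown would then have to be treated as possibly active, so that its temperature is still probed.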
Please test the sas branch again on the last commit https://github.com/desbma/hddfancontrol/commit/317de49fa8cfd0a8363aebcd5a49cc80fd460f3a
Wow thanks, looks like it is working now.
# /usr/local/bin/hddfancontrol -d /dev/sda -p /sys/class/hwmon/hwmon4/pwm3 --pwm-start-value 170 --pwm-stop-value 60 --min-temp 30 --max-temp 45 --min-fan-speed-prct 10 -i 30 -v debug
2022-06-21 16:56:00,696 INFO [Main] Process real time scheduler set to 2, priority 49
2022-06-21 16:56:02,464 INFO [sda ST18000NM004J] Drive does not support native drivetemp temp query
2022-06-21 16:56:02,517 WARNING [sda ST18000NM004J] Drive does not support HGST temp query
2022-06-21 16:56:02,517 INFO [sda ST18000NM004J] Will probe temperature with method HDDTEMP_INVOCATION
2022-06-21 16:56:02,609 DEBUG [sda ST18000NM004J] Drive state: UNKNOWN
2022-06-21 16:56:02,729 DEBUG [sda ST18000NM004J] Drive temperature: 33 °C
2022-06-21 16:56:02,729 INFO [Main] Maximum device temperature: 33 °C
2022-06-21 16:56:02,729 INFO [Fan #1] Setting fan speed to 20%
2022-06-21 16:56:02,731 DEBUG [Fan #1] Rotation speed is currently 2830 RPM
2022-06-21 16:56:02,731 WARNING [Fan #1] /sys/class/hwmon/hwmon4/pwm3_enable was 0, setting it to 1
2022-06-21 16:56:02,731 DEBUG [Fan #1] Setting PWM value to 99
2022-06-21 16:56:02,731 DEBUG [Main] Sleeping for 20 seconds
2022-06-21 16:56:22,735 DEBUG [sda ST18000NM004J] Drive state: UNKNOWN
2022-06-21 16:56:22,863 DEBUG [sda ST18000NM004J] Drive temperature: 33 °C
2022-06-21 16:56:22,864 INFO [Main] Maximum device temperature: 33 °C
2022-06-21 16:56:22,864 DEBUG [Main] Sleeping for 20 seconds
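As a sanity check on the log above: the 20% fan speed and the PWM value of 99 are both consistent with plain linear interpolation between --min-temp and --max-temp, followed by scaling between --pwm-stop-value and 255. The formulas below are inferred from those logged numbers only, not taken from the hddfancontrol source, and the function names are hypothetical.

```python
def fan_speed_percent(temp: float, min_temp: float, max_temp: float, min_speed: int) -> int:
    """Linear interpolation between min_temp and max_temp, clamped to [min_speed, 100]."""
    ratio = (temp - min_temp) / (max_temp - min_temp)
    return max(min_speed, min(100, round(100 * ratio)))


def pwm_value(speed_percent: int, pwm_stop: int) -> int:
    """Scale a percentage onto the range from pwm_stop up to 255 (assumed mapping)."""
    return pwm_stop + round((255 - pwm_stop) * speed_percent / 100)


# Values from the command line and log above:
# 33 °C, --min-temp 30, --max-temp 45, --min-fan-speed-prct 10, --pwm-stop-value 60
speed = fan_speed_percent(33, 30, 45, 10)
print(speed, pwm_value(speed, 60))  # 20 99
```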
This is now released in 1.6.0.