smartctl_exporter
Metrics for SMART logs are no longer collected
The --xall parameter was removed in https://github.com/prometheus-community/smartctl_exporter/commit/e88442081c18fcac9025e4d806786e12c791cc10, but no --log parameter was added. Hence, the smartctl_device_num_err_log_entries, smartctl_device_error_log_count, smartctl_device_self_test_log_count, and smartctl_device_self_test_log_error_count metrics stay empty, because smartctl does not report the relevant data to the exporter.
Adding --log=error --log=selftest seems to fix this. I tested it on master, but I don't know whether it wakes up drives; according to the documentation, it shouldn't.
If someone wants to test it, run this on a device that is in a sleep state:
smartctl --info --health --attributes --tolerance=verypermissive --nocheck=standby --format=brief --log=error --log=selftest <device>
--format=brief is largely redundant with --json.
Yeah, when we have a drive that fails a self test, it seems that without --log=selftest, there's no way for us to know that an otherwise fine drive has had any problem.
At the very least, in cases like this, it would be nice to get a smartctl.exit_status of something other than zero.
With --log=selftest included, we get an exit_status of 128.
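The 128 above is consistent with smartctl's exit status being a bitmask: the smartctl man page lists bit 7 (value 128) as "the device self-test log contains records of errors". As a rough sketch, the status could be decoded like this. The function name is hypothetical, the descriptions are paraphrased from the man page, and none of this is the exporter's own code.

```python
from __future__ import annotations

# Bit meanings paraphrased from the EXIT STATUS section of the smartctl man page.
EXIT_STATUS_BITS = {
    0: "command line did not parse",
    1: "device open failed, or device is in a low-power mode",
    2: "a SMART or other ATA command failed, or a checksum error was found",
    3: "SMART status check returned DISK FAILING",
    4: "prefail attributes at or below threshold",
    5: "SMART status OK, but attributes were at or below threshold in the past",
    6: "device error log contains records of errors",
    7: "device self-test log contains records of errors",
}


def decode_exit_status(status: int) -> list[str]:
    """Return the description of every bit set in a smartctl exit status.

    decode_exit_status is a hypothetical helper name, not part of any library.
    """
    return [
        description
        for bit, description in EXIT_STATUS_BITS.items()
        if status & (1 << bit)
    ]


if __name__ == "__main__":
    for line in decode_exit_status(128):
        print(line)
```

Without --log=selftest, smartctl never reads the self-test log, so bit 7 cannot be set, which would explain the exit_status of zero described above.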
Even with --nocheck=never, on a sample drive that's loaded to 100% IO, smartctl returns different output with and without the --xall option.
We need to bring back --xall to get the correct information and populate these fields.
Here's the JSON with and without --xall using --nocheck=never. (Changing nocheck in this case doesn't have an effect; this drive is never idle due to its workload.) Diff included for ease of review.
smartctl-_dev_sdb-info.health.attributes.tolerance_verypermissive.nocheck_never.format_brief.log_error.xall.json smartctl-_dev_sdb-info.health.attributes.tolerance_verypermissive.nocheck_never.format_brief.log_error.json
smartctl-xall-json.patch.gz Sorry, GitHub would not let me attach the patch unless I compressed it.
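For anyone who wants to repeat this comparison on their own drives without reading a raw diff, here is a minimal Python sketch that lists the JSON paths present in one smartctl --json output but absent from another. The helper names are hypothetical and the sample documents are made up for illustration; they are not taken from the attached files.

```python
def key_paths(node, prefix=""):
    """Yield a dotted path for every leaf value in a parsed JSON document."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from key_paths(value, f"{prefix}.{key}" if prefix else key)
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from key_paths(value, f"{prefix}[{index}]")
    else:
        yield prefix


def missing_paths(with_xall, without_xall):
    """Return the sorted paths found only in the --xall document."""
    return sorted(set(key_paths(with_xall)) - set(key_paths(without_xall)))


if __name__ == "__main__":
    # Made-up sample data; real input would come from json.load() on two
    # saved smartctl --json outputs.
    with_xall = {
        "smart_status": {"passed": True},
        "ata_smart_self_test_log": {"standard": {"count": 3}},
    }
    without_xall = {"smart_status": {"passed": True}}
    for path in missing_paths(with_xall, without_xall):
        print(path)
```

Running it against outputs captured with different switch combinations would give a compact list of which fields each switch contributes.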
@SuperQ @NiceGuyIT should we consider this a smartctl bug or a tradeoff we have to ask users to make?
If users want these metrics, they have to consider that the metrics might wake a drive and prevent idle.
@robbat2 I hope to do a deep dive into this and a few other issues before or around the holidays. This issue might be related to #152, which was caused by PR #131 that introduced --log=error. Since you added the Python script to save a redacted version of smartctl, I was going to modify that to compare the difference between the smartctl switches so that we can make a logical step forward. I'd rather not play whack-a-mole with the smartctl switches. If there's a tradeoff, it can be documented and left up to the user, while at the same time reported upstream to see what Smartmontools thinks.
OK, so I was directed here from #190. Seeing as this issue is 1.5 years old, what is the verdict here? As it stands, smartctl_exporter as a project is more or less useless because it fails to collect most of the actually interesting metrics.
@NiceGuyIT did you make any progress on it? On all of the drives I tried, there seems to be less data without --xall.
I think we should introduce an exporter option like --wake-drives-for-more-data that passes the --xall option to smartctl, and then the output will be fine. Just document it as potentially waking drives (most of the fleet I care about is never idle anyway).
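To make the proposal concrete, here is a rough Python sketch of how such an opt-in switch could select the smartctl arguments. This is not the exporter's actual Go code: the function name, the wake_drives_for_more_data parameter, and the choice of default flags are all assumptions for illustration, based on the command line quoted earlier in this thread.

```python
# Flags taken from the command quoted earlier in the thread, plus --json.
# Treating these as the default set is an assumption, not exporter behavior.
BASE_ARGS = [
    "--json",
    "--info",
    "--health",
    "--attributes",
    "--tolerance=verypermissive",
    "--nocheck=standby",
    "--format=brief",
]


def build_smartctl_command(device, wake_drives_for_more_data=False):
    """Build a smartctl command line for one device.

    Hypothetical helper: with the opt-in flag set, --xall is passed and the
    user accepts that drives may be woken; otherwise only --log=error is used.
    """
    args = list(BASE_ARGS)
    if wake_drives_for_more_data:
        args.append("--xall")
    else:
        args.append("--log=error")
    return ["smartctl", *args, device]


if __name__ == "__main__":
    print(" ".join(build_smartctl_command("/dev/sda")))
    print(" ".join(build_smartctl_command("/dev/sda", wake_drives_for_more_data=True)))
```

Keeping the default conservative and making --xall opt-in would leave the tradeoff with the user, as suggested earlier in the thread.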