FreeNAS-Report

Hard drive missing and unknown message

Open argawow opened this issue 4 years ago • 22 comments

Hi, first of all, I have been using this script for a long time now. Thanks for it :)

For the past two days, one of my hard drives has been missing from the summary and I get an unknown error message.

Image with the missing drive: image

Image without the missing drive. image

Error message:

awk: newline in string 37267 newer... at source line 1
awk: newline in string Extended 19... at source line 1

FreeNAS doesn't report any errors for this drive at the moment. The pool is not degraded.

Thanks for help :)

argawow avatar Feb 11 '21 05:02 argawow

Does ada1 show up when you run sysctl -n kern.disks (as root)?

edgarsuit avatar Feb 11 '21 14:02 edgarsuit

Hi, here is the output of the command:

root@freenas:~ # sysctl -n kern.disks
da0 ada7 ada6 ada5 ada4 ada3 ada2 ada1 ada0 cd0

argawow avatar Feb 11 '21 14:02 argawow

Gotcha, it shows up there, so that's good. The script then checks smartctl to see if SMART is enabled. Run smartctl -i /dev/ada1 and paste the output here.

edgarsuit avatar Feb 11 '21 14:02 edgarsuit

Sorry, is it /dev/ada1 that's missing? Or /dev/ada2?

edgarsuit avatar Feb 11 '21 14:02 edgarsuit

/dev/ada2 is the missing one :)

here is the output of the command smartctl -i /dev/ada2

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Red
Device Model:     WDC WD80EFZX-68UW8N0
Serial Number:    xxxxxx
LU WWN Device Id: 5 000cca 254f61940
Firmware Version: 83.H0A83
User Capacity:    8,001,563,222,016 bytes [8.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    5400 rpm
Form Factor:      3.5 inches
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-2, ATA8-ACS T13/1699-D revision 4
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Fri Feb 12 12:23:50 2021 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

Andreas

argawow avatar Feb 12 '21 11:02 argawow

Sorry, I missed the notification from your reply and this fell off my radar. I'm kind of at a loss for why this disk is getting excluded.

Try copying this whole block of code into your terminal and see what it spits out:

drives=$(for drive in $(sysctl -n kern.disks); do
    if [ "$(smartctl -i /dev/"${drive}" | grep "SMART support is: Enabled")" ] && ! [ "$(smartctl -i /dev/"${drive}" | grep "Solid State Device")" ]; then
        printf "%s " "${drive}"
    fi
done | awk '{for (i=NF; i!=0 ; i--) print $i }')
echo $drives

This is the code the script uses to figure out which drives should be included in the report. It looks at the smartctl output to check if SMART is enabled and to see if it's an SSD.
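To see which of the two checks rejects a given drive, the same tests can be run one drive at a time. The following is only a sketch: `classify_drive` is a hypothetical helper name, not something from the script, and it assumes the two grep patterns quoted above are the only filters involved.

```shell
#!/bin/sh
# classify_drive (hypothetical helper): read `smartctl -i` output on stdin
# and print whether the two checks quoted above would include or skip the
# drive, and why.
classify_drive() {
    info=$(cat)
    if ! printf '%s\n' "$info" | grep -q "SMART support is: Enabled"; then
        echo "skipped: SMART not enabled"
    elif printf '%s\n' "$info" | grep -q "Solid State Device"; then
        echo "skipped: reported as SSD"
    else
        echo "included"
    fi
}

# Canned example input so the function can be tried without real hardware.
printf 'Rotation Rate:    5400 rpm\nSMART support is: Enabled\n' | classify_drive
```

On the NAS itself (in sh or bash, as root) it could be fed real data with something like: for drive in $(sysctl -n kern.disks); do printf '%s: ' "$drive"; smartctl -i /dev/"$drive" | classify_drive; done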

edgarsuit avatar Mar 22 '21 18:03 edgarsuit

I've been having the same issue. It started when one of my drives started getting Current Pending Sectors errors. The error I got was:

awk: newline in string 9648 newer... at source line 1
awk: newline in string Extended 17... at source line 1

The drive is excluded from the SMART summary table that the script outputs. However, smartctl and FreeNAS can see the drive, and the pool and disks all show up fine with the above commands and in the GUI.

ekaley avatar Mar 28 '21 17:03 ekaley

I also tried the SMART report script from https://github.com/Spearfoot/FreeNAS-scripts and get the same awk error. The interesting part is that with either this script or the one I linked, the drive is excluded only from the SMART summary table, not from the details.

ekaley avatar Mar 28 '21 17:03 ekaley

image

ekaley avatar Mar 28 '21 17:03 ekaley

image

ekaley avatar Mar 28 '21 17:03 ekaley

@edgarsuit

I ran into this today as well. The issue comes from these two lines:
https://github.com/edgarsuit/FreeNAS-Report/blob/cfc6bcb0abae17e47ac5d67d25b2597f01f0b5ab/report.sh#L311
https://github.com/edgarsuit/FreeNAS-Report/blob/cfc6bcb0abae17e47ac5d67d25b2597f01f0b5ab/report.sh#L312

When a drive starts to have errors, the command smartctl -l selftest /dev/"$drive" | grep "# 1" gives something like this:

# 1  Extended offline    Completed without error       00%     32284         -
12 of 12 failed self-tests are outdated by newer successful extended offline self-test # 1

so awk '{print $9}' will print

32284
newer

and awk '{print $3}'

Extended
12

Here's a sample output of smartctl -l selftest /dev/"$drive" for your reference

smartctl 7.1 2019-12-30 r5022 [FreeBSD 12.2-RELEASE-p3 amd64] (local build)
Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%     32284         -
# 2  Short offline       Completed without error       00%     32275         -
# 3  Extended offline    Completed: read failure       10%     32125         3519067072
# 4  Conveyance offline  Completed without error       00%     32093         -
# 5  Short offline       Completed: read failure       60%     32069         3519067072
# 6  Extended offline    Completed: read failure       10%     31957         3519067072
# 7  Conveyance offline  Completed without error       00%     31925         -
# 8  Short offline       Completed: read failure       60%     31901         3519067072
# 9  Extended offline    Completed: read failure       10%     31790         3519067072
#10  Conveyance offline  Completed without error       00%     31758         -
#11  Short offline       Completed: read failure       70%     31734         3519067072
#12  Extended offline    Completed: read failure       10%     31625         3519067072
#13  Conveyance offline  Completed without error       00%     31590         -
#14  Short offline       Completed: read failure       60%     31566         3519067072
#15  Extended offline    Interrupted (host reset)      10%     31458         -
#16  Conveyance offline  Completed without error       00%     31422         -
#17  Short offline       Completed: read failure       10%     31400         3519067072
#18  Extended offline    Completed: read failure       10%     31289         3519067072
#19  Conveyance offline  Completed without error       00%     31255         -
#20  Short offline       Completed: read failure       60%     31231         3519067072
#21  Extended offline    Completed: read failure       10%     31120         3519067072
12 of 12 failed self-tests are outdated by newer successful extended offline self-test # 1
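
One possible way around this (a sketch only, not necessarily what the script or the refactor does) is to anchor the match to the start of the line, so the trailing summary line that merely ends in "# 1" is ignored. The variable names below are made up for illustration, and the sample text is copied from the log above.

```shell
#!/bin/sh
# The two lines that `grep "# 1"` matches in the log above.
sample='# 1  Extended offline    Completed without error       00%     32284         -
12 of 12 failed self-tests are outdated by newer successful extended offline self-test # 1'

# Unanchored match: both lines pass the grep, so awk prints two values
# separated by a newline.
loose=$(printf '%s\n' "$sample" | grep "# 1" | awk '{print $9}')

# Anchored match: only a line that starts with "# 1 " is used, and awk
# stops after the first hit.
last_test_hours=$(printf '%s\n' "$sample" | awk '/^# 1 /{print $9; exit}')
last_test_type=$(printf '%s\n' "$sample" | awk '/^# 1 /{print $3; exit}')

printf 'loose: %s\n' "$loose"
printf 'hours: %s\n' "$last_test_hours"
printf 'type:  %s\n' "$last_test_type"
```

With the anchored pattern each variable holds a single value, so any later awk program built from it no longer contains an embedded newline.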

Markvis avatar Apr 03 '21 03:04 Markvis

@Markvis could you let me know if my refactor fixes your issue?

dak180 avatar May 16 '21 18:05 dak180

I'm also facing the same issue, with 3 drives being skipped. If I understand correctly, I should move over to the refactor?

jamesstanw avatar Jul 26 '21 02:07 jamesstanw

@jamesstanw try it and let me know if it works.

dak180 avatar Jul 26 '21 06:07 dak180

@jamesstanw try it and let me know if it works.

I've called the script with: /bin/sh ./service.sh and it reports:

root@freenas:/mnt/NAS2/NAS2_data/james/scripts # /bin/sh ./service.sh
./service.sh: 73: Syntax error: "(" unexpected

Removing the brackets at line 73 -- sorry I'm poking in the dark here -- shows this:

./service.sh: function: not found
./service.sh: cannot create : No such file or directory
Please edit the config file for your setup

jamesstanw avatar Jul 30 '21 03:07 jamesstanw

I've called the script with: /bin/sh ./service.sh

@jamesstanw that will not work; just use ./report.sh -c /path/where/you/want/the/config/file since it requires bash (sh is not bash) and the shebang line in the script will take care of that for you.
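
As a rough illustration of the sh versus bash point (not taken from the script itself), a file can report which interpreter is actually running it. This sketch relies on `BASH_VERSION`, a variable that bash sets for itself and that is normally unset in other shells; the `interpreter` variable name is made up for the example.

```shell
#!/bin/sh
# BASH_VERSION is set by bash itself; under a plain sh it is normally empty.
if [ -n "${BASH_VERSION:-}" ]; then
    interpreter="bash"
else
    interpreter="not bash"
fi
printf 'running under: %s\n' "$interpreter"
```

Invoking a file as /bin/sh ./file.sh forces sh regardless of the shebang line, while ./file.sh lets the shebang pick the interpreter.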

dak180 avatar Jul 30 '21 03:07 dak180

I've called the script with: /bin/sh ./service.sh

@jamesstanw that will not work; just use ./report.sh -c /path/where/you/want/the/config/file since it requires bash (sh is not bash) and the shebang line in the script will take care of that for you.

Sorry a bit of naivete here . . . '-c /path/where/you/want/the/config/file' Where should I want the config to be? :)

I've copied your script, marked it executable and set it (through the GUI) to run as a cron job. I want to manually run to make sure all is well.

jamesstanw avatar Jul 30 '21 03:07 jamesstanw

Where should I want the config to be? :)

Wherever you like. ☺

I've copied your script, marked it executable and set it (through the GUI) to run as a cron job. I want to manually run to make sure all is well.

You should run it manually first; on the first run it will create the config file which you will need to edit before the script will run correctly.

dak180 avatar Jul 30 '21 04:07 dak180

Thanks! I've run the script and get a partial report (pool status but not smart testing results) - that is almost instantaneous. It is throwing the following error, though, repeated many times in a row:

parse error: Invalid numeric literal at line 1, column 9

I am running Freenas 11.2 (not truenas). Have I missed a setting somewhere? Thanks!

jamesstanw avatar Aug 07 '21 16:08 jamesstanw

I am running Freenas 11.2 (not truenas). Have I missed a setting somewhere? Thanks!

My version of the script has never been tested on 11, only on 12; since I do not have a system running 11, I do not think that I would be able to make the script work there. I would encourage you to move to 12 anyway, though.

dak180 avatar Aug 08 '21 03:08 dak180

I've just looked at updating, and the only version of 12 I've got access to is the development version for testing. I'm really tied to the FreeNAS release train. Frustrating . . . since the script was running great but started to overlook the four oldest disks. It was great to have this automated (big thanks for all the work). Is there some significant difference in the way TrueNas handles the disks over Freenas?

jamesstanw avatar Aug 09 '21 03:08 jamesstanw

Is there some significant difference in the way TrueNas handles the disks over Freenas?

No, not the disks; I would suggest reading the release notes from 12.0 through U5 before you update, though.

dak180 avatar Aug 09 '21 15:08 dak180