
DS918+ with Sabrent NT-SS5G - Random Crash, DSM unresponsive, Unsafe Shutdown

Open dedura opened this issue 3 years ago • 38 comments

Description of the problem

Hi, since mid-December I have been experiencing random crashes and disconnects on my DS918+ with my Sabrent NT-SS5G adapter, using the latest driver (v. 1.3.3.0-10). DSM itself becomes totally unresponsive and won't let me stop or restart the driver in Package Center; after 1-2 minutes the whole NAS suddenly crashes and restarts. The NAS then reports that the system was shut down unsafely and starts Data Scrubbing once booted. This happens every 3-4 days. I tried both the rear and front USB ports of the NAS, but the issue remained.

Description of your products

  • NAS: Synology DS918+
  • DSM: 7.1.1-42962 Update 3
  • Adapter: SABRENT NT-SS5G
  • Driver: 1.3.3.0-10 DSM-7.x (reuploaded)
  • RAM: 16GB
  • Other USB Port used for: (UPS) CP1500EPFCLCD - Cyber Power System, Inc.

Description of your environment

  • Connection: From "DS918+" to PC's NIC "Marvell® AQtion AQC107 10Gb Ethernet"
  • PC Motherboard: ASUS ROG MAXIMUS XII FORMULA Z490
  • PC OS: Windows 11 Pro 22H2
  • Ethernet Driver version: 3.1.7.0
  • Cable: VENTION 1m CAT 8 Ethernet Cable
  • Connection used for: SMB, WinNUT-2.0 (UPS)

The adapter was working fine before December without any issues. Could this have been caused by the latest DSM update (Update 3)? I hope you can help fix this. Thank you!

dedura avatar Jan 11 '23 09:01 dedura

Do you have any other USB devices connected, and what are the results of lsusb -a?

bb-qq avatar Jan 28 '23 06:01 bb-qq

Hi, I am now using my previous 2.5G CLUB 3D CAC-1420 Adapter with the driver "r8152, 2.16.3-3 DSM7.x (reuploaded)", which works fine without any issues.

Only the Ethernet Adapter and the UPS are connected, nothing else. Please see below the output of lsusb:

|__usb1 1d6b:0002:0404 09 2.00 480MBit/s 0mA 1IF (Linux 4.4.180+ xhci-hcd xHCI Host Controller 0000:00:15.0) hub
  |__1-3 0764:0501:0001 00 2.00 12MBit/s 2mA 1IF (CPS CP1500EPFCLCD CRXLW2000395)
  |__1-4 f400:f400:0100 00 2.00 480MBit/s 200mA 1IF (Synology DiskStation 7F008AFA20E41640)
|__usb2 1d6b:0003:0404 09 3.00 5000MBit/s 0mA 1IF (Linux 4.4.180+ xhci-hcd xHCI Host Controller 0000:00:15.0) hub
  |__2-2 0bda:8156:3000 00 3.20 5000MBit/s 512mA 1IF (Realtek USB 10/100/1G/2.5G LAN 000000001)

dedura avatar Jan 28 '23 22:01 dedura

Hmmm, from the symptoms it looks like a problem with the NT-SS5G itself. You might want to connect it to your PC to see if there are any stability issues.

Or you could try the QNA-UC5G1T if you can return the NT-SS5G. I am also using a DS918+ and this device is running stably.

bb-qq avatar Jan 29 '23 11:01 bb-qq

Thank you, I followed your advice and ordered the QNA-UC5G1T. Will provide feedback in the next couple of days after testing.

dedura avatar Jan 29 '23 15:01 dedura

So, I have returned the NT-SS5G and got the QNA-UC5G1T. It has now been running fine for 24 hours without crashing. I will monitor this for at least a week and update you again. I have noticed that my max speed is 355-360 MB/s (SMB). If you are using a Windows PC, could you share the network adapter settings of your NIC in Device Manager? I could possibly tweak mine a little to get the full speed.

dedura avatar Feb 02 '23 10:02 dedura

Providing iperf3 output: (Only getting a max of 355-360 MB/s (SMB) as mentioned above) OS: Windows 11 Pro 22H2

iperf3 -c 192.168.xx.xx -P 2
Connecting to host 192.168.xx.xx, port 5201
[ 4] local 192.168.yy.yy port 61286 connected to 192.168.xx.xx port 5201
[ 6] local 192.168.yy.yy port 61287 connected to 192.168.xx.xx port 5201
[ ID] Interval Transfer Bandwidth
[ 4] 0.00-1.00 sec 186 MBytes 1.56 Gbits/sec
[ 6] 0.00-1.00 sec 186 MBytes 1.56 Gbits/sec
[SUM] 0.00-1.00 sec 372 MBytes 3.12 Gbits/sec

[ 4] 1.00-2.00 sec 201 MBytes 1.69 Gbits/sec
[ 6] 1.00-2.00 sec 200 MBytes 1.68 Gbits/sec
[SUM] 1.00-2.00 sec 401 MBytes 3.37 Gbits/sec

[ 4] 2.00-3.00 sec 194 MBytes 1.63 Gbits/sec
[ 6] 2.00-3.00 sec 190 MBytes 1.60 Gbits/sec
[SUM] 2.00-3.00 sec 384 MBytes 3.22 Gbits/sec

[ 4] 3.00-4.00 sec 208 MBytes 1.74 Gbits/sec
[ 6] 3.00-4.00 sec 206 MBytes 1.73 Gbits/sec
[SUM] 3.00-4.00 sec 414 MBytes 3.47 Gbits/sec

[ 4] 4.00-5.00 sec 171 MBytes 1.43 Gbits/sec
[ 6] 4.00-5.00 sec 170 MBytes 1.43 Gbits/sec
[SUM] 4.00-5.00 sec 340 MBytes 2.86 Gbits/sec

[ 4] 5.00-6.00 sec 205 MBytes 1.72 Gbits/sec
[ 6] 5.00-6.00 sec 204 MBytes 1.71 Gbits/sec
[SUM] 5.00-6.00 sec 409 MBytes 3.43 Gbits/sec

[ 4] 6.00-7.00 sec 195 MBytes 1.64 Gbits/sec
[ 6] 6.00-7.00 sec 194 MBytes 1.63 Gbits/sec
[SUM] 6.00-7.00 sec 389 MBytes 3.27 Gbits/sec

[ 4] 7.00-8.00 sec 203 MBytes 1.70 Gbits/sec
[ 6] 7.00-8.00 sec 202 MBytes 1.70 Gbits/sec
[SUM] 7.00-8.00 sec 406 MBytes 3.40 Gbits/sec

[ 4] 8.00-9.00 sec 194 MBytes 1.62 Gbits/sec
[ 6] 8.00-9.00 sec 192 MBytes 1.61 Gbits/sec
[SUM] 8.00-9.00 sec 386 MBytes 3.23 Gbits/sec

[ 4] 9.00-10.00 sec 209 MBytes 1.75 Gbits/sec
[ 6] 9.00-10.00 sec 208 MBytes 1.75 Gbits/sec
[SUM] 9.00-10.00 sec 417 MBytes 3.50 Gbits/sec

[ ID] Interval Transfer Bandwidth
[ 4] 0.00-10.00 sec 1.92 GBytes 1.65 Gbits/sec sender
[ 4] 0.00-10.00 sec 1.92 GBytes 1.65 Gbits/sec receiver
[ 6] 0.00-10.00 sec 1.91 GBytes 1.64 Gbits/sec sender
[ 6] 0.00-10.00 sec 1.91 GBytes 1.64 Gbits/sec receiver
[SUM] 0.00-10.00 sec 3.83 GBytes 3.29 Gbits/sec sender
[SUM] 0.00-10.00 sec 3.83 GBytes 3.29 Gbits/sec receiver
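To compare the iperf3 figures with the SMB copy speed, it helps to put both in the same unit. A minimal sketch, assuming decimal units (1 Gbit/s = 1000 Mbit/s, 8 bits per byte) and ignoring protocol overhead; the function names are made up for illustration:

```shell
#!/bin/sh
# Hypothetical helpers for unit conversion; decimal units, no protocol overhead.

# Convert Gbit/s to MB/s (rounded to a whole number).
gbit_to_mbyte() {
    awk -v g="$1" 'BEGIN { printf "%.0f\n", g * 1000 / 8 }'
}

# Convert MB/s to Gbit/s (two decimal places).
mbyte_to_gbit() {
    awk -v m="$1" 'BEGIN { printf "%.2f\n", m * 8 / 1000 }'
}

echo "iperf3 SUM of 3.29 Gbit/s is about $(gbit_to_mbyte 3.29) MB/s"
echo "SMB at 360 MB/s is about $(mbyte_to_gbit 360) Gbit/s"
```

On these assumptions, the SMB copy speed of 355-360 MB/s sits roughly 12-14% below the raw iperf3 throughput.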

dedura avatar Feb 02 '23 11:02 dedura

Update: Since my last post, it has disconnected 4 times, and each time I had to manually stop the driver and start it again. The good news: it didn't freeze, crash or restart my NAS. Have you encountered this problem?

dedura avatar Feb 05 '23 01:02 dedura

I'm experiencing the same issue with my DS920+. I have also returned the NT-SS5G and got the QNA-UC5G1T. Then I even got the recommended SABRENT hub with power adapter, but the issue still persists. One time my NAS restarted by itself, so that was bad. But usually it just loses the connection and I need to restart the driver. Most of the time I can restart the driver, but sometimes it's just impossible to do this.

jaqb avatar Mar 09 '23 11:03 jaqb

I have installed an older driver version, "1.3.3.0-8 DSM-7.x". It has been working completely fine without a single crash or reboot since 25th February. See if that works for you.

dedura avatar Mar 09 '23 11:03 dedura

Thanks. I have downgraded to 1.3.3.0-8. I kind of know what to do to make the driver crash so I'll test it out.

jaqb avatar Mar 09 '23 12:03 jaqb

Nope, already had 2 improper shutdowns. Downgrading does not fix the issue for me.

jaqb avatar Mar 10 '23 21:03 jaqb

Same here, just crashed the whole system, rebooted and started Data Scrubbing. I went back to the 2.5G Adapter now.

dedura avatar Mar 15 '23 12:03 dedura

OK, now my 2.5G Adapter crashes too with the latest "r8152" driver. As I mentioned in my initial post, I believe something got messed up after DSM Update 3.

dedura avatar Mar 16 '23 09:03 dedura

I would love to hear from @bb-qq regarding this issue. Is there a way I can help to pinpoint the problem?

jaqb avatar Mar 16 '23 09:03 jaqb

I am wondering how much traffic is flowing through the adapter before it becomes unstable. Heat might be causing the problem.

If you plugged that adapter into a Windows PC and kept the same amount of traffic flowing through it, would it work stably for an extended period of time?
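One way to put a number on the traffic is to sample the kernel's per-interface byte counters on the NAS. This is only a sketch: the interface name `eth4` is an assumption (check `ifconfig` or `ip link` for the adapter's real name), and the function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical sketch: estimate receive throughput from kernel byte counters.
# IFACE is an assumption; replace it with the adapter's real interface name.
IFACE="${IFACE:-eth4}"
INTERVAL="${INTERVAL:-5}"

# Compute MB/s from two byte-counter readings taken a given number of seconds apart.
rate_mb_per_s() {
    awk -v a="$1" -v b="$2" -v s="$3" 'BEGIN { printf "%.1f\n", (b - a) / s / 1000000 }'
}

counter="/sys/class/net/${IFACE}/statistics/rx_bytes"
if [ -r "$counter" ]; then
    before=$(cat "$counter")
    sleep "$INTERVAL"
    after=$(cat "$counter")
    echo "${IFACE} rx: $(rate_mb_per_s "$before" "$after" "$INTERVAL") MB/s"
else
    echo "interface ${IFACE} not found; set IFACE to the adapter's name"
fi
```

Running this in a loop until the adapter drops would show whether the failures line up with sustained high throughput or happen at low load as well.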

bb-qq avatar Mar 18 '23 03:03 bb-qq

I am also curious as to how much memory you have in your NAS.

The versions of the driver discussed in this thread include changes in kernel parameters related to memory, so it is possible that those changes are causing the problem.

bb-qq avatar Mar 18 '23 03:03 bb-qq

I have 16GB of memory installed (2x 8GB) from Crucial. Traffic does not seem to be an issue for me, as the driver randomly crashes even when transferring a few photos or documents. It also crashes randomly when I open Surveillance Station on my PC or back up using Synology Drive. I have tried the adapter on Windows 10 & 11 and copied multi-GB files without any issues; it didn't crash.

The changes in Kernel Parameters could be true as the issue started with the DSM Update 3. Is there a fix for it?

dedura avatar Mar 18 '23 16:03 dedura

I've got 20GB of RAM (4+16). I also don't think it's about the amount of traffic and temperature but I can't be 100% sure. For me crashes happen when I do something with webdav and plex. Like streaming from webdav server. But sometimes also just refreshing the metadata on plex. The only thing I can say about the temperature is that one time when it crashed I have touched the casing of QNA-UC5G1T and it was just barely warm. Is there a way to check the internal temperature of QNA-UC5G1T ? I do have both "Low Power 5G" and "Thermal throttling" set to ON to make sure the temperature is in check.

jaqb avatar Mar 18 '23 21:03 jaqb

> The changes in Kernel Parameters could be true as the issue started with the DSM Update 3. Is there a fix for it?

I was mentioning the changes on the driver's side. (https://github.com/bb-qq/aqc111/issues/96#issuecomment-1461841186) I don't know the details of the changes on the DSM side.

> Is there a way to check the internal temperature of QNA-UC5G1T ?

As far as I know, there is no way to know the internal temperature. The only measure I can think of is to place it in a well-ventilated area and see the difference. (I saw a post once that said removing the case and installing a fan stabilized it, but I think it would be risky to go that far.)

bb-qq avatar Mar 21 '23 01:03 bb-qq

I don't have any ideas for investigating the cause, but since your NAS seems to have plenty of memory, could you try doubling the value of target_value in /var/packages/aqc111/scripts/apply-memory-setting? It is unlikely to improve the situation, though.

bb-qq avatar Mar 21 '23 01:03 bb-qq

Thanks for your reply @bb-qq I have now doubled the target value and restarted the NAS. Will test it out and provide feedback.

root@:/var/packages/aqc111/scripts# cat apply-memory-setting
#!/bin/sh

set -eu

target_value=524288
current_value=`sysctl -n vm.min_free_kbytes`
if [ "${current_value}" -lt "${target_value}" ]
then
    sysctl -w vm.min_free_kbytes=${target_value}
fi
root@:/var/packages/aqc111/scripts# vim apply-memory-setting
root@:/var/packages/aqc111/scripts# cat apply-memory-setting
#!/bin/sh

set -eu

target_value=1048576
current_value=`sysctl -n vm.min_free_kbytes`
if [ "${current_value}" -lt "${target_value}" ]
then
    sysctl -w vm.min_free_kbytes=${target_value}
fi
root@:/var/packages/aqc111/scripts#
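To confirm that the doubled value is actually in effect after the reboot, the live setting can be compared with the target. A minimal sketch, assuming the standard Linux path /proc/sys/vm/min_free_kbytes is readable on DSM; the function name is hypothetical and this check is not part of the driver package:

```shell
#!/bin/sh
# Hypothetical check, not part of the aqc111 package.
TARGET="${TARGET:-1048576}"   # doubled target_value from the edited script, in kB

# Succeeds (exit 0) when the current value is below the target.
below_target() {
    [ "$1" -lt "$2" ]
}

proc_file=/proc/sys/vm/min_free_kbytes
if [ -r "$proc_file" ]; then
    current=$(cat "$proc_file")
    if below_target "$current" "$TARGET"; then
        echo "vm.min_free_kbytes=${current} is below ${TARGET}; the setting was not applied"
    else
        echo "vm.min_free_kbytes=${current} meets the target of ${TARGET}"
    fi
else
    echo "cannot read ${proc_file}"
fi
```

For scale, 1048576 kB means the kernel tries to keep about 1 GiB free, which is a sizeable share of a 16GB system.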

dedura avatar Mar 21 '23 10:03 dedura

Hi @bb-qq - Whole NAS crashed in the morning. I turned the PC on and opened a file (Excel spreadsheet) via SMB, the adapter itself was cold, not even slightly warm and it crashed the whole NAS and rebooted. Upon boot, it started data scrubbing on the volume. Also want to mention, I ran the Memory Test via Synology Assistant last night and it passed without any errors. No idea what else I can do to troubleshoot.

Since you have the same Synology model, have you not encountered any of these issues yourself? Do you mind me asking what your specs are, i.e. Memory (Official/Unofficial), DSM version, NIC on the PC and the driver version of that. I am not sure whether my PC's NIC driver might be causing these crashes. I am using the latest driver from Marvell (v3.1.7.0).

dedura avatar Mar 22 '23 12:03 dedura

> Since you have the same Synology model, have you not encountered any of these issues yourself?

A few times a year, when I did not have low power mode enabled on the device, it would stop responding and I would have to reload the driver. However, I have never experienced a NAS crash.

> Do you mind me asking what your specs are, i.e. Memory (Official/Unofficial), DSM version, NIC on the PC and the driver version of that.

My environment is as follows:

  • Memory: Unofficial
$ sudo dmidecode --type memory
# dmidecode 3.2
Getting SMBIOS data from sysfs.
SMBIOS 3.0.0 present.

Handle 0x0023, DMI type 16, 23 bytes
Physical Memory Array
        Location: System Board Or Motherboard
        Use: System Memory
        Error Correction Type: None
        Maximum Capacity: 16 GB
        Error Information Handle: No Error
        Number Of Devices: 2

Handle 0x0024, DMI type 17, 40 bytes
Memory Device
        Array Handle: 0x0023
        Error Information Handle: No Error
        Total Width: 8 bits
        Data Width: 8 bits
        Size: 8192 MB
        Form Factor: SODIMM
        Set: None
        Locator: ChannelA-DIMM0
        Bank Locator: BANK 0
        Type: DDR3
        Type Detail: Synchronous
        Speed: 1600 MT/s
        Manufacturer: Samsung
        Serial Number: 35701618
        Asset Tag: 9876543210
        Part Number: M471B1G73BH0-YK0
        Rank: Unknown
        Configured Memory Speed: 1600 MT/s
        Minimum Voltage: Unknown
        Maximum Voltage: Unknown
        Configured Voltage: Unknown

Handle 0x0025, DMI type 17, 40 bytes
Memory Device
        Array Handle: 0x0023
        Error Information Handle: No Error
        Total Width: 8 bits
        Data Width: 8 bits
        Size: 8192 MB
        Form Factor: SODIMM
        Set: None
        Locator: ChannelB-DIMM0
        Bank Locator: BANK 1
        Type: DDR3
        Type Detail: Synchronous
        Speed: 1600 MT/s
        Manufacturer: Samsung
        Serial Number: 35701618
        Asset Tag: 9876543210
        Part Number: M471B1G73BH0-YK0
        Rank: Unknown
        Configured Memory Speed: 1600 MT/s
        Minimum Voltage: Unknown
        Maximum Voltage: Unknown
        Configured Voltage: Unknown
  • DSM version: 7.1.1-42962 Update 4
$ cat /etc/VERSION
majorversion="7"
minorversion="1"
major="7"
minor="1"
micro="1"
productversion="7.1.1"
buildphase="GM"
buildnumber="42962"
smallfixnumber="4"
nano="4"
base="42962"
builddate="2023/02/01"
buildtime="20:01:57"
  • QNA-UC5G1T FW version: 3.1.6 (latest FW on the QNAP website)
  • Connected USB port: front port with a stock cable
  • PC NIC: AQN-107 (direct connection)
  • PC NIC Driver: 2.2.3.0
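For anyone comparing memory configurations, the installed total can be read off the same dmidecode output. A small sketch, assuming sizes are reported in MB as in the listing above; the function name is made up:

```shell
#!/bin/sh
# Hypothetical helper: total the "Size: N MB" lines of `dmidecode --type memory`.
# Assumes sizes are reported in MB; lines such as "Size: No Module Installed"
# are skipped because their third field is not "MB".
total_memory_mb() {
    awk '$1 == "Size:" && $3 == "MB" { sum += $2 } END { print sum + 0 }'
}

# Example with the two Size lines from the listing above.
printf 'Size: 8192 MB\nSize: 8192 MB\n' | total_memory_mb
```

On the NAS itself the input would come from `sudo dmidecode --type memory | total_memory_mb`.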

bb-qq avatar Mar 22 '23 13:03 bb-qq

Thank you - the specs look nearly identical to mine. The last option I could try is to update to the DSM 7.2 BETA version and see if that makes any difference. It would be great if you can provide an updated driver that will work with the 7.2 Beta. Thanks

dedura avatar Mar 22 '23 16:03 dedura

I created drivers for the DSM 7.2 BETA, but I think it is unlikely that the DSM update will improve symptoms. https://github.com/bb-qq/aqc111/releases/tag/1.3.3.0-11

I wish I could at least find the cause of the reboot....

bb-qq avatar Mar 23 '23 09:03 bb-qq

Thank you @bb-qq , appreciated. I have also ordered 2x 4GB Memory, which is the maximum supported Memory as per Intel's website for the INTEL Celeron J3455. Some users claim it won't utilise anything above 8GB or if it tries, the system crashes, so let me find out if this makes any difference. If you require any system outputs/logs from me, please let me know.

dedura avatar Mar 23 '23 09:03 dedura

bb-qq already said he also has 2x8GB, so I don't think that's it. I'm currently testing something and it's looking good. I'm going to stay with 1.3.3.0-10 while I test my thing. By the way, how full is your system partition (/dev/md0)? You can check with df -h.

jaqb avatar Mar 23 '23 10:03 jaqb

@jaqb - Here you go. Looking forward to hearing about your test results. Does this look right?

root@:~# df -h /dev/md0
Filesystem Size Used Avail Use% Mounted on
/dev/md0 2.3G 1.9G 365M 84% /
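The same check can be scripted so a nearly full system partition gets noticed early. A sketch only: the 90% threshold is an arbitrary assumption, not a Synology recommendation, and the function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical check: warn when the root filesystem is nearly full.
# THRESHOLD is an arbitrary assumption.
THRESHOLD="${THRESHOLD:-90}"

# Read `df -P` output on stdin and print the Use% of the last data row as a bare number.
usage_percent() {
    awk 'NR > 1 { gsub("%", "", $5); pct = $5 } END { print pct }'
}

pct=$(df -P / | usage_percent)
case "$pct" in
    ''|*[!0-9]*)
        echo "could not parse df output" ;;
    *)
        if [ "$pct" -ge "$THRESHOLD" ]; then
            echo "root filesystem is ${pct}% full"
        else
            echo "root filesystem usage is ${pct}%"
        fi ;;
esac
```

`df -P` is used instead of `df -h` because its POSIX output format keeps each filesystem on one line, which makes the percentage column easier to parse.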

dedura avatar Mar 23 '23 15:03 dedura

@jaqb - Just wondering, do you use your M.2 SSD as Cache or Volume? I had mine set up as volume for over a year and the aqc111 driver was installed on that volume (volume2) - Upon checking the log files (/var/log/messages), I found quite a few error messages related to volume2.

synostgvolume[840]: fs_btrfs_metadata_usage_query.c:137 Failed to check the btrfs metadata usage of volume [/volume2].

The above message is repeated multiple times. I have now removed volume2 and am using the SSD as a normal cache. I also replaced the 16GB RAM with 2x 4GB. So far it runs stably, and even booting/restarting the NAS is much faster than before. Will test and provide feedback.
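To see how often the message occurs, the log can be counted rather than read by eye. A minimal sketch using the log path and message text quoted above; the function name is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: count how often the btrfs metadata message appears.
# The log path and message text are taken from the post above.
PATTERN='Failed to check the btrfs metadata usage'

count_matches() {
    # grep -c exits non-zero when nothing matches, so its exit status is ignored
    grep -c -F -- "$PATTERN" "$1" 2>/dev/null || true
}

LOG="${LOG:-/var/log/messages}"
if [ -r "$LOG" ]; then
    echo "$(count_matches "$LOG") matching lines in ${LOG}"
else
    echo "cannot read ${LOG} (root is usually required)"
fi
```

Comparing the count before and after removing volume2 would show whether the messages have really stopped.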

dedura avatar Mar 25 '23 22:03 dedura

84% used seems about right. I am at 82% now, but I was at 100% a couple of days ago, which caused a lot of weird issues. I had to delete a bunch of logs to get it this low.

I use 2x M.2 SSDs as a read-write cache.

jaqb avatar Mar 25 '23 23:03 jaqb