BPI-Router-Linux icon indicating copy to clipboard operation
BPI-Router-Linux copied to clipboard

Network and filesystem unstable

Open Josue-T opened this issue 5 years ago • 16 comments

Since the version ~4.14.81-82 I have some some unstability.

About the network I get at the boot some errors like that:

Apr 15 21:01:38 the_hostname kernel: [   45.246392] Hardware name: Mediatek Cortex-A7 (Device Tree)
Apr 15 21:01:38 the_hostname kernel: [   45.246436] [<c01132d0>] (unwind_backtrace) from [<c010d5e0>] (show_stack+0x20/0x24)
Apr 15 21:01:38 the_hostname kernel: [   45.246453] [<c010d5e0>] (show_stack) from [<c0aa8544>] (dump_stack+0x90/0xa4)
Apr 15 21:01:38 the_hostname kernel: [   45.246465] [<c0aa8544>] (dump_stack) from [<c023c46c>] (warn_alloc+0xc8/0x17c)
Apr 15 21:01:38 the_hostname kernel: [   45.246476] [<c023c46c>] (warn_alloc) from [<c023d7bc>] (__alloc_pages_nodemask+0x11e8/0x122c)
Apr 15 21:01:38 the_hostname kernel: [   45.246485] [<c023d7bc>] (__alloc_pages_nodemask) from [<c023d9ec>] (page_frag_alloc+0x14c/0x160)
Apr 15 21:01:38 the_hostname kernel: [   45.246497] [<c023d9ec>] (page_frag_alloc) from [<c089240c>] (netdev_alloc_frag+0x40/0x48)
Apr 15 21:01:38 the_hostname kernel: [   45.246511] [<c089240c>] (netdev_alloc_frag) from [<c0717ab0>] (mtk_rx_alloc+0xac/0x34c)
Apr 15 21:01:38 the_hostname kernel: [   45.246522] [<c0717ab0>] (mtk_rx_alloc) from [<c07180f4>] (mtk_open+0x3a4/0x5f8)
Apr 15 21:01:38 the_hostname kernel: [   45.246532] [<c07180f4>] (mtk_open) from [<c08aefe4>] (__dev_open+0xb8/0x11c)
Apr 15 21:01:38 the_hostname kernel: [   45.246543] [<c08aefe4>] (__dev_open) from [<c08af3a8>] (__dev_change_flags+0x184/0x1ac)
Apr 15 21:01:38 the_hostname kernel: [   45.246553] [<c08af3a8>] (__dev_change_flags) from [<c08af3f8>] (dev_change_flags+0x28/0x58)
Apr 15 21:01:38 the_hostname kernel: [   45.246564] [<c08af3f8>] (dev_change_flags) from [<c08c2658>] (do_setlink+0x32c/0x968)
Apr 15 21:01:38 the_hostname kernel: [   45.246575] [<c08c2658>] (do_setlink) from [<c08c4898>] (rtnl_newlink+0x4d0/0x7fc)
Apr 15 21:01:38 the_hostname kernel: [   45.246585] [<c08c4898>] (rtnl_newlink) from [<c08c1ae8>] (rtnetlink_rcv_msg+0x218/0x280)
Apr 15 21:01:38 the_hostname kernel: [   45.246595] [<c08c1ae8>] (rtnetlink_rcv_msg) from [<c08eaef8>] (netlink_rcv_skb+0xd0/0xf0)
Apr 15 21:01:38 the_hostname kernel: [   45.246606] [<c08eaef8>] (netlink_rcv_skb) from [<c08c18cc>] (rtnetlink_rcv+0x20/0x24)
Apr 15 21:01:38 the_hostname kernel: [   45.246615] [<c08c18cc>] (rtnetlink_rcv) from [<c08ea648>] (netlink_unicast+0x190/0x258)
Apr 15 21:01:38 the_hostname kernel: [   45.246624] [<c08ea648>] (netlink_unicast) from [<c08eaa68>] (netlink_sendmsg+0x280/0x350)
Apr 15 21:01:38 the_hostname kernel: [   45.246634] [<c08eaa68>] (netlink_sendmsg) from [<c088a024>] (sock_sendmsg+0x24/0x34)
Apr 15 21:01:38 the_hostname kernel: [   45.246642] [<c088a024>] (sock_sendmsg) from [<c088a7e4>] (___sys_sendmsg+0x1ec/0x214)
Apr 15 21:01:38 the_hostname kernel: [   45.246651] [<c088a7e4>] (___sys_sendmsg) from [<c088b674>] (__sys_sendmsg+0x50/0x7c)
Apr 15 21:01:38 the_hostname kernel: [   45.246660] [<c088b674>] (__sys_sendmsg) from [<c088b6b8>] (SyS_sendmsg+0x18/0x1c)
Apr 15 21:01:38 the_hostname kernel: [   45.246671] [<c088b6b8>] (SyS_sendmsg) from [<c0108cc0>] (ret_fast_syscall+0x0/0x54)

And about the (ext4) filesystem (with the sata port) for some file when I try to read it I get this:

ls: impossible d'ouvrir le répertoire '/usr/local/lib/python3.5/dist-packages/pip': Ne peut allouer de la mémoire
d????????? ? ?    ?        ?            ? pip

And in the kernel log I get this:

Apr 15 21:08:03 the_hostname kernel: [  399.051628] grow_buffers: requested out-of-range block 16879718369374422450 for device sda1

With the all new kernel version (4.14 and 4.19) I have theses error. The only version wich run cleanly is <= 4.14.67. Since the version 4.14.74 I detected some issue about the network. And between the version 4.14.81 and the version 4.14.82 I get the filesystem problem.

I think that it's due to a service which generate this unstability because with a clean rootfs I don't get theses problem.

I give you my boot sequence (of a unstable kernel version if it could help you):

[    5.181665] mtk-msdc 11230000.mmc: GPIO lookup for consumer wp
[    5.187468] mtk-msdc 11230000.mmc: using device tree for GPIO lookup
[    5.193812] of_get_named_gpiod_flags: can't parse 'wp-gpios' property of node '/mmc@11230000[0]'
[    5.202561] of_get_named_gpiod_flags: can't parse 'wp-gpio' property of node '/mmc@11230000[0]'
[    5.211217] mtk-msdc 11230000.mmc: using lookup tables for GPIO lookup
[    5.217700] mtk-msdc 11230000.mmc: lookup for GPIO wp failed
[    5.267609] mmc0: host does not support reading read-only switch, assuming write-enable
[    5.277724] mmc0: new high speed SDHC card at address 0001
[    5.284266] of_get_named_gpiod_flags: parsed 'gpios' property of node '/leds/green[0]' - status (0)
[    5.293314] no flags found for gpios
[    5.296975] of_get_named_gpiod_flags: parsed 'gpios' property of node '/leds/blue[0]' - status (0)
[    5.305909] no flags found for gpios
[    5.309554] of_get_named_gpiod_flags: parsed 'gpios' property of node '/leds/red[0]' - status (0)
[    5.318396] no flags found for gpios
[    5.322373] mmcblk0: mmc0:0001 00000 29.3 GiB 
[    5.328881]  mmcblk0: p1 p2 p3
[    5.332415] hidraw: raw HID events driver (C) Jiri Kosina
[    5.337918] usbcore: registered new interface driver usbhid
[    5.343479] usbhid: USB HID core driver
[    5.348406] mtk-iommu-v1 10205000.mmsys_iommu: bound 14010000.larb (ops mtk_smi_larb_component_ops)
[    5.357426] mtk-iommu-v1 10205000.mmsys_iommu: bound 16010000.larb (ops mtk_smi_larb_component_ops)
[    5.366431] mtk-iommu-v1 10205000.mmsys_iommu: bound 15001000.larb (ops mtk_smi_larb_component_ops)
[    5.376452] NET: Registered protocol family 17
[    5.381428] NET: Registered protocol family 41
[    5.385959] 8021q: 802.1Q VLAN Support v1.8
[    5.386088] mmc1: new high speed MMC card at address 0001
[    5.386638] mmcblk1: mmc1:0001 8WPD3R 7.28 GiB 
[    5.386987] mmcblk1boot0: mmc1:0001 8WPD3R partition 1 4.00 MiB
[    5.387309] mmcblk1boot1: mmc1:0001 8WPD3R partition 2 4.00 MiB
[    5.387631] mmcblk1rpmb: mmc1:0001 8WPD3R partition 3 512 KiB
[    5.417890] ThumbEE CPU extension supported.
[    5.422157] Registering SWP/SWPB emulation handler
[    5.541905] mtk-pcie 1a140000.pcie-controller: Port0 link down
[    5.578190] mtk-pcie 1a140000.pcie-controller: PCI host bridge to bus 0000:00
[    5.585314] pci_bus 0000:00: root bus resource [io  0x0000-0xffff] (bus address [0x1a160000-0x1a16ffff])
[    5.594747] pci_bus 0000:00: root bus resource [mem 0x60000000-0x6fffffff]
[    5.601591] pci_bus 0000:00: root bus resource [bus 00-ff]
[    5.607071] pci 0000:00:01.0: [14c3:0801] type 01 class 0x060400
[    5.613079] pci 0000:00:01.0: reg 0x14: [mem 0x00000000-0x0000ffff]
[    5.619359] pci 0000:00:01.0: supports D1
[    5.623357] pci 0000:00:01.0: PME# supported from D0 D1 D3hot
[    5.629282] PCI: bus0: Fast back to back transfers disabled
[    5.634840] pci 0000:00:01.0: bridge configuration invalid ([bus 00-00]), reconfiguring
[    5.642938] pci 0000:01:00.0: [1b21:0611] type 00 class 0x010185
[    5.648930] pci 0000:01:00.0: reg 0x10: initial BAR value 0x00000000 invalid
[    5.655950] pci 0000:01:00.0: reg 0x10: [io  size 0x0008]
[    5.661334] pci 0000:01:00.0: reg 0x14: initial BAR value 0x00000000 invalid
[    5.668335] pci 0000:01:00.0: reg 0x14: [io  size 0x0004]
[    5.673716] pci 0000:01:00.0: reg 0x18: initial BAR value 0x00000000 invalid
[    5.680728] pci 0000:01:00.0: reg 0x18: [io  size 0x0008]
[    5.686098] pci 0000:01:00.0: reg 0x1c: initial BAR value 0x00000000 invalid
[    5.693114] pci 0000:01:00.0: reg 0x1c: [io  size 0x0004]
[    5.698483] pci 0000:01:00.0: reg 0x20: initial BAR value 0x00000000 invalid
[    5.705496] pci 0000:01:00.0: reg 0x20: [io  size 0x0010]
[    5.710880] pci 0000:01:00.0: reg 0x24: [mem 0x00000000-0x000001ff]
[    5.717111] pci 0000:01:00.0: reg 0x30: [mem 0x00000000-0x0000ffff pref]
[    5.750754] PCI: bus1: Fast back to back transfers disabled
[    5.756294] pci_bus 0000:01: busn_res: [bus 01-ff] end is updated to 01
[    5.762913] pci 0000:00:01.0: BAR 8: assigned [mem 0x60000000-0x600fffff]
[    5.769658] pci 0000:00:01.0: BAR 9: assigned [mem 0x60100000-0x601fffff pref]
[    5.776851] pci 0000:00:01.0: BAR 1: assigned [mem 0x60200000-0x6020ffff]
[    5.783611] pci 0000:00:01.0: BAR 7: assigned [io  0x1000-0x1fff]
[    5.789670] pci 0000:01:00.0: BAR 6: assigned [mem 0x60100000-0x6010ffff pref]
[    5.796861] pci 0000:01:00.0: BAR 5: assigned [mem 0x60000000-0x600001ff]
[    5.803622] pci 0000:01:00.0: BAR 4: assigned [io  0x1000-0x100f]
[    5.809678] pci 0000:01:00.0: BAR 0: assigned [io  0x1010-0x1017]
[    5.815746] pci 0000:01:00.0: BAR 2: assigned [io  0x1018-0x101f]
[    5.821817] pci 0000:01:00.0: BAR 1: assigned [io  0x1020-0x1023]
[    5.827873] pci 0000:01:00.0: BAR 3: assigned [io  0x1024-0x1027]
[    5.833943] pci 0000:00:01.0: PCI bridge to [bus 01]
[    5.838875] pci 0000:00:01.0:   bridge window [io  0x1000-0x1fff]
[    5.844944] pci 0000:00:01.0:   bridge window [mem 0x60000000-0x600fffff]
[    5.851700] pci 0000:00:01.0:   bridge window [mem 0x60100000-0x601fffff pref]
[    5.859028] pcieport 0000:00:01.0: enabling device (0140 -> 0143)
[    5.865272] pcieport 0000:00:01.0: Signaling PME with IRQ 221
[    5.871345] ahci 0000:01:00.0: version 3.0
[    5.875427] ahci 0000:01:00.0: enabling device (0140 -> 0143)
[    5.881235] ahci 0000:01:00.0: SSS flag set, parallel bus scan disabled
[    5.887850] ahci 0000:01:00.0: AHCI 0001.0200 32 slots 2 ports 6 Gbps 0x3 impl IDE mode
[    5.895827] ahci 0000:01:00.0: flags: 64bit ncq sntf stag led clo pmp pio slum part ccc sxs 
[    5.905731] scsi host0: ahci
[    5.908996] scsi host1: ahci
[    5.912097] ata1: SATA max UDMA/133 abar m512@0x60000000 port 0x60000100 irq 221
[    5.919446] ata2: SATA max UDMA/133 abar m512@0x60000000 port 0x60000180 irq 221
[    5.927202] mediatek-dpi 14014000.dpi: Found bridge node: /hdmi@14015000
[    5.934133] mediatek-drm 14000000.dispsys: bound 14007000.ovl (ops mtk_disp_ovl_component_ops)
[    5.942724] mediatek-drm 14000000.dispsys: bound 14008000.rdma (ops mtk_disp_rdma_component_ops)
[    5.951473] mediatek-drm 14000000.dispsys: bound 1400b000.color (ops mtk_disp_color_component_ops)
[    5.960377] mediatek-drm 14000000.dispsys: bound 14012000.rdma (ops mtk_disp_rdma_component_ops)
[    5.969143] mediatek-drm 14000000.dispsys: bound 14014000.dpi (ops mtk_dpi_component_ops)
[    5.977363] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
[    5.983948] [drm] No driver support for vblank timestamp query.
[    5.989864] [drm] Cannot find any crtc or sizes
[    5.994849] [drm] Initialized mediatek 1.0.0 20150513 for 14000000.dispsys on minor 0
[    6.002991] mt7530 switch@0: GPIO lookup for consumer reset
[    6.008532] mt7530 switch@0: using device tree for GPIO lookup
[    6.014366] of_get_named_gpiod_flags: parsed 'reset-gpios' property of node '/switch@0[0]' - status (0)
[    6.023880] DSA: switch 0 0 parsed
[    6.027282] DSA: tree 0 parsed
[    6.037408] mdio_bus dsa-0.0: GPIO lookup for consumer reset
[    6.043053] mdio_bus dsa-0.0: using lookup tables for GPIO lookup
[    6.049105] mdio_bus dsa-0.0: lookup for GPIO reset failed
[    6.056200] libphy: dsa slave smi: probed
[    6.060408] Generic PHY dsa-0.0:00: attached PHY driver [Generic PHY] (mii_bus:phy_addr=dsa-0.0:00, irq=POLL)
[    6.070932] Generic PHY dsa-0.0:01: attached PHY driver [Generic PHY] (mii_bus:phy_addr=dsa-0.0:01, irq=POLL)
[    6.081367] Generic PHY dsa-0.0:02: attached PHY driver [Generic PHY] (mii_bus:phy_addr=dsa-0.0:02, irq=POLL)
[    6.091777] Generic PHY dsa-0.0:03: attached PHY driver [Generic PHY] (mii_bus:phy_addr=dsa-0.0:03, irq=POLL)
[    6.102176] Generic PHY dsa-0.0:04: attached PHY driver [Generic PHY] (mii_bus:phy_addr=dsa-0.0:04, irq=POLL)
[    6.112423] of_get_named_gpiod_flags: can't parse 'link-gpios' property of node '/switch@0/ports/port@5/fixed-link[0]'
[    6.125528] of_get_named_gpiod_flags: can't parse 'link-gpios' property of node '/switch@0/ports/port@6/fixed-link[0]'
[    6.138903] hctosys: unable to open rtc device (rtc0)
[    6.144159] clk: Not disabling unused clocks
[    6.148422] vdd_fixed_vgpu: disabling
[    6.152100] vusb: disabling
[    6.154883] vmc: disabling
[    6.157577] vgp1: disabling
[    6.160374] vcamaf: disabling
[    6.163351] ALSA device list:
[    6.166295]   No soundcards found.
[    6.420742] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[    6.428445] ata1.00: ATA-10: ST4000VN008-2DR166, SC60, max UDMA/133
[    6.434695] ata1.00: 7814037168 sectors, multi 0: LBA48 NCQ (depth 31/32), AA
[    6.443141] ata1.00: configured for UDMA/133
[    6.447918] scsi 0:0:0:0: Direct-Access     ATA      ST4000VN008-2DR1 SC60 PQ: 0 ANSI: 5
[    6.456867] sd 0:0:0:0: [sda] 7814037168 512-byte logical blocks: (4.00 TB/3.64 TiB)
[    6.464588] sd 0:0:0:0: [sda] 4096-byte physical blocks
[    6.469850] sd 0:0:0:0: [sda] Write Protect is off
[    6.474627] sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
[    6.479750] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[    6.542848]  sda: sda1 sda2 sda3 sda4 sda5 sda6
[    6.549238] sd 0:0:0:0: [sda] Attached SCSI disk
[    6.782204] ata2: SATA link down (SStatus 0 SControl 300)
[    6.787676] md: Waiting for all devices to be available before autodetect
[    6.794453] md: If you don't use raid, use raid=noautodetect
[    6.800677] md: Autodetecting RAID arrays.
[    6.804770] md: autorun ...
[    6.807543] md: ... autorun DONE.
[    6.915266] EXT4-fs (sda1): mounted filesystem with ordered data mode. Opts: (null)
[    6.922968] VFS: Mounted root (ext4 filesystem) readonly on device 8:1.
[    6.971125] devtmpfs: mounted
[    6.976385] Freeing unused kernel memory: 1024K
[    7.607379] random: fast init done
[    7.841106] systemd[1]: System time before build time, advancing clock.
[    8.166953] NET: Registered protocol family 10
[    8.172978] Segment Routing with IPv6
[    8.193136] ip_tables: (C) 2000-2006 Netfilter Core Team
[    8.251228] random: systemd: uninitialized urandom read (16 bytes read)
[    8.276106] random: systemd: uninitialized urandom read (16 bytes read)
[    8.287706] systemd[1]: systemd 232 running in system mode. (+PAM +AUDIT +SELINUX +IMA +APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ +LZ4 +SECCOMP +BLKID +ELFUTI)
[    8.306348] systemd[1]: Detected architecture arm.

Welcome to Debian GNU/Linux 9 (stretch)!

[    8.404847] systemd[1]: Set hostname to <the_hostname>.
[    8.561300] random: systemd-cryptse: uninitialized urandom read (16 bytes read)
[   10.721764] systemd[1]: lvm2-lvmetad.socket: Cannot add dependency job, ignoring: Unit lvm2-lvmetad.socket is masked.
[   10.732416] systemd[1]: lvm2-monitor.service: Cannot add dependency job, ignoring: Unit lvm2-monitor.service is masked.
[   10.743578] systemd[1]: lvm2-activation.service: Cannot add dependency job, ignoring: Unit lvm2-activation.service is masked.
[   10.754894] systemd[1]: lvm2-activation-early.service: Cannot add dependency job, ignoring: Unit lvm2-activation-early.service is masked.
[   10.767808] systemd[1]: dm-event.socket: Cannot add dependency job, ignoring: Unit dm-event.socket is masked.
[   10.778482] systemd[1]: nscd.service: Cannot add dependency job, ignoring: Unit nscd.service is masked.
[   10.796684] systemd[1]: Listening on udev Kernel Socket.
[  OK  ] Listening on udev Kernel Socket.
[  OK  ] Listening on /dev/initctl Compatibility Named Pipe.
[  OK  ] Listening on fsck to fsckd communication Socket.
[  OK  ] Started Dispatch Password Requests to Console Directory Watch.
[  OK  ] Listening on Journal Socket.
[  OK  ] Reached target Remote File Systems.
[  OK  ] Listening on udev Control Socket.
[  OK  ] Created slice User and Session Slice.
[  OK  ] Listening on Journal Socket (/dev/log).
[  OK  ] Listening on Syslog Socket.
[  OK  ] Created slice System Slice.
[  OK  ] Created slice system-getty.slice.
         Mounting POSIX Message Queue File System...
         Starting File System Check on Root Device...
[  OK  ] Created slice system-uwsgi\x2dapp.slice.
[  OK  ] Created slice system-postfix.slice.
[  OK  ] Created slice system-serial\x2dgetty.slice.
         Starting Load Kernel Modules...
[  OK  ] Reached target Slices.
[  OK  ] Created slice system-postgresql.slice.
[  OK  ] Created slice system-openvpn\x2dclient.slice.
         Starting Nameserver information manager...
         Mounting Debug File System...
         Starting Set the console keyboard layout...
[  OK  ] Created slice system-openvpn\x2dserver.slice.
[   11.601818] cryptodev: loading out-of-tree module taints kernel.
[   11.609328] cryptodev: driver 1.9 loaded.
         Starting Restore / save the current clock...
[   11.853487] nf_conntrack version 0.5.0 (16384 buckets, 65536 max)
         Starting Journal Service...
[  OK  ] Created slice system-systemd\x2dfsck.slice.
[  OK  ] Started Forward Password Requests to Wall Directory Watch.
[  OK  ] Reached target Encrypted Volumes.
         Starting Create list of required st�…ce nodes for the current kernel...
[   12.022288] ip6_tables: (C) 2000-2006 Netfilter Core Team
[  OK  ] Mounted POSIX Message Queue File System.
[  OK  ] Mounted Debug File System.
[  OK  ] Started Restore / save the current clock.
[  OK  ] Started Create list of required sta�…vice nodes for the current kernel.
[  OK  ] Started Set the console keyboard layout.
[  OK  ] Started Nameserver information manager.
[  OK  ] Started Load Kernel Modules.
[  OK  ] Started Journal Service.
[  OK  ] Started File System Check Daemon to report status.
         Starting Apply Kernel Variables...
         Starting Create Static Device Nodes in /dev...
[  OK  ] Started Apply Kernel Variables.
[  OK  ] Started Create Static Device Nodes in /dev.
         Starting udev Kernel Device Manager...
[  OK  ] Started udev Kernel Device Manager.e)
[   17.511642] random: crng init done complete)
[   17.515055] random: 7 urandom warning(s) missed due to ratelimiting
[  OK  ] Started File System Check on Root Device.
         Starting Remount Root and Kernel File Systems...
[   37.885033] EXT4-fs (sda1): warning: checktime reached, running e2fsck is recommended
[   37.912735] EXT4-fs (sda1): re-mounted. Opts: (null)
[  OK  ] Started Remount Root and Kernel File Systems.
         Starting udev Coldplug all Devices...
[  OK  ] Reached target Local File Systems (Pre).
[  OK  ] Started udev Coldplug all Devices.
[   38.511972] evbug: Connected device: input0 (mtk_cir at mtk_cir/input0)
[   38.513010] mt6397-rtc mt6323-rtc: registered as rtc0
[   38.523667] evbug: Connected device: input1 (MCE IR Keyboard/Mouse (mtk_cir) at /input0)
[  OK  ] Found device /dev/ttyS0.
[   38.558423] mtk-thermal 1100b000.thermal: Device not calibrated, using default calibration values
[  OK  ] Found device ST4000VN008-2DR166 swap.
[  OK  ] Found device ST4000VN008-2DR166 swap.
[  OK  ] Found device ST4000VN008-2DR166 home.
[  OK  ] Found device ST4000VN008-2DR166 tmp.
[  OK  ] Found device ST4000VN008-2DR166 var.
         Starting File System Check on /dev/�…bd4-f729-4f5d-babf-c7bedcafdda1...
         Starting File System Check on /dev/�…1dd-0591-42fc-bb1e-94f7eb7a8ec7...
         Starting File System Check on /dev/�…517-5eb7-456e-8020-797137bcdbca...
         Activating swap Swap Partition...
[  OK  ] Listening on Load/Save RF Kill Switch Status /dev/rfkill Watch.
[   40.364399] Adding 16383996k swap on /dev/sda2.  Priority:-2 extents:1 across:16383996k 
[  OK  ] Activated swap Swap Partition.
[  OK  ] Activated swap /dev/disk/by-uuid/0db86b34-3df2-448f-9ca8-b1cc7255dd65.
[  OK  ] Reached target Swap.
[  OK  ] Started File System Check on /dev/d�…4ebd4-f729-4f5d-babf-c7bedcafdda1.
         Mounting /var...
[  OK  ] Started File System Check on /dev/d�…d1517-5eb7-456e-8020-797137bcdbca.
[   40.721858] EXT4-fs (sda3): mounted filesystem with ordered data mode. Opts: (null)
[  OK  ] Mounted /var.
         Starting Flush Journal to Persistent Storage...
         Starting Load/Save Random Seed...
         Mounting /home...
[  OK  ] Started File System Check on /dev/d�…481dd-0591-42fc-bb1e-94f7eb7a8ec7.
[  OK  ] Started Load/Save Random Seed. complete)
         Mounting /tmp...
[   41.027244] systemd-journald[169]: Received request to flush runtime journal from PID 1
[   41.144274] EXT4-fs (sda4): mounted filesystem with ordered data mode. Opts: (null)
[  OK  ] Started Flush Journal to Persistent Storage.
[  OK  ] Mounted /tmp.
[   41.524180] EXT4-fs (sda5): mounted filesystem with ordered data mode. Opts: (null)
[  OK  ] Mounted /home.
[  OK  ] Reached target Local File Systems.
         Starting Create Volatile Files and Directories...
         Starting Set console font and keymap...
         Starting Raise network interfaces...
         Starting Enable support for additional executable binary formats...
[  OK  ] Started Set console font and keymap.
[  OK  ] Started Enable support for additional executable binary formats.
[  OK  ] Started Create Volatile Files and Directories.
[  OK  ] Reached target System Time Synchronized.
[  OK  ] Started Entropy daemon using the HAVEGE algorithm.
         Starting Update UTMP about System Boot/Shutdown...
[  OK  ] Started Update UTMP about System Boot/Shutdown.
[  OK  ] Reached target System Initialization.
         Starting Console System Startup Logging...
[  OK  ] Listening on PC/SC Smart Card Daemon Activation Socket.
[  OK  ] Started Daily Cleanup of Temporary Directories.
[  OK  ] Started Clean PHP session files every 30 mins.
[  OK  ] Listening on Avahi mDNS/DNS-SD Stack Activation Socket.
[  OK  ] Listening on fcgiwrap Socket.
[  OK  ] Listening on ACPID Listen Socket.
[  OK  ] Started Daily apt download activities.
[  OK  ] Listening on D-Bus System Message Bus Socket.
[  OK  ] Reached target Sockets.
[  OK  ] Started Daily apt upgrade and clean activities.
[  OK  ] Reached target Timers.
[  OK  ] Started ACPI Events Check.
[  OK  ] Reached target Paths.
[  OK  ] Reached target Basic System.
         Starting LSB: Start/stop sysstat's sadc...
         Starting Avahi mDNS/DNS-SD Stack...
         Starting Internet superserver...
[  OK  ] Started D-Bus System Message Bus.
[  OK  ] Started PostSRSd Daemon.
         Starting LSB: Load kernel modules needed to enable cpufreq scaling...
         Starting LSB: Start or stop stunnel��…TLS tunnel for network daemons)...
[  OK  ] Started YunoHost boot prompt.
[  OK  ] Started Name Service Cache Daemon.
         Starting LSB: service and resource monitoring daemon...
         Starting LSB: Provide limited super�…er privileges to specific users...
         Starting LSB: Start the GNUstep distributed object mapper...
         Starting PostgreSQL Cluster 9.6-main...
         Starting Initialize hardware monitoring sensors...
         Starting Login Service...
         Starting LSB: Start Monitorix daemon...
[  OK  ] Started Self Monitoring and Reporting Technology (SMART) Daemon.
         Starting System Logging Service...
         Starting LSB: Start/stop the postgrey daemon...
[  OK  ] Started ACPI event daemon.
         Starting LSB: Demon mouse...
         Starting LSB: Add IP and MAC in /etc/issue...
[  OK  ] Started Internet superserver.
[  OK  ] Started Console System Startup Logging.
[  OK  ] Started Initialize hardware monitoring sensors.
[   44.478214] ip: page allocation failure: order:0, mode:0x1080120(GFP_ATOMIC|__GFP_COLD), nodemask=(null)
[   44.487689] ip cpuset=/ mems_allowed=0
[   44.491438] CPU: 3 PID: 668 Comm: ip Tainted: G           O    4.14.78-bpi-r2-main #1
[   44.491441] Hardware name: Mediatek Cortex-A7 (Device Tree)
[   44.491474] [<c01132d0>] (unwind_backtrace) from [<c010d5e0>] (show_stack+0x20/0x24)
[   44.491486] [<c010d5e0>] (show_stack) from [<c0aa7f44>] (dump_stack+0x90/0xa4)
[   44.491499] [<c0aa7f44>] (dump_stack) from [<c023c37c>] (warn_alloc+0xc8/0x17c)
[   44.491509] [<c023c37c>] (warn_alloc) from [<c023d6cc>] (__alloc_pages_nodemask+0x11e8/0x122c)
[   44.491518] [<c023d6cc>] (__alloc_pages_nodemask) from [<c023d8fc>] (page_frag_alloc+0x14c/0x160)
[   44.491529] [<c023d8fc>] (page_frag_alloc) from [<c089204c>] (netdev_alloc_frag+0x40/0x48)
[   44.491542] [<c089204c>] (netdev_alloc_frag) from [<c0717748>] (mtk_rx_alloc+0xac/0x34c)
[   44.491552] [<c0717748>] (mtk_rx_alloc) from [<c0717d8c>] (mtk_open+0x3a4/0x5f8)
[   44.491562] [<c0717d8c>] (mtk_open) from [<c08aec14>] (__dev_open+0xb8/0x11c)
[   44.491573] [<c08aec14>] (__dev_open) from [<c08aefd8>] (__dev_change_flags+0x184/0x1ac)
[   44.491582] [<c08aefd8>] (__dev_change_flags) from [<c08af028>] (dev_change_flags+0x28/0x58)
[   44.491592] [<c08af028>] (dev_change_flags) from [<c08c2270>] (do_setlink+0x32c/0x968)
[   44.491602] [<c08c2270>] (do_setlink) from [<c08c446c>] (rtnl_newlink+0x4d0/0x7fc)
[   44.491612] [<c08c446c>] (rtnl_newlink) from [<c08c1700>] (rtnetlink_rcv_msg+0x218/0x280)
[   44.491622] [<c08c1700>] (rtnetlink_rcv_msg) from [<c08eaab4>] (netlink_rcv_skb+0xd0/0xf0)
[   44.491632] [<c08eaab4>] (netlink_rcv_skb) from [<c08c14e4>] (rtnetlink_rcv+0x20/0x24)
[   44.491640] [<c08c14e4>] (rtnetlink_rcv) from [<c08ea204>] (netlink_unicast+0x190/0x258)
[   44.491649] [<c08ea204>] (netlink_unicast) from [<c08ea624>] (netlink_sendmsg+0x280/0x350)
[   44.491661] [<c08ea624>] (netlink_sendmsg) from [<c0889c64>] (sock_sendmsg+0x24/0x34)
[   44.491671] [<c0889c64>] (sock_sendmsg) from [<c088a424>] (___sys_sendmsg+0x1ec/0x214)
[   44.491680] [<c088a424>] (___sys_sendmsg) from [<c088b2b4>] (__sys_sendmsg+0x50/0x7c)
[   44.491688] [<c088b2b4>] (__sys_sendmsg) from [<c088b2f8>] (SyS_sendmsg+0x18/0x1c)
[   44.491698] [<c088b2f8>] (SyS_sendmsg) from [<c0108cc0>] (ret_fast_syscall+0x0/0x54)
[   44.491702] Mem-Info:
[   44.491718] active_anon:3793 inactive_anon:2378 isolated_anon:0
[   44.491718]  active_file:20127 inactive_file:101813 isolated_file:0
[   44.491718]  unevictable:0 dirty:8 writeback:0 unstable:0
[   44.491718]  slab_reclaimable:4101 slab_unreclaimable:2982
[   44.491718]  mapped:3207 shmem:2429 pagetables:224 bounce:0
[   44.491718]  free:370395 free_pcp:595 free_cma:15230
[   44.491729] Node 0 active_anon:15172kB inactive_anon:9512kB active_file:80508kB inactive_file:407252kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:12828kB dirty:32kB writeo
[   44.491745] Normal free:4680kB min:2812kB low:3512kB high:4212kB active_anon:0kB inactive_anon:0kB active_file:65556kB inactive_file:371832kB unevictable:0kB writepending:20kB present:5324B
[   44.491746] lowmem_reserve[]: 0 11703 11703
[   44.491766] HighMem free:1476900kB min:512kB low:2636kB high:4760kB active_anon:15172kB inactive_anon:9512kB active_file:14952kB inactive_file:35420kB unevictable:0kB writepending:12kB preB
[   44.491767] lowmem_reserve[]: 0 0 0
[   44.491774] Normal: 146*4kB (UME) 31*8kB (UME) 27*16kB (UE) 21*32kB (UM) 11*64kB (UE) 4*128kB (U) 2*256kB (U) 2*512kB (U) 0*1024kB 0*2048kB 0*4096kB 0*8192kB = 4688kB
[   44.491810] HighMem: 10*4kB (UMC) 25*8kB (UM) 13*16kB (UMC) 13*32kB (UMC) 3*64kB (UM) 4*128kB (UMC) 3*256kB (UMC) 0*512kB 2*1024kB (UC) 1*2048kB (C) 3*4096kB (UM) 178*8192kB (MC) = 1476896B
[   44.491850] 124367 total pagecache pages
[   44.491867] 0 pages in swap cache
[   44.491870] Swap cache stats: add 0, delete 0, find 0/0
[   44.491872] Free swap  = 16383996kB
[   44.491875] Total swap = 16383996kB
[   44.491877] 524031 pages RAM
[   44.491879] 390911 pages HighMem/MovableOnly
[   44.491881] 9059 pages reserved
[   44.491883] 16384 pages cma reserved
[   44.532616] ip: page allocation failure: order:10, mode:0x140c0c0(GFP_KERNEL|__GFP_COMP|__GFP_ZERO), nodemask=(null)
[   44.532642] ip cpuset=/ mems_allowed=0
[   44.532659] CPU: 2 PID: 681 Comm: ip Tainted: G           O    4.14.78-bpi-r2-main #1
[   44.532662] Hardware name: Mediatek Cortex-A7 (Device Tree)
[   44.532692] [<c01132d0>] (unwind_backtrace) from [<c010d5e0>] (show_stack+0x20/0x24)
[   44.532704] [<c010d5e0>] (show_stack) from [<c0aa7f44>] (dump_stack+0x90/0xa4)
[   44.532717] [<c0aa7f44>] (dump_stack) from [<c023c37c>] (warn_alloc+0xc8/0x17c)
[   44.532727] [<c023c37c>] (warn_alloc) from [<c023d6cc>] (__alloc_pages_nodemask+0x11e8/0x122c)
[   44.532737] [<c023d6cc>] (__alloc_pages_nodemask) from [<c025f4d0>] (kmalloc_order+0x28/0x40)
[   44.532746] [<c025f4d0>] (kmalloc_order) from [<c025f514>] (kmalloc_order_trace+0x2c/0xd8)
[   44.532757] [<c025f514>] (kmalloc_order_trace) from [<c0717ad8>] (mtk_open+0xf0/0x5f8)
[   44.532770] [<c0717ad8>] (mtk_open) from [<c08aec14>] (__dev_open+0xb8/0x11c)
[   44.532781] [<c08aec14>] (__dev_open) from [<c08aefd8>] (__dev_change_flags+0x184/0x1ac)
[   44.532790] [<c08aefd8>] (__dev_change_flags) from [<c08af028>] (dev_change_flags+0x28/0x58)
[   44.532800] [<c08af028>] (dev_change_flags) from [<c08c2270>] (do_setlink+0x32c/0x968)
[   44.532811] [<c08c2270>] (do_setlink) from [<c08c446c>] (rtnl_newlink+0x4d0/0x7fc)
[   44.532820] [<c08c446c>] (rtnl_newlink) from [<c08c1700>] (rtnetlink_rcv_msg+0x218/0x280)
[   44.532830] [<c08c1700>] (rtnetlink_rcv_msg) from [<c08eaab4>] (netlink_rcv_skb+0xd0/0xf0)
[   44.532840] [<c08eaab4>] (netlink_rcv_skb) from [<c08c14e4>] (rtnetlink_rcv+0x20/0x24)
[   44.532849] [<c08c14e4>] (rtnetlink_rcv) from [<c08ea204>] (netlink_unicast+0x190/0x258)
[   44.532857] [<c08ea204>] (netlink_unicast) from [<c08ea624>] (netlink_sendmsg+0x280/0x350)
[   44.532870] [<c08ea624>] (netlink_sendmsg) from [<c0889c64>] (sock_sendmsg+0x24/0x34)
[   44.532880] [<c0889c64>] (sock_sendmsg) from [<c088a424>] (___sys_sendmsg+0x1ec/0x214)
[   44.532889] [<c088a424>] (___sys_sendmsg) from [<c088b2b4>] (__sys_sendmsg+0x50/0x7c)
[   44.532898] [<c088b2b4>] (__sys_sendmsg) from [<c088b2f8>] (SyS_sendmsg+0x18/0x1c)
[   44.532908] [<c088b2f8>] (SyS_sendmsg) from [<c0108cc0>] (ret_fast_syscall+0x0/0x54)
[   44.963310] bridge: filtering via arp/ip/ip6tables is no longer available by default. Update your scripts to load br_netfilter if you need this.
[   44.982170] br0: port 1(lan0) entered blocking state
[   44.982183] br0: port 1(lan0) entered disabled state
[   44.982515] device lan0 entered promiscuous mode
[   44.982522] device eth0 entered promiscuous mode
[   44.998851] br0: port 2(lan1) entered blocking state
[   44.998861] br0: port 2(lan1) entered disabled state
[   44.999168] device lan1 entered promiscuous mode
[   45.012090] br0: port 3(lan2) entered blocking state
[   45.012100] br0: port 3(lan2) entered disabled state
[   45.012401] device lan2 entered promiscuous mode
[   45.026435] br0: port 4(lan3) entered blocking state
[   45.026444] br0: port 4(lan3) entered disabled state
[   45.026743] device lan3 entered promiscuous mode
[   45.054073] IPv6: ADDRCONF(NETDEV_UP): br0: link is not ready
[  OK  ] Started System Logging Service.
[  OK  ] Started LSB: Start/stop sysstat's sadc.
[  OK  ] Started LSB: Add IP and MAC in /etc/issue.
[  OK  ] Started LSB: Load kernel modules needed to enable cpufreq scaling.
[  OK  ] Started LSB: Start the GNUstep distributed object mapper.
[  OK  ] Started LSB: Provide limited super user privileges to specific users.
[  OK  ] Started LSB: Demon mouse.
[  OK  ] Started Avahi mDNS/DNS-SD Stack.
         Starting LSB: set CPUFreq kernel parameters...
[  OK  ] Started Login Service.
[  OK  ] Started LSB: service and resource monitoring daemon.
[  OK  ] Started LSB: set CPUFreq kernel parameters.
[FAILED] Failed to start Raise network interfaces.
See 'systemctl status networking.service' for details.
[  OK  ] Reached target Network.

Josue-T avatar Apr 15 '19 20:04 Josue-T

i did not see this network-instability before (have also 4.14 running and look daily in my dmesg)... crash seem to happen in mtk_open

strange is this callstack (seems there is a bit of compiler optimization):

Apr 15 21:01:38 the_hostname kernel: [   45.246522] [<c0717ab0>] (mtk_rx_alloc) from [<c07180f4>] (mtk_open+0x3a4/0x5f8)
Apr 15 21:01:38 the_hostname kernel: [   45.246532] [<c07180f4>] (mtk_open) from [<c08aefe4>] (__dev_open+0xb8/0x11c)

because in mtk_open there is no call to mtk_rx_alloc

https://github.com/frank-w/BPI-R2-4.14/blob/4.14-main/drivers/net/ethernet/mediatek/mtk_eth_soc.c#L1865

the only call to mtk_rx_alloc is in mtk_dma_init which is called by mtk_start_dma, so it seems that is an initialisation issue and should happen only once at bootup.

https://github.com/frank-w/BPI-R2-4.14/blob/4.14-main/drivers/net/ethernet/mediatek/mtk_eth_soc.c#L1717

there was a network-problem imho between 4.14.92-98 but this is fixed. which is the last kernel-version you've tried?

have you any special network-setting? bridge (i see it...have you tried without it?), vlan, ... any way to reproduce the problem?

how about 4.19, here i have not added QDMA/BQL-Patches...only the second gmac alone (dsa-driver). you can revert it easier to check if it is the problem.

do you use hwnat/hwqos?

have you tried 5.0/5.1? you could try mainline 4.14 and swap mmc + put uart2 as first uart in mt7623.dtsi and use my config to check if problem comes up by any of my patches.

for filesystem i guess you have a hardware-error or your filesystem is broken...

frank-w avatar Apr 15 '19 21:04 frank-w

Hello,

Thank you for your answer,

After some test I detected that the network issue happen when avahi-daemon start. So when I disable it the network issue is solved. But I still think that it's not really normal that avahi-daemon break the kernel stability... Maybe can you try to reproduce this bug if you install avahi-daemon ?

there was a network-problem imho between 4.14.92-98 but this is fixed. which is the last kernel-version you've tried?

4.14.107 and 4.19.31.

have you any special network-setting? bridge (i see it...have you tried without it?), vlan, ... any way to reproduce the problem?

Yes use a bridge between all lan interfaces. I don't use a vlan.

how about 4.19, here i have not added QDMA/BQL-Patches...only the second gmac alone (dsa-driver). you can revert it easier to check if it is the problem.

Well about the 4.19 I only tested the version 4.19.29 and 4.19.31

do you use hwnat/hwqos?

I use the nat with iptables. As I understand with the 4.14 It use the hwnat.

have you tried 5.0/5.1?

No

you could try mainline 4.14 and swap mmc + put uart2 as first uart in mt7623.dtsi and use my config to check if problem comes up by any of my patches.

for filesystem i guess you have a hardware-error or your filesystem is broken...

Not really sure. Since my last test the issue look like independent. But it's also strange because the issue happen only when I boot on my main rootfs with a kernel >= 4.14.82. Note If I boot from on an other rootfs and after I mount the same partition with a kernel >= 4.14.82 I don't have this issue with the same files. For this I will continue to investigate on my side.

Josue-T avatar Apr 17 '19 07:04 Josue-T

please try 5.x and 4.19 before 2nd gmac is merged

try to figure out when the network-problem comes up (between which kernel-dot-releases) so we can look in git-commits which one may break it

currently i have much work and less time :(

frank-w avatar Apr 17 '19 17:04 frank-w

Do you have any build of the 5.x ?

Josue-T avatar Apr 17 '19 18:04 Josue-T

So, I tried this version here https://github.com/frank-w/BPI-R2-4.14/releases/download/CI-BUILD-20190412_101804-f78e2765e/bananapi-r2-image-4.19-main_4.19.33-1_armhf.deb and with this version it's ok.

try to figure out when the network-problem comes up (between which kernel-dot-releases) so we can look in git-commits which one may break it

I tried, but it's really difficult because at some boot I don't get the bug. So since my last test I would say that the version 4.14.67 (build : 20180829062337) is OK. The build 4.14.70 (build 20180917113512) is broken.

Josue-T avatar Apr 17 '19 20:04 Josue-T

if 4.19 have shown the problem and now it's gone with 4.19.33 it should also be fixed in 4.14.110 (normally same patches are applied per dot-release across main versions).

i have autobuild only activated for lts-kernels (4.14,4.19), if your issue is now fixed we don't need to test 5.0 now...if not i can upload one on gdrive

frank-w avatar Apr 17 '19 20:04 frank-w

Well,

As I understand the issue happen when some service try to do a syscall about network and the network is not already initialized.

By example while avahi-daemon try to start before the network "service" we have this bug. I also discovered this issue with monitorix which also start before the network. I we start theses services (manually) when the network is initialized it's ok.

So about the 4.19, I don't know because with my last test a saw that some part of the network initialization (mt7530 and bridge initialisation) is before all other (critical) service. In this case the bug don't happen.

Josue-T avatar Apr 18 '19 08:04 Josue-T

Hello,

I did some new tests and I saw that the network issue happens with all kernel version >=4.14.63. About the 4.19 I don't get this error at boot, but after the boot after a while I got also this error in the kernel log (on the version 4.19.41).

The only version which I never got this bug is the version 4.14.55 (that I got on your google drive).

Josue-T avatar May 16 '19 07:05 Josue-T

can you try to figure out which is the first kernel-version in 4.14 which have this error? it should be between 4.14.55 and 4.14.63 if i know the first we can look over the changes between this and the prior version to get the cause

have you tried with 4.19 without my second gmac-patches (and maybe without wifi because it modifies watchdog)?

frank-w avatar May 16 '19 07:05 frank-w

have you tried with 4.19 without my second gmac-patches (and maybe without wifi because it modifies watchdog)?

So which version is it ? And where is it available ?

Josue-T avatar May 16 '19 08:05 Josue-T

you have to revert changes and compile yourself

second gmac:

52365639697e (tag: CI-BUILD-20190109171035-52365639697e) net: dsa: changes to dts 0f5ff06adcb2 net: dsa: mt7530 add linking to mdio 943ed2bae585 net: dsa: tell GDMA when we are turning on the special tag 954b359077f7 net: dsa: dsa multi cpu (mt7530.c) a80c992c9372 net: dsa: add support for GMAC2 wired to ext 8325a7cbf964 net: dsa: adding handling of second CPU-Port 90adf3828368 net: dsa: add helper functions 329ff45aafea net: dsa: adding fields for holding information about upstream-port

wifi:

b3bf5911a5d9 [wifi] activated wifi-options 203a5a7727a8 [gcc] gcc8-fixes by Dominik Koch + nic_rx-patch from https://bugs.linaro.org/show_bug.cgi?id=3823 83ffbaceffed [wifi] adding wifi-related changes outside driver-directory ce8c582c6121 [wifi] adding driver-folder

have you found a way to reproduce it?

frank-w avatar May 16 '19 08:05 frank-w

have you found a way to reproduce it?

No, I didn't found a way. The problem is that it come randomly. Until yesterday I never got this issue with the 4.14.63 and yesterday I got the issue with all version >=4.14.63...

Probably it liked to more than one element. But don't know which element need to match to reproduce for now. It's a really old install which has been customized for a long time... If you have some idea about what could influence the stability I could check it.

With the 4.19 it's complicated to test because the issue happens not at the boot and come only a long time after the boot.

Maybe I might try to find which commit break the 4.14

Josue-T avatar May 16 '19 08:05 Josue-T

on 4.14 i have included 2nd gmac in 4.14.52, imho you need to revert hnat and hwqos too

b3ba09aea285 (origin/4.14-hnat, 4.14-hnat) [HNAT] applied full Patch from hnat-branch 966f5efa66c9 (origin/4.14-hwqos, 4.14-hwqos) added hw-qos a293bdb3d53a [gmac] added second gmac

i hope this are all wifi-commits in 4.14:

724c2e570f48 [BT] load bluetooth-driver in wifi.sh ee7516478daf [wifi] disabled some debug-messages c691e43c326c [dts] re-added wifi-dts-nodes 5450e748b60e [defconfig] re-added wifi-driver 7a28948f2bde adding wifi-start-script e46d5d07f474 updated Readme for internal wifi usage 03f472d620d0 merged wmt-tools from wifi-branch e6b369a5ade1 merge mt6625l wifi driver

frank-w avatar May 16 '19 08:05 frank-w

For 4.14 could you try to revert this patch:

https://github.com/openwrt/openwrt/blob/master/target/linux/mediatek/patches-4.14/0027-net-next-mediatek-fix-DQL-support.patch

frank-w avatar May 17 '19 13:05 frank-w

Any new information here? For network i reverted a patch which breaks network sometimes (dql)

frank-w avatar Oct 20 '19 07:10 frank-w

Any new information here? For network i reverted a patch which breaks network sometimes (dql)

Well sorry. Actually I'm running with the 4.19 version (precisely 4.19.72-bpi-r2-main). With this version it's just like a workaround the the bug...

I haven't invested more time to debug this because I've just too many things to do...

Josue-T avatar Oct 21 '19 18:10 Josue-T