NCSI warning on 4.9-rc3-ish kernel
Seen on palm5 in the ADL. Running a hacked up 4.9-ish kernel for pinmux testing. Recording for posterity.
[ 171.760000] ------------[ cut here ]------------
[ 171.760000] WARNING: CPU: 0 PID: 124 at net/ncsi/ncsi-manage.c:240 ncsi_start_channel_monitor+0x48/0x88
[ 171.760000] Modules linked in:
[ 171.760000] CPU: 0 PID: 124 Comm: kworker/0:1 Not tainted 4.9.0-rc3-00265-g643e501af1d1-dirty #61
[ 171.760000] Hardware name: ASpeed SoC
[ 171.760000] Workqueue: events ncsi_dev_work
[ 171.760000] [<c01078a0>] (unwind_backtrace) from [<c01053ac>] (show_stack+0x10/0x14)
[ 171.760000] [<c01053ac>] (show_stack) from [<c010fdb0>] (__warn+0xc8/0xf8)
[ 171.760000] [<c010fdb0>] (__warn) from [<c010fed4>] (warn_slowpath_null+0x1c/0x24)
[ 171.760000] [<c010fed4>] (warn_slowpath_null) from [<c03cb03c>] (ncsi_start_channel_monitor+0x48/0x88)
[ 171.760000] [<c03cb03c>] (ncsi_start_channel_monitor) from [<c03cc15c>] (ncsi_configure_channel+0x280/0x2e0)
[ 171.760000] [<c03cc15c>] (ncsi_configure_channel) from [<c03cc9b0>] (ncsi_dev_work+0x39c/0x3e8)
[ 171.760000] [<c03cc9b0>] (ncsi_dev_work) from [<c0123ccc>] (process_one_work+0x238/0x400)
[ 171.760000] [<c0123ccc>] (process_one_work) from [<c0124b14>] (worker_thread+0x2b0/0x3e4)
[ 171.760000] [<c0124b14>] (worker_thread) from [<c0129640>] (kthread+0xcc/0xe8)
[ 171.760000] [<c0129640>] (kthread) from [<c01024b0>] (ret_from_fork+0x14/0x24)
[ 171.760000] ---[ end trace eee460182ba249d3 ]---
Note this was correlated with running a script to poke at all the gpios I could export on the system. No doubt I've upset some hardware that pinmux isn't preventing me from accessing due to earlier function/group selection.
Was there anything else in dmesg? "Wrong NCSI state" would have been interesting.
Can you repro by poking at GPIOA7, GPIOA6, GPIOR7 or GPIOR6?
Those are the MDIO pins for mac2 and mac1. If you can reproduce, does this fix the issue?
--- a/arch/arm/boot/dts/aspeed-ast2500-evb.dts
+++ b/arch/arm/boot/dts/aspeed-ast2500-evb.dts
@@ -47,14 +47,14 @@
status = "okay";
pinctrl-names = "default";
- pinctrl-0 = <&pinctrl_rgmii1_default>;
+ pinctrl-0 = <&pinctrl_rgmii1_default &pinctrl_mdio1_default>;
};
&mac1 {
status = "okay";
pinctrl-names = "default";
- pinctrl-0 = <&pinctrl_rgmii2_default>;
+ pinctrl-0 = <&pinctrl_rgmii2_default &pinctrl_mdio2_default>;
};
We should probably submit the patch regardless :)
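For anyone trying the repro suggested above, here is a minimal sketch for turning the pin names into controller-relative GPIO offsets. It assumes the usual Aspeed layout of eight pins per bank, with banks ordered A..Z and then AA, AB, and so on. The helper name `aspeed_gpio_offset` is made up for illustration. The number to write to sysfs `export` would be this offset plus the gpiochip base on the running system, which is not computed here.

```python
import string

PINS_PER_BANK = 8  # assumption: Aspeed banks hold 8 GPIOs each


def aspeed_gpio_offset(name: str) -> int:
    """Convert a name such as 'GPIOA7' or 'GPIOAA3' to a controller-relative offset.

    Hypothetical helper: banks are assumed to be ordered A..Z, then AA, AB, ...
    """
    if not name.startswith("GPIO") or len(name) < 6:
        raise ValueError(f"not a GPIO name: {name!r}")
    bank, pin = name[4:-1], name[-1]
    if not pin.isdigit() or int(pin) >= PINS_PER_BANK:
        raise ValueError(f"bad pin index in {name!r}")
    letters = string.ascii_uppercase
    if len(bank) == 1 and bank in letters:
        bank_index = letters.index(bank)
    elif len(bank) == 2 and bank[0] == "A" and bank[1] in letters:
        bank_index = len(letters) + letters.index(bank[1])
    else:
        raise ValueError(f"bad bank in {name!r}")
    return bank_index * PINS_PER_BANK + int(pin)


if __name__ == "__main__":
    # The four MDIO pins named in this thread.
    for pin_name in ("GPIOA6", "GPIOA7", "GPIOR6", "GPIOR7"):
        print(pin_name, aspeed_gpio_offset(pin_name))
```

Toggling only these four pins, rather than every exportable GPIO, should narrow down whether the MDIO lines are what upsets NCSI.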
Andrew, I'm not sure what the script was doing. It seems to have affected the NCSI interfaces, either through GPIO pins used by the NCSI interface or through host power control. According to the backtrace, the warning fired when the channel monitor was being enabled again. I need more info:
- kernel log.
- the script operating the GPIO PINs.
- the last NCSI patch (commit) included in your kernel (some patches have been merged to linux-next but have not arrived in 4.9-rc3 yet)
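As a rough illustration of the kind of state check being discussed, here is a toy model, not the kernel's code. It assumes the warning at net/ncsi/ncsi-manage.c:240 guards against starting a channel monitor that is still marked enabled; the thread does not quote that line, so treat this as a guess. The `ChannelMonitor` class and its methods are hypothetical.

```python
import warnings


class ChannelMonitor:
    """Toy stand-in for a per-channel monitor with an 'enabled' flag (hypothetical)."""

    def __init__(self) -> None:
        self.enabled = False

    def start(self) -> None:
        # Assumed guard: starting a monitor that was never stopped is
        # unexpected, so complain but carry on, much like a WARN_ON would.
        if self.enabled:
            warnings.warn("monitor started while already enabled", RuntimeWarning)
        self.enabled = True

    def stop(self) -> None:
        self.enabled = False


if __name__ == "__main__":
    monitor = ChannelMonitor()
    monitor.start()
    monitor.stop()
    monitor.start()  # fine: the monitor was stopped in between
    monitor.start()  # emits a RuntimeWarning: no stop() since the last start()
```

If that reading is right, anything that makes the channel get reconfigured without the monitor being stopped first (for example a link or hardware glitch caused by the GPIO script) would be enough to trigger the warning.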
On 2016-11-02 23:05 GMT+11:00, Andrew Jeffery wrote:
> Note this was correlated with running a script to poke at all the gpios I could export on the system. No doubt I've upset some hardware that pinmux isn't preventing me from accessing due to earlier function/group selection.
@gwshan I wouldn't get too concerned (yet). For the record, the script flipping the gpios is here: https://github.com/shenki/aspeed-kernel-tests/blob/master/set-gpios. It's clearly going to cause a bit of hardware state carnage; it's mainly written for testing on the AST2500 EVB but we don't have an equivalent for the AST2400.
With respect to NCSI, the most recent non-merge commit according to git log --grep ncsi is c0cd1ba4f8bd8b5fef43bc51a2983673b8f086ff: net/ncsi: Introduce ncsi_stop_dev()