libmctp icon indicating copy to clipboard operation
libmctp copied to clipboard

mctp-demux-daemon: No "rx complete" message received

Open eddiejames opened this issue 3 months ago • 5 comments

We're seeing an issue with latest libmctp where, when booting the BMC back up with the host running, we try a tx (write 1 to kcs reg 0) but we only get a corresponding "tx start" (read 1 from kcs reg 0), and after we reply with rx (write 2 to kcs reg 0) as usual, we never get any further wake ups for kcs data, so the transfer is stuck forever. If I revert to 2f1a1f66dae8752e386d55ab25f244d88d626e8c I no longer see the issue. Here's some tracing demonstrating the issue, which I annotated for ease of reviewing.

Jul 29 15:39:40 p11bmc mctp-demux[928]: core: tx complete msgpos:17 msglen:17
Jul 29 15:39:40 p11bmc mctp-demux[928]: astlpc: bmc: mctp_astlpc_kcs_send cmd:1
Jul 29 15:39:40 p11bmc mctp-demux[928]: astlpc: bmc: mctp_astlpc_poll: status: 0xc1
Jul 29 15:39:40 p11bmc mctp-demux[928]: astlpc: bmc: mctp_astlpc_poll: status: 0xc2
Jul 29 15:39:40 p11bmc mctp-demux[928]: astlpc: bmc: mctp_astlpc_poll: data: 0x2
Jul 29 15:39:40 p11bmc mctp-demux[928]: core: astlpc binding Tx enabled
Jul 29 15:39:40 p11bmc mctp-demux[928]: core: tx skip msg:(nil) state:1

rx0 Jul 29 15:39:40 p11bmc mctp-demux[928]: astlpc: bmc: mctp_astlpc_poll: status: 0xc2
rx0 Jul 29 15:39:40 p11bmc mctp-demux[928]: astlpc: bmc: mctp_astlpc_poll: data: 0x1
rx0 Jul 29 15:39:40 p11bmc mctp-demux[928]: astlpc: bmc: mctp_astlpc_lpc_read: 4 bytes from 0x8007c
rx0 Jul 29 15:39:40 p11bmc mctp-demux[928]: astlpc: bmc: mctp_astlpc_lpc_read: 186 bytes from 0x80080
rx0 Jul 29 15:39:40 p11bmc mctp-demux[928]: astlpc: bmc: mctp_astlpc_kcs_send cmd:2
rx0 Jul 29 15:39:40 p11bmc mctp-demux[928]: astlpc: astlpc_pktbuf_validate_v3: 0xcc1a9b5
rx0 Jul 29 15:39:40 p11bmc mctp-demux[928]: MCTP message received: len 178, type 1

tx0 Jul 29 15:39:40 p11bmc mctp-demux[928]: client[0] sent message: dest 0x09 len 9
tx0 Jul 29 15:39:40 p11bmc mctp-demux[928]: core: mctp_message_tx_on_bus: Generating packets for transmission of 9 byte message from 8 to 9
tx0 Jul 29 15:39:40 p11bmc mctp-demux[928]: core: tx dst 9 tag 0 payload len 9 seq 3. msg pos 0 len 9
tx0 Jul 29 15:39:40 p11bmc mctp-demux[928]: astlpc: bmc: mctp_binding_astlpc_tx: Transmitting 13-byte packet (8, 9, 0xf8)
tx0 Jul 29 15:39:40 p11bmc mctp-demux[928]: core: astlpc binding Tx disabled
tx0 Jul 29 15:39:40 p11bmc mctp-demux[928]: astlpc: bmc: mctp_astlpc_lpc_write: 4 bytes to 0x100
tx0 Jul 29 15:39:40 p11bmc mctp-demux[928]: astlpc: astlpc_pktbuf_protect_v3: 0xe985ef1e
tx0 Jul 29 15:39:40 p11bmc mctp-demux[928]: astlpc: bmc: mctp_astlpc_lpc_write: 17 bytes to 0x104
tx0 Jul 29 15:39:40 p11bmc mctp-demux[928]: astlpc: bmc: mctp_astlpc_kcs_send cmd:1
tx0 Jul 29 15:39:40 p11bmc mctp-demux[928]: core: tx complete msgpos:9 msglen:9

rx1 Jul 29 15:39:40 p11bmc mctp-demux[928]: astlpc: bmc: mctp_astlpc_poll: status: 0xc2
rx1 Jul 29 15:39:40 p11bmc mctp-demux[928]: astlpc: bmc: mctp_astlpc_poll: data: 0x1
rx1 Jul 29 15:39:40 p11bmc mctp-demux[928]: astlpc: bmc: mctp_astlpc_lpc_read: 4 bytes from 0x8007c
rx1 Jul 29 15:39:40 p11bmc mctp-demux[928]: astlpc: bmc: mctp_astlpc_lpc_read: 94 bytes from 0x80080
rx1 Jul 29 15:39:40 p11bmc mctp-demux[928]: astlpc: bmc: mctp_astlpc_kcs_send cmd:2
rx1 Jul 29 15:39:40 p11bmc mctp-demux[928]: astlpc: astlpc_pktbuf_validate_v3: 0xd1fcab0f
rx1 Jul 29 15:39:40 p11bmc mctp-demux[928]: MCTP message received: len 86, type 1

tx0 Jul 29 15:39:40 p11bmc mctp-demux[928]: astlpc: bmc: mctp_astlpc_poll: status: 0xc2
tx0 Jul 29 15:39:40 p11bmc mctp-demux[928]: astlpc: bmc: mctp_astlpc_poll: data: 0x2
tx0 Jul 29 15:39:40 p11bmc mctp-demux[928]: core: astlpc binding Tx enabled
tx0 Jul 29 15:39:40 p11bmc mctp-demux[928]: core: tx skip msg:(nil) state:1

tx1 Jul 29 15:39:40 p11bmc mctp-demux[928]: client[0] sent message: dest 0x09 len 9
tx1 Jul 29 15:39:40 p11bmc mctp-demux[928]: core: mctp_message_tx_on_bus: Generating packets for transmission of 9 byte message from 8 to 9
tx1 Jul 29 15:39:40 p11bmc mctp-demux[928]: core: tx dst 9 tag 0 payload len 9 seq 0. msg pos 0 len 9
tx1 Jul 29 15:39:40 p11bmc mctp-demux[928]: astlpc: bmc: mctp_binding_astlpc_tx: Transmitting 13-byte packet (8, 9, 0xc8)
tx1 Jul 29 15:39:40 p11bmc mctp-demux[928]: core: astlpc binding Tx disabled
tx1 Jul 29 15:39:40 p11bmc mctp-demux[928]: astlpc: bmc: mctp_astlpc_lpc_write: 4 bytes to 0x100
tx1 Jul 29 15:39:40 p11bmc mctp-demux[928]: astlpc: astlpc_pktbuf_protect_v3: 0x9b653b16
tx1 Jul 29 15:39:40 p11bmc mctp-demux[928]: astlpc: bmc: mctp_astlpc_lpc_write: 17 bytes to 0x104
tx1 Jul 29 15:39:40 p11bmc mctp-demux[928]: astlpc: bmc: mctp_astlpc_kcs_send cmd:1
tx1 Jul 29 15:39:40 p11bmc mctp-demux[928]: core: tx complete msgpos:9 msglen:9

rx2 Jul 29 15:39:40 p11bmc mctp-demux[928]: astlpc: bmc: mctp_astlpc_poll: status: 0xc2
rx2 Jul 29 15:39:40 p11bmc mctp-demux[928]: astlpc: bmc: mctp_astlpc_poll: data: 0x1
rx2 Jul 29 15:39:40 p11bmc mctp-demux[928]: astlpc: bmc: mctp_astlpc_lpc_read: 4 bytes from 0x8007c
rx2 Jul 29 15:39:40 p11bmc mctp-demux[928]: astlpc: bmc: mctp_astlpc_lpc_read: 44 bytes from 0x80080
rx2 Jul 29 15:39:40 p11bmc mctp-demux[928]: astlpc: bmc: mctp_astlpc_kcs_send cmd:2
rx2 Jul 29 15:39:40 p11bmc mctp-demux[928]: astlpc: astlpc_pktbuf_validate_v3: 0x8aa8e62a
rx2 Jul 29 15:39:40 p11bmc mctp-demux[928]: MCTP message received: len 36, type 1

tx2 Jul 29 15:39:40 p11bmc mctp-demux[928]: client[0] sent message: dest 0x09 len 17
tx2 Jul 29 15:39:40 p11bmc mctp-demux[928]: core: mctp_message_tx_on_bus: Generating packets for transmission of 17 byte message from 8 to 9
tx2 Jul 29 15:39:40 p11bmc mctp-demux[928]: core: tx skip msg:0x1056d68 state:2

tx1 Jul 29 15:39:40 p11bmc mctp-demux[928]: astlpc: bmc: mctp_astlpc_poll: status: 0xc2
tx1 Jul 29 15:39:40 p11bmc mctp-demux[928]: astlpc: bmc: mctp_astlpc_poll: data: 0x2
tx1 Jul 29 15:39:40 p11bmc mctp-demux[928]: core: astlpc binding Tx enabled

tx2 Jul 29 15:39:40 p11bmc mctp-demux[928]: core: tx dst 9 tag 0 payload len 17 seq 1. msg pos 0 len 17
tx2 Jul 29 15:39:40 p11bmc mctp-demux[928]: astlpc: bmc: mctp_binding_astlpc_tx: Transmitting 21-byte packet (8, 9, 0xd8)
tx2 Jul 29 15:39:40 p11bmc mctp-demux[928]: core: astlpc binding Tx disabled
tx2 Jul 29 15:39:40 p11bmc mctp-demux[928]: astlpc: bmc: mctp_astlpc_lpc_write: 4 bytes to 0x100
tx2 Jul 29 15:39:40 p11bmc mctp-demux[928]: astlpc: astlpc_pktbuf_protect_v3: 0x6ce4c1ea
tx2 Jul 29 15:39:40 p11bmc mctp-demux[928]: astlpc: bmc: mctp_astlpc_lpc_write: 25 bytes to 0x104
tx2 Jul 29 15:39:40 p11bmc mctp-demux[928]: astlpc: bmc: mctp_astlpc_kcs_send cmd:1
tx2 Jul 29 15:39:40 p11bmc mctp-demux[928]: core: tx complete msgpos:17 msglen:17

tx2 Jul 29 15:39:40 p11bmc mctp-demux[928]: astlpc: bmc: mctp_astlpc_poll: status: 0xc2
tx2 Jul 29 15:39:40 p11bmc mctp-demux[928]: astlpc: bmc: mctp_astlpc_poll: data: 0x2
tx2 Jul 29 15:39:40 p11bmc mctp-demux[928]: core: astlpc binding Tx enabled
tx2 Jul 29 15:39:40 p11bmc mctp-demux[928]: core: tx skip msg:(nil) state:1

rx3 Jul 29 15:39:40 p11bmc mctp-demux[928]: astlpc: bmc: mctp_astlpc_poll: status: 0xc2
rx3 Jul 29 15:39:40 p11bmc mctp-demux[928]: astlpc: bmc: mctp_astlpc_poll: data: 0x1
rx3 Jul 29 15:39:40 p11bmc mctp-demux[928]: astlpc: bmc: mctp_astlpc_lpc_read: 4 bytes from 0x8007c
rx3 Jul 29 15:39:40 p11bmc mctp-demux[928]: astlpc: bmc: mctp_astlpc_lpc_read: 186 bytes from 0x80080
rx3 Jul 29 15:39:40 p11bmc mctp-demux[928]: astlpc: bmc: mctp_astlpc_kcs_send cmd:2
rx3 Jul 29 15:39:40 p11bmc mctp-demux[928]: astlpc: astlpc_pktbuf_validate_v3: 0x79a298b
rx3 Jul 29 15:39:40 p11bmc mctp-demux[928]: MCTP message received: len 178, type 1

tx3 Jul 29 15:39:40 p11bmc mctp-demux[928]: client[0] sent message: dest 0x09 len 9
tx3 Jul 29 15:39:40 p11bmc mctp-demux[928]: core: mctp_message_tx_on_bus: Generating packets for transmission of 9 byte message from 8 to 9
tx3 Jul 29 15:39:40 p11bmc mctp-demux[928]: core: tx dst 9 tag 0 payload len 9 seq 2. msg pos 0 len 9
tx3 Jul 29 15:39:40 p11bmc mctp-demux[928]: astlpc: bmc: mctp_binding_astlpc_tx: Transmitting 13-byte packet (8, 9, 0xe8)
tx3 Jul 29 15:39:40 p11bmc mctp-demux[928]: core: astlpc binding Tx disabled
tx3 Jul 29 15:39:40 p11bmc mctp-demux[928]: astlpc: bmc: mctp_astlpc_lpc_write: 4 bytes to 0x100
tx3 Jul 29 15:39:40 p11bmc mctp-demux[928]: astlpc: astlpc_pktbuf_protect_v3: 0xe203ceb7
tx3 Jul 29 15:39:40 p11bmc mctp-demux[928]: astlpc: bmc: mctp_astlpc_lpc_write: 17 bytes to 0x104
tx3 Jul 29 15:39:40 p11bmc mctp-demux[928]: astlpc: bmc: mctp_astlpc_kcs_send cmd:1    **** tx
tx3 Jul 29 15:39:40 p11bmc mctp-demux[928]: core: tx complete msgpos:9 msglen:9
tx3 Jul 29 15:39:40 p11bmc mctp-demux[928]: astlpc: bmc: mctp_astlpc_poll: status: 0xc2
tx3 Jul 29 15:39:40 p11bmc mctp-demux[928]: astlpc: bmc: mctp_astlpc_poll: data: 0x2    **** rx complete
tx3 Jul 29 15:39:40 p11bmc mctp-demux[928]: core: astlpc binding Tx enabled
tx3 Jul 29 15:39:40 p11bmc mctp-demux[928]: core: tx skip msg:(nil) state:1

rx4 Jul 29 15:39:40 p11bmc mctp-demux[928]: astlpc: bmc: mctp_astlpc_poll: status: 0xc2
rx4 Jul 29 15:39:40 p11bmc mctp-demux[928]: astlpc: bmc: mctp_astlpc_poll: data: 0x1
rx4 Jul 29 15:39:40 p11bmc mctp-demux[928]: astlpc: bmc: mctp_astlpc_lpc_read: 4 bytes from 0x8007c
rx4 Jul 29 15:39:40 p11bmc mctp-demux[928]: astlpc: bmc: mctp_astlpc_lpc_read: 94 bytes from 0x80080
rx4 Jul 29 15:39:40 p11bmc mctp-demux[928]: astlpc: bmc: mctp_astlpc_kcs_send cmd:2
rx4 Jul 29 15:39:40 p11bmc mctp-demux[928]: astlpc: astlpc_pktbuf_validate_v3: 0x7fbd9655
rx4 Jul 29 15:39:40 p11bmc mctp-demux[928]: MCTP message received: len 86, type 1
rx5 Jul 29 15:39:40 p11bmc mctp-demux[928]: astlpc: bmc: mctp_astlpc_poll: status: 0xc2
rx5 Jul 29 15:39:40 p11bmc mctp-demux[928]: astlpc: bmc: mctp_astlpc_poll: data: 0x1
rx5 Jul 29 15:39:40 p11bmc mctp-demux[928]: astlpc: bmc: mctp_astlpc_lpc_read: 4 bytes from 0x8007c
rx5 Jul 29 15:39:40 p11bmc mctp-demux[928]: astlpc: bmc: mctp_astlpc_lpc_read: 44 bytes from 0x80080
rx5 Jul 29 15:39:40 p11bmc mctp-demux[928]: astlpc: bmc: mctp_astlpc_kcs_send cmd:2
rx5 Jul 29 15:39:40 p11bmc mctp-demux[928]: astlpc: astlpc_pktbuf_validate_v3: 0xfb96cd7e
rx5 Jul 29 15:39:40 p11bmc mctp-demux[928]: MCTP message received: len 36, type 1

tx4 Jul 29 15:39:40 p11bmc mctp-demux[928]: client[0] sent message: dest 0x09 len 9
tx4 Jul 29 15:39:40 p11bmc mctp-demux[928]: core: mctp_message_tx_on_bus: Generating packets for transmission of 9 byte message from 8 to 9
tx4 Jul 29 15:39:40 p11bmc mctp-demux[928]: core: tx dst 9 tag 0 payload len 9 seq 3. msg pos 0 len 9
tx4 Jul 29 15:39:40 p11bmc mctp-demux[928]: astlpc: bmc: mctp_binding_astlpc_tx: Transmitting 13-byte packet (8, 9, 0xf8)
tx4 Jul 29 15:39:40 p11bmc mctp-demux[928]: core: astlpc binding Tx disabled
tx4 Jul 29 15:39:40 p11bmc mctp-demux[928]: astlpc: bmc: mctp_astlpc_lpc_write: 4 bytes to 0x100
tx4 Jul 29 15:39:40 p11bmc mctp-demux[928]: astlpc: astlpc_pktbuf_protect_v3: 0x3de762b2
tx4 Jul 29 15:39:40 p11bmc mctp-demux[928]: astlpc: bmc: mctp_astlpc_lpc_write: 17 bytes to 0x104
tx4 Jul 29 15:39:40 p11bmc mctp-demux[928]: astlpc: bmc: mctp_astlpc_kcs_send cmd:1    **** final tx
tx4 Jul 29 15:39:40 p11bmc mctp-demux[928]: core: tx complete msgpos:9 msglen:9

tx5 Jul 29 15:39:40 p11bmc mctp-demux[928]: client[0] sent message: dest 0x09 len 17
tx5 Jul 29 15:39:40 p11bmc mctp-demux[928]: core: mctp_message_tx_on_bus: Generating packets for transmission of 17 byte message from 8 to 9
tx5 Jul 29 15:39:40 p11bmc mctp-demux[928]: core: tx skip msg:0x1056d68 state:2

rx6 Jul 29 15:39:40 p11bmc mctp-demux[928]: astlpc: bmc: mctp_astlpc_poll: status: 0xc2
rx6 Jul 29 15:39:40 p11bmc mctp-demux[928]: astlpc: bmc: mctp_astlpc_poll: data: 0x1    **** final rx
rx6 Jul 29 15:39:40 p11bmc mctp-demux[928]: astlpc: bmc: mctp_astlpc_lpc_read: 4 bytes from 0x8007c
rx6 Jul 29 15:39:40 p11bmc mctp-demux[928]: astlpc: bmc: mctp_astlpc_lpc_read: 186 bytes from 0x80080
rx6 Jul 29 15:39:40 p11bmc mctp-demux[928]: astlpc: bmc: mctp_astlpc_kcs_send cmd:2
rx6 Jul 29 15:39:40 p11bmc mctp-demux[928]: astlpc: astlpc_pktbuf_validate_v3: 0x525ebb9d
rx6 Jul 29 15:39:40 p11bmc mctp-demux[928]: MCTP message received: len 178, type 1

tx6 Jul 29 15:39:40 p11bmc mctp-demux[928]: client[0] sent message: dest 0x09 len 9
tx6 Jul 29 15:39:40 p11bmc mctp-demux[928]: core: mctp_message_tx_on_bus: Generating packets for transmission of 9 byte message from 8 to 9
tx6 Jul 29 15:39:40 p11bmc mctp-demux[928]: core: Bus busy
tx6 Jul 29 15:39:40 p11bmc mctp-demux[928]: mctp-demux-daemon: Failed to send message: -16```

eddiejames avatar Aug 11 '25 18:08 eddiejames

@mkj did quite a bit of rework across the range from the commit you identify there. While some changes are more likely to be the source of trouble than others (4a09e1dc4883 ("core: Reuse buffers for tx, allow message pools"), perhaps), are you able to bisect to identify the start of the problem?

amboar avatar Aug 12 '25 00:08 amboar

We hit this in our p10bmc CI once or twice a day (so about a 5-10% hit rate I'd estimate) against our openbmc/openbmc gerrit commits. We've got built in retries so it doesn't hurt much but George and I do get to screen them each time.

geissonator avatar Aug 26 '25 12:08 geissonator

I think the problem here could be that now mctp_message_tx() will return -EBUSY if there is a tx_msg in progress (added in 4a09e1dc4883 that @amboar pointed to). https://github.com/openbmc/libmctp/blob/b04447c2d8c7d4659350a808ade9b1345794e3fc/core.c#L885-L889

mctp-demux-demon should probably omit incoming readfds in poll() if mctp_is_tx_ready() returns false. That API takes a destination EID though - we probably need a mctp_is_bus_tx_ready() equivalent too. I can take a look at a patch when I get a chance, unless you want to?

mkj avatar Sep 25 '25 04:09 mkj

I think the problem here could be that now mctp_message_tx() will return -EBUSY if there is a tx_msg in progress (added in 4a09e1d that @amboar pointed to).

libmctp/core.c

Lines 885 to 889 in b04447c

if (bus->tx_msg) { mctp_prdebug("Bus busy"); rc = -EBUSY; goto err; } mctp-demux-demon should probably omit incoming readfds in poll() if mctp_is_tx_ready() returns false. That API takes a destination EID though - we probably need a mctp_is_bus_tx_ready() equivalent too. I can take a look at a patch when I get a chance, unless you want to?

@mkj I can try, though I am unsure which pollfd you mean. Do you mean a client one or the binding one? I'm not sure how I would get to the bus pointer through the client one.

eddiejames avatar Sep 29 '25 21:09 eddiejames

@mkj I can try, though I am unsure which pollfd you mean. Do you mean a client one or the binding one? I'm not sure how I would get to the bus pointer through the client one.

I mean the client FDs https://github.com/openbmc/libmctp/blob/b04447c2d8c7d4659350a808ade9b1345794e3fc/utils/mctp-demux-daemon.c#L500-L501

mctp-demux-daemon only works with single busses (not bridge) so it can check ctx->binding->bus?

mkj avatar Sep 30 '25 06:09 mkj