mcp251xfd driver silently not sending data
Describe the bug
Hey folks, appreciate your work!
I've configured an mcp2518fd IC on spi0 of a CM5; however, attempting to send data via python-can or cansend doesn't lead to a transmission, an error (returned to the program) or a message in dmesg. As far as I can tell, it's silently failing.
Steps to reproduce the behaviour
- Install Raspberry Pi OS, and config.txt (https://github.com/atopile/hil/blob/b37589c8e78b2563d3ebf7d195302f377f7bf866/examples/cellsim/config.txt)
- Update firmware via sudo rpi-update
- modprobe mcp251xfd && sudo ip link set can0 type can bitrate 500000 && sudo ip link set up can0
- Confirm presence via ifconfig -a
- Confirm no errors via dmesg
- cansend can0 123#DEADBEEF
- Confirm again no packets, no errors in dmesg
Device(s)
Raspberry Pi CM5
System
atopile@lively-sloth:~ $ cat /etc/rpi-issue
Raspberry Pi reference 2024-11-19
Generated using pi-gen, https://github.com/RPi-Distro/pi-gen, 891df1e21ed2b6099a2e6a13e26c91dea44b34d4, stage2
atopile@lively-sloth:~ $ vcgencmd version
2025/03/19 13:41:26
Copyright (c) 2012 Broadcom
version cec1d3ae (release) (embedded)
atopile@lively-sloth:~ $ uname -a
Linux lively-sloth 6.12.19-v8-16k+ #1865 SMP PREEMPT Wed Mar 19 13:48:20 GMT 2025 aarch64 GNU/Linux
Logs
After running all other commands: dmesg.log
Additional context
At one point, after being plugged in for a while, I finally started getting CRC error messages in dmesg, but the timing could've been unrelated and instead due to poor hardware connections or playing around with the bus while it was attempting to communicate.
I did discover a problem with RP1 DMA on 6.12 yesterday. It's worth trying with an updated kernel including the patch - run sudo rpi-update pulls/6729 to install it.
I think there's something else going on here.
I've tried again with the mcp2515 (https://copperhilltech.com/dual-isolated-can-bus-hat-extended-version-for-raspberry-pi/) and have the same issue, but with an old kernel, other drivers etc...
Is my expectation to see packets going out via ifconfig -a correct? Is there a better mechanism to check this?
atopile@lively-sloth:~ $ uname -a
Linux lively-sloth 6.6.51+rpt-rpi-2712 #1 SMP PREEMPT Debian 1:6.6.51-1+rpt3 (2024-10-08) aarch64 GNU/Linux
atopile@lively-sloth:~ $ ifconfig -a
can0: flags=193<UP,RUNNING,NOARP> mtu 16
unspec 00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00 txqueuelen 10 (UNSPEC)
RX packets 0 bytes 0 (0.0 B)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 0 bytes 0 (0.0 B)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
can1: flags=128<NOARP> mtu 16
unspec 00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00 txqueuelen 10 (UNSPEC)
RX packets 0 bytes 0 (0.0 B)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 0 bytes 0 (0.0 B)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
eth0: flags=4099<UP,BROADCAST,MULTICAST> mtu 1500
ether 2c:cf:67:d4:5e:c6 txqueuelen 1000 (Ethernet)
RX packets 0 bytes 0 (0.0 B)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 0 bytes 0 (0.0 B)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
device interrupt 107
lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536
inet 127.0.0.1 netmask 255.0.0.0
inet6 ::1 prefixlen 128 scopeid 0x10<host>
loop txqueuelen 1000 (Local Loopback)
RX packets 122 bytes 11382 (11.1 KiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 122 bytes 11382 (11.1 KiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
tailscale0: flags=4305<UP,POINTOPOINT,RUNNING,NOARP,MULTICAST> mtu 1280
inet 100.67.189.64 netmask 255.255.255.255 destination 100.67.189.64
inet6 fd7a:115c:a1e0::6801:bd48 prefixlen 128 scopeid 0x0<global>
inet6 fe80::a724:9a64:dd4f:cebb prefixlen 64 scopeid 0x20<link>
unspec 00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00 txqueuelen 500 (UNSPEC)
RX packets 5 bytes 588 (588.0 B)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 12 bytes 666 (666.0 B)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
wlan0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 192.168.1.189 netmask 255.255.255.0 broadcast 192.168.1.255
inet6 fe80::1b81:b12d:50e3:f511 prefixlen 64 scopeid 0x20<link>
ether 2c:cf:67:d4:5e:c7 txqueuelen 1000 (Ethernet)
RX packets 12130 bytes 2456079 (2.3 MiB)
RX errors 0 dropped 66 overruns 0 frame 0
TX packets 9306 bytes 1648440 (1.5 MiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
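On the question of a better way to check than ifconfig -a: the CAN-specific details printed by ip -details link show can0 normally include the controller state and the TX/RX error counters, which ifconfig does not show. Below is a minimal Python sketch that pulls those fields out of that text. The helper names (parse_can_details, read_can_details) are hypothetical, and the regexes assume the usual iproute2 layout of a line such as "can state <STATE> (berr-counter tx N rx N)"; not every driver reports the counters, so missing fields come back as None.

```python
import re
import subprocess

# CAN controller states as printed by iproute2 (assumed spelling).
# Restricting the match to these names avoids picking up the generic
# "state UP" / "state DOWN" operstate on the first line of the output.
STATE_RE = re.compile(
    r"\bstate\s+(ERROR-ACTIVE|ERROR-WARNING|ERROR-PASSIVE|BUS-OFF|STOPPED|SLEEPING)"
)
BERR_RE = re.compile(r"berr-counter tx (\d+) rx (\d+)")


def parse_can_details(text):
    """Extract controller state and error counters from `ip -details link show` text."""
    state = STATE_RE.search(text)
    berr = BERR_RE.search(text)
    return {
        "state": state.group(1) if state else None,
        "tx_err": int(berr.group(1)) if berr else None,
        "rx_err": int(berr.group(2)) if berr else None,
    }


def read_can_details(ifname="can0"):
    """Run `ip -details link show <ifname>` and parse the result."""
    out = subprocess.run(
        ["ip", "-details", "link", "show", ifname],
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    return parse_can_details(out)
```

Calling read_can_details("can0") before and after a cansend should show whether the TX error counter climbs or the state leaves ERROR-ACTIVE, even while the packet counters stay at zero.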
We've got something working with the MCP2515! 🎉
Turns out there was an electrical issue (no other nodes on the network). It was attempting to send; however, without an ACK coming back it wasn't registering the transmission via ifconfig -a, wasn't timing out on a time-scale I anticipated, and didn't throw an error because, as far as cansend and python-can were concerned, they'd sent the command to the queue!
Should we expect to see a timeout in dmesg eventually, or anything else to indicate that this ACK was missing? I believe the correct behaviour is also for the controller to end up in the "bus-off" state, at which point I'd expected sending additional messages to trigger an exception, but never observed one.
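One place a missing ACK can show up is in SocketCAN error frames rather than in dmesg. The sketch below decodes the error-class bits of such a frame, using constants from linux/can.h and linux/can/error.h. The function names are hypothetical, and whether a given driver (mcp251x or mcp251xfd) actually emits a no-ACK error frame in this situation is an assumption to verify on the hardware. As far as I understand the CAN fault-confinement rules, a lone transmitter that only ever sees ACK errors stops at error-passive rather than reaching bus-off, which would be consistent with never seeing an exception.

```python
import socket
import struct

# Constants from linux/can.h and linux/can/error.h.
CAN_ERR_FLAG = 0x20000000  # set in can_id when the frame is an error frame
CAN_ERR_MASK = 0x1FFFFFFF  # error-class bits live in the low 29 bits

ERROR_CLASSES = {
    0x001: "tx-timeout",
    0x002: "lost-arbitration",
    0x004: "controller-problem",
    0x008: "protocol-violation",
    0x010: "transceiver-status",
    0x020: "no-ack",
    0x040: "bus-off",
    0x080: "bus-error",
    0x100: "restarted",
}

# Classic CAN frame layout: can_id (u32), dlc (u8), 3 pad bytes, 8 data bytes.
CAN_FRAME_FMT = "=IB3x8s"


def parse_frame(frame):
    """Split a raw 16-byte SocketCAN frame into (can_id, dlc, data)."""
    can_id, dlc, data = struct.unpack(CAN_FRAME_FMT, frame)
    return can_id, dlc, data[:dlc]


def decode_error_frame(can_id):
    """Return the error-class names set in can_id, or [] for a normal frame."""
    if not can_id & CAN_ERR_FLAG:
        return []
    bits = can_id & CAN_ERR_MASK
    return [name for bit, name in ERROR_CLASSES.items() if bits & bit]


def open_error_socket(ifname="can0"):
    """Open a raw CAN socket that also receives all error-frame classes (Linux only)."""
    sock = socket.socket(socket.AF_CAN, socket.SOCK_RAW, socket.CAN_RAW)
    sock.setsockopt(
        socket.SOL_CAN_RAW, socket.CAN_RAW_ERR_FILTER, struct.pack("=I", CAN_ERR_MASK)
    )
    sock.bind((ifname,))
    return sock
```

A loop such as can_id, _, _ = parse_frame(sock.recv(16)) followed by decode_error_frame(can_id) would then print the classes as they arrive. Some drivers only report bus errors when the interface is brought up with berr-reporting on.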