mcp251xfd driver silently not sending data

Open mawildoer opened this issue 9 months ago • 3 comments

Describe the bug

Hey folks, appreciate your work!

I've configured an mcp2518fd IC on spi0 of a CM5. However, attempting to send data via python-can or cansend doesn't lead to a transmission, an error returned to the program, or a message in dmesg. As far as I can tell, it's failing silently.

Steps to reproduce the behaviour

  • Install Raspberry Pi OS and apply this config.txt (https://github.com/atopile/hil/blob/b37589c8e78b2563d3ebf7d195302f377f7bf866/examples/cellsim/config.txt)
  • Update firmware via sudo rpi-update
  • modprobe mcp251xfd && sudo ip link set can0 type can bitrate 500000 && sudo ip link set up can0
  • Confirm presence via ifconfig -a
  • Confirm no errors via dmesg
  • cansend can0 123#DEADBEEF
  • Confirm again: no packets sent, no errors in dmesg

Device(s)

Raspberry Pi CM5

System

atopile@lively-sloth:~ $ cat /etc/rpi-issue
Raspberry Pi reference 2024-11-19
Generated using pi-gen, https://github.com/RPi-Distro/pi-gen, 891df1e21ed2b6099a2e6a13e26c91dea44b34d4, stage2
atopile@lively-sloth:~ $ vcgencmd version
2025/03/19 13:41:26 
Copyright (c) 2012 Broadcom
version cec1d3ae (release) (embedded)
atopile@lively-sloth:~ $ uname -a
Linux lively-sloth 6.12.19-v8-16k+ #1865 SMP PREEMPT Wed Mar 19 13:48:20 GMT 2025 aarch64 GNU/Linux

Logs

After running all other commands: dmesg.log

Additional context

At one point, after being plugged in for a while, I finally started getting CRC error messages in dmesg, but the timing could have been unrelated and instead due to poor hardware connections or me playing around with the bus while it was attempting to communicate.

mawildoer avatar Mar 20 '25 00:03 mawildoer

I did discover a problem with RP1 DMA on 6.12 yesterday. It's worth trying with an updated kernel including the patch - run sudo rpi-update pulls/6729 to install it.

pelwell avatar Mar 20 '25 08:03 pelwell

I think there's something else going on here.

I've tried again with the mcp2515 (https://copperhilltech.com/dual-isolated-can-bus-hat-extended-version-for-raspberry-pi/) and have the same issue, this time with an old kernel, other drivers, etc.

Is my expectation to see packets going out via ifconfig -a correct? Is there a better mechanism to check this?
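One possible way to look beyond the ifconfig counters is to read the controller state and bus error counters that `ip -details -statistics link show can0` reports. The sketch below is a minimal example, not a definitive tool: the helper names `read_can_status` and `summarise` are hypothetical, and the JSON key names (`linkinfo`, `info_data`, `state`, `berr_counter`, `stats64`) are assumed from what recent iproute2 versions emit with `-json`, so they may need adjusting on a given system.

```python
import json
import subprocess


def summarise(link):
    """Pick the CAN-relevant fields out of one parsed `ip -json` link entry.

    Key names are an assumption based on recent iproute2 output; any field
    that is missing comes back as None instead of raising.
    """
    info = link.get("linkinfo", {}).get("info_data", {})
    berr = info.get("berr_counter", {})
    stats = link.get("stats64", {})
    return {
        "state": info.get("state"),            # e.g. ERROR-ACTIVE, ERROR-PASSIVE, BUS-OFF
        "tx_error_counter": berr.get("tx"),    # controller transmit error counter
        "rx_error_counter": berr.get("rx"),    # controller receive error counter
        "tx_packets": stats.get("tx", {}).get("packets"),
        "rx_packets": stats.get("rx", {}).get("packets"),
    }


def read_can_status(ifname="can0"):
    """Run `ip` for the given interface and return the summary dict."""
    out = subprocess.run(
        ["ip", "-details", "-statistics", "-json", "link", "show", ifname],
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    return summarise(json.loads(out)[0])
```

Calling `read_can_status("can0")` on the Pi and watching whether `state` leaves ERROR-ACTIVE, or whether `tx_error_counter` climbs while `tx_packets` stays at 0, should give a hint that frames are being attempted but not acknowledged.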

atopile@lively-sloth:~ $ uname -a
Linux lively-sloth 6.6.51+rpt-rpi-2712 #1 SMP PREEMPT Debian 1:6.6.51-1+rpt3 (2024-10-08) aarch64 GNU/Linux

dmesg.log

config.txt

atopile@lively-sloth:~ $ ifconfig -a
can0: flags=193<UP,RUNNING,NOARP>  mtu 16
        unspec 00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00  txqueuelen 10  (UNSPEC)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

can1: flags=128<NOARP>  mtu 16
        unspec 00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00  txqueuelen 10  (UNSPEC)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

eth0: flags=4099<UP,BROADCAST,MULTICAST>  mtu 1500
        ether 2c:cf:67:d4:5e:c6  txqueuelen 1000  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
        device interrupt 107  

lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        inet6 ::1  prefixlen 128  scopeid 0x10<host>
        loop  txqueuelen 1000  (Local Loopback)
        RX packets 122  bytes 11382 (11.1 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 122  bytes 11382 (11.1 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

tailscale0: flags=4305<UP,POINTOPOINT,RUNNING,NOARP,MULTICAST>  mtu 1280
        inet 100.67.189.64  netmask 255.255.255.255  destination 100.67.189.64
        inet6 fd7a:115c:a1e0::6801:bd48  prefixlen 128  scopeid 0x0<global>
        inet6 fe80::a724:9a64:dd4f:cebb  prefixlen 64  scopeid 0x20<link>
        unspec 00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00  txqueuelen 500  (UNSPEC)
        RX packets 5  bytes 588 (588.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 12  bytes 666 (666.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

wlan0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 192.168.1.189  netmask 255.255.255.0  broadcast 192.168.1.255
        inet6 fe80::1b81:b12d:50e3:f511  prefixlen 64  scopeid 0x20<link>
        ether 2c:cf:67:d4:5e:c7  txqueuelen 1000  (Ethernet)
        RX packets 12130  bytes 2456079 (2.3 MiB)
        RX errors 0  dropped 66  overruns 0  frame 0
        TX packets 9306  bytes 1648440 (1.5 MiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

mawildoer avatar Mar 20 '25 21:03 mawildoer

We've got something working with the MCP2515! 🎉

Turns out there was an electrical issue (no other nodes on the network). It was attempting to send, but without an ACK coming back it wasn't registering the transmission via ifconfig -a, wasn't timing out on a time-scale I anticipated, and didn't throw an error because, as far as cansend and python-can were concerned, they'd sent the command to the queue!

Should we be expecting to see a timeout in dmesg eventually, or anything else to indicate that this ACK was missing? I believe the correct behaviour is also for the controller to end up in the "bus-off" state, at which point I'd expected sending additional messages to trigger an exception, but I never observed one.
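SocketCAN can deliver controller problems to user space as error message frames, which may be a more direct signal than dmesg. The following is a rough sketch, assuming a Linux host and a driver that actually reports these events (whether the mcp251x or mcp251xfd drivers raise a no-ACK or bus-off frame in this situation is not confirmed here). The error class bits come from `<linux/can/error.h>`; the helper names `decode_error_frame` and `watch_errors` are hypothetical. Note also that, as far as I understand the CAN specification, a lone node whose frames are never acknowledged tends to settle in the error-passive state rather than going bus-off, so a bus-off event may never arrive.

```python
import socket
import struct

# Error class bits carried in can_id, from <linux/can/error.h>
CAN_ERR_CLASSES = {
    0x001: "tx-timeout",
    0x002: "lost-arbitration",
    0x004: "controller-problem",
    0x008: "protocol-violation",
    0x010: "transceiver-status",
    0x020: "no-ack",
    0x040: "bus-off",
    0x080: "bus-error",
    0x100: "controller-restarted",
}
CAN_ERR_FLAG = 0x20000000   # set in can_id when the frame is an error message frame
CAN_ERR_MASK = 0x1FFFFFFF   # subscribe to every error class
CAN_FRAME_FMT = "=IB3x8s"   # classic struct can_frame: id, dlc, padding, 8 data bytes


def decode_error_frame(raw):
    """Return the list of error class names, or None for a normal data frame."""
    can_id, _dlc, _data = struct.unpack(CAN_FRAME_FMT, raw)
    if not can_id & CAN_ERR_FLAG:
        return None
    return [name for bit, name in CAN_ERR_CLASSES.items() if can_id & bit]


def watch_errors(ifname="can0"):
    """Print every error message frame seen on the interface (runs until interrupted)."""
    sock = socket.socket(socket.AF_CAN, socket.SOCK_RAW, socket.CAN_RAW)
    # Error frames are filtered out by default; opt in to all classes.
    sock.setsockopt(socket.SOL_CAN_RAW, socket.CAN_RAW_ERR_FILTER, CAN_ERR_MASK)
    sock.bind((ifname,))
    while True:
        classes = decode_error_frame(sock.recv(16))
        if classes is not None:
            print(ifname, "error frame:", ", ".join(classes))
```

Running `watch_errors("can0")` in one terminal while sending with cansend in another would show whether the driver reports anything for the unacknowledged frames.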

mawildoer avatar Mar 21 '25 22:03 mawildoer